QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PULL 00/14] Monitor patches for 2020-10-09
@ 2020-10-09  6:34 Markus Armbruster
  2020-10-09  6:34 ` [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu() Markus Armbruster
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

The following changes since commit 497d415d76b9f59fcae27f22df1ca2c3fa4df64e:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201008-1' into staging (2020-10-08 21:41:20 +0100)

are available in the Git repository at:

  git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2020-10-09

for you to fetch changes up to eb94b81a94bce112e6b206df846c1551aaf6cab6:

  block: Convert 'block_resize' to coroutine (2020-10-09 07:08:20 +0200)

----------------------------------------------------------------
Monitor patches for 2020-10-09

* QMP command block-resize and HMP command block_resize no longer
  block the main loop.

----------------------------------------------------------------
Kevin Wolf (14):
      monitor: Add Monitor parameter to monitor_set_cpu()
      monitor: Add Monitor parameter to monitor_get_cpu_index()
      monitor: Use getter/setter functions for cur_mon
      hmp: Update current monitor only in handle_hmp_command()
      qmp: Assert that no other monitor is active
      qmp: Call monitor_set_cur() only in qmp_dispatch()
      monitor: Make current monitor a per-coroutine property
      qapi: Add a 'coroutine' flag for commands
      qmp: Move dispatcher to a coroutine
      hmp: Add support for coroutine command handlers
      util/async: Add aio_co_reschedule_self()
      block: Add bdrv_co_enter()/leave()
      block: Add bdrv_lock()/unlock()
      block: Convert 'block_resize' to coroutine

 docs/devel/qapi-code-gen.txt            |  29 +++++++
 docs/sphinx/qapidoc.py                  |   2 +-
 qapi/block-core.json                    |   3 +-
 include/block/aio.h                     |  10 +++
 include/block/block.h                   |  31 ++++++++
 include/monitor/monitor.h               |   7 +-
 include/qapi/qmp/dispatch.h             |   5 +-
 monitor/monitor-internal.h              |   7 +-
 audio/wavcapture.c                      |   8 +-
 block.c                                 |  50 ++++++++++++
 blockdev.c                              |  16 ++--
 dump/dump.c                             |   2 +-
 hw/core/machine-hmp-cmds.c              |   2 +-
 hw/scsi/vhost-scsi.c                    |   2 +-
 hw/virtio/vhost-vsock.c                 |   2 +-
 migration/fd.c                          |   4 +-
 monitor/hmp-cmds.c                      |   4 +-
 monitor/hmp.c                           |  44 ++++++++---
 monitor/misc.c                          |  38 +++++----
 monitor/monitor.c                       | 101 +++++++++++++++++++++---
 monitor/qmp-cmds-control.c              |   2 +
 monitor/qmp-cmds.c                      |   2 +-
 monitor/qmp.c                           | 131 +++++++++++++++++++++++---------
 net/socket.c                            |   2 +-
 net/tap.c                               |   6 +-
 qapi/qmp-dispatch.c                     |  65 +++++++++++++++-
 qapi/qmp-registry.c                     |   3 +
 qga/main.c                              |   2 +-
 softmmu/cpus.c                          |   2 +-
 stubs/monitor-core.c                    |  10 ++-
 tests/test-qmp-cmds.c                   |  10 ++-
 tests/test-util-sockets.c               |  12 +--
 trace/control.c                         |   2 +-
 util/aio-posix.c                        |   8 +-
 util/async.c                            |  30 ++++++++
 util/qemu-error.c                       |   6 +-
 util/qemu-print.c                       |   3 +-
 util/qemu-sockets.c                     |   1 +
 hmp-commands.hx                         |   1 +
 scripts/qapi/commands.py                |  10 ++-
 scripts/qapi/expr.py                    |  11 ++-
 scripts/qapi/introspect.py              |   2 +-
 scripts/qapi/schema.py                  |  13 +++-
 tests/qapi-schema/meson.build           |   1 +
 tests/qapi-schema/oob-coroutine.err     |   2 +
 tests/qapi-schema/oob-coroutine.json    |   2 +
 tests/qapi-schema/oob-coroutine.out     |   0
 tests/qapi-schema/qapi-schema-test.json |   1 +
 tests/qapi-schema/qapi-schema-test.out  |   2 +
 tests/qapi-schema/test-qapi.py          |   7 +-
 50 files changed, 573 insertions(+), 143 deletions(-)
 create mode 100644 tests/qapi-schema/oob-coroutine.err
 create mode 100644 tests/qapi-schema/oob-coroutine.json
 create mode 100644 tests/qapi-schema/oob-coroutine.out

-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 02/14] monitor: Add Monitor parameter to monitor_get_cpu_index() Markus Armbruster
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Most callers actually don't have to rely on cur_mon, but already know
for which monitor they call monitor_set_cpu().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-2-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/monitor/monitor.h |  2 +-
 monitor/hmp-cmds.c        |  2 +-
 monitor/misc.c            | 10 +++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index c0170773d4..04f472ac4f 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -33,7 +33,7 @@ int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
     GCC_FMT_ATTR(2, 0);
 int monitor_printf(Monitor *mon, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 void monitor_flush(Monitor *mon);
-int monitor_set_cpu(int cpu_index);
+int monitor_set_cpu(Monitor *mon, int cpu_index);
 int monitor_get_cpu_index(void);
 
 void monitor_read_command(MonitorHMP *mon, int show_prompt);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index dc0de39219..0c0a03f824 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -998,7 +998,7 @@ void hmp_cpu(Monitor *mon, const QDict *qdict)
     /* XXX: drop the monitor_set_cpu() usage when all HMP commands that
             use it are converted to the QAPI */
     cpu_index = qdict_get_int(qdict, "index");
-    if (monitor_set_cpu(cpu_index) < 0) {
+    if (monitor_set_cpu(mon, cpu_index) < 0) {
         monitor_printf(mon, "invalid CPU index\n");
     }
 }
diff --git a/monitor/misc.c b/monitor/misc.c
index 6e0da0cb96..25b42593cc 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -129,7 +129,7 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
     cur_mon = &hmp.common;
 
     if (has_cpu_index) {
-        int ret = monitor_set_cpu(cpu_index);
+        int ret = monitor_set_cpu(&hmp.common, cpu_index);
         if (ret < 0) {
             cur_mon = old_mon;
             error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "cpu-index",
@@ -255,7 +255,7 @@ static void monitor_init_qmp_commands(void)
 }
 
 /* Set the current CPU defined by the user. Callers must hold BQL. */
-int monitor_set_cpu(int cpu_index)
+int monitor_set_cpu(Monitor *mon, int cpu_index)
 {
     CPUState *cpu;
 
@@ -263,8 +263,8 @@ int monitor_set_cpu(int cpu_index)
     if (cpu == NULL) {
         return -1;
     }
-    g_free(cur_mon->mon_cpu_path);
-    cur_mon->mon_cpu_path = object_get_canonical_path(OBJECT(cpu));
+    g_free(mon->mon_cpu_path);
+    mon->mon_cpu_path = object_get_canonical_path(OBJECT(cpu));
     return 0;
 }
 
@@ -285,7 +285,7 @@ static CPUState *mon_get_cpu_sync(bool synchronize)
         if (!first_cpu) {
             return NULL;
         }
-        monitor_set_cpu(first_cpu->cpu_index);
+        monitor_set_cpu(cur_mon, first_cpu->cpu_index);
         cpu = first_cpu;
     }
     assert(cpu != NULL);
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 02/14] monitor: Add Monitor parameter to monitor_get_cpu_index()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
  2020-10-09  6:34 ` [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 03/14] monitor: Use getter/setter functions for cur_mon Markus Armbruster
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Most callers actually don't have to rely on cur_mon, but already know
for which monitor they call monitor_get_cpu_index().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-3-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/monitor/monitor.h  |  2 +-
 hw/core/machine-hmp-cmds.c |  2 +-
 monitor/hmp-cmds.c         |  2 +-
 monitor/misc.c             | 20 ++++++++++----------
 softmmu/cpus.c             |  2 +-
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 04f472ac4f..93bedf0b75 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -34,7 +34,7 @@ int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
 int monitor_printf(Monitor *mon, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 void monitor_flush(Monitor *mon);
 int monitor_set_cpu(Monitor *mon, int cpu_index);
-int monitor_get_cpu_index(void);
+int monitor_get_cpu_index(Monitor *mon);
 
 void monitor_read_command(MonitorHMP *mon, int show_prompt);
 int monitor_read_password(MonitorHMP *mon, ReadLineFunc *readline_func,
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index f4092b98cc..6357be9c6b 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -34,7 +34,7 @@ void hmp_info_cpus(Monitor *mon, const QDict *qdict)
     for (cpu = cpu_list; cpu; cpu = cpu->next) {
         int active = ' ';
 
-        if (cpu->value->cpu_index == monitor_get_cpu_index()) {
+        if (cpu->value->cpu_index == monitor_get_cpu_index(mon)) {
             active = '*';
         }
 
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0c0a03f824..9789f4277f 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1009,7 +1009,7 @@ void hmp_memsave(Monitor *mon, const QDict *qdict)
     const char *filename = qdict_get_str(qdict, "filename");
     uint64_t addr = qdict_get_int(qdict, "val");
     Error *err = NULL;
-    int cpu_index = monitor_get_cpu_index();
+    int cpu_index = monitor_get_cpu_index(mon);
 
     if (cpu_index < 0) {
         monitor_printf(mon, "No CPU available\n");
diff --git a/monitor/misc.c b/monitor/misc.c
index 25b42593cc..4ea575eea8 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -269,23 +269,23 @@ int monitor_set_cpu(Monitor *mon, int cpu_index)
 }
 
 /* Callers must hold BQL. */
-static CPUState *mon_get_cpu_sync(bool synchronize)
+static CPUState *mon_get_cpu_sync(Monitor *mon, bool synchronize)
 {
     CPUState *cpu = NULL;
 
-    if (cur_mon->mon_cpu_path) {
-        cpu = (CPUState *) object_resolve_path_type(cur_mon->mon_cpu_path,
+    if (mon->mon_cpu_path) {
+        cpu = (CPUState *) object_resolve_path_type(mon->mon_cpu_path,
                                                     TYPE_CPU, NULL);
         if (!cpu) {
-            g_free(cur_mon->mon_cpu_path);
-            cur_mon->mon_cpu_path = NULL;
+            g_free(mon->mon_cpu_path);
+            mon->mon_cpu_path = NULL;
         }
     }
-    if (!cur_mon->mon_cpu_path) {
+    if (!mon->mon_cpu_path) {
         if (!first_cpu) {
             return NULL;
         }
-        monitor_set_cpu(cur_mon, first_cpu->cpu_index);
+        monitor_set_cpu(mon, first_cpu->cpu_index);
         cpu = first_cpu;
     }
     assert(cpu != NULL);
@@ -297,7 +297,7 @@ static CPUState *mon_get_cpu_sync(bool synchronize)
 
 CPUState *mon_get_cpu(void)
 {
-    return mon_get_cpu_sync(true);
+    return mon_get_cpu_sync(cur_mon, true);
 }
 
 CPUArchState *mon_get_cpu_env(void)
@@ -307,9 +307,9 @@ CPUArchState *mon_get_cpu_env(void)
     return cs ? cs->env_ptr : NULL;
 }
 
-int monitor_get_cpu_index(void)
+int monitor_get_cpu_index(Monitor *mon)
 {
-    CPUState *cs = mon_get_cpu_sync(false);
+    CPUState *cs = mon_get_cpu_sync(mon, false);
 
     return cs ? cs->cpu_index : UNASSIGNED_CPU_INDEX;
 }
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 9e33416b4d..e9d4a44fcc 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -793,6 +793,6 @@ exit:
 
 void qmp_inject_nmi(Error **errp)
 {
-    nmi_monitor_handle(monitor_get_cpu_index(), errp);
+    nmi_monitor_handle(monitor_get_cpu_index(cur_mon), errp);
 }
 
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 03/14] monitor: Use getter/setter functions for cur_mon
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
  2020-10-09  6:34 ` [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu() Markus Armbruster
  2020-10-09  6:34 ` [PULL 02/14] monitor: Add Monitor parameter to monitor_get_cpu_index() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 04/14] hmp: Update current monitor only in handle_hmp_command() Markus Armbruster
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

cur_mon really needs to be coroutine-local as soon as we move monitor
command handlers to coroutines and let them yield. As a first step, just
remove all direct accesses to cur_mon so that we can implement this in
the getter function later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-4-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/monitor/monitor.h  |  3 ++-
 audio/wavcapture.c         |  8 ++++----
 dump/dump.c                |  2 +-
 hw/scsi/vhost-scsi.c       |  2 +-
 hw/virtio/vhost-vsock.c    |  2 +-
 migration/fd.c             |  4 ++--
 monitor/hmp.c              | 11 +++++------
 monitor/misc.c             | 13 +++++++------
 monitor/monitor.c          | 24 +++++++++++++++++++++++-
 monitor/qmp-cmds-control.c |  2 ++
 monitor/qmp-cmds.c         |  2 +-
 monitor/qmp.c              |  7 ++-----
 net/socket.c               |  2 +-
 net/tap.c                  |  6 +++---
 softmmu/cpus.c             |  2 +-
 stubs/monitor-core.c       |  5 ++++-
 tests/test-util-sockets.c  | 12 ++++++------
 trace/control.c            |  2 +-
 util/qemu-error.c          |  6 +++---
 util/qemu-print.c          |  3 ++-
 util/qemu-sockets.c        |  1 +
 21 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 93bedf0b75..543eafcb76 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -5,7 +5,6 @@
 #include "qapi/qapi-types-misc.h"
 #include "qemu/readline.h"
 
-extern __thread Monitor *cur_mon;
 typedef struct MonitorHMP MonitorHMP;
 typedef struct MonitorOptions MonitorOptions;
 
@@ -13,6 +12,8 @@ typedef struct MonitorOptions MonitorOptions;
 
 extern QemuOptsList qemu_mon_opts;
 
+Monitor *monitor_cur(void);
+Monitor *monitor_set_cur(Monitor *mon);
 bool monitor_cur_is_qmp(void);
 
 void monitor_init_globals(void);
diff --git a/audio/wavcapture.c b/audio/wavcapture.c
index 17e87ed6f4..c60286e162 100644
--- a/audio/wavcapture.c
+++ b/audio/wavcapture.c
@@ -1,5 +1,5 @@
 #include "qemu/osdep.h"
-#include "monitor/monitor.h"
+#include "qemu/qemu-print.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "audio.h"
@@ -94,9 +94,9 @@ static void wav_capture_info (void *opaque)
     WAVState *wav = opaque;
     char *path = wav->path;
 
-    monitor_printf (cur_mon, "Capturing audio(%d,%d,%d) to %s: %d bytes\n",
-                    wav->freq, wav->bits, wav->nchannels,
-                    path ? path : "<not available>", wav->bytes);
+    qemu_printf("Capturing audio(%d,%d,%d) to %s: %d bytes\n",
+                wav->freq, wav->bits, wav->nchannels,
+                path ? path : "<not available>", wav->bytes);
 }
 
 static struct capture_ops wav_capture_ops = {
diff --git a/dump/dump.c b/dump/dump.c
index 45da46a952..dec32468d9 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1986,7 +1986,7 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 
 #if !defined(WIN32)
     if (strstart(file, "fd:", &p)) {
-        fd = monitor_get_fd(cur_mon, p, errp);
+        fd = monitor_get_fd(monitor_cur(), p, errp);
         if (fd == -1) {
             return;
         }
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index a83ffeefc8..4d70fa036b 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -177,7 +177,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
     }
 
     if (vs->conf.vhostfd) {
-        vhostfd = monitor_fd_param(cur_mon, vs->conf.vhostfd, errp);
+        vhostfd = monitor_fd_param(monitor_cur(), vs->conf.vhostfd, errp);
         if (vhostfd == -1) {
             error_prepend(errp, "vhost-scsi: unable to parse vhostfd: ");
             return;
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index c8f0699b4f..f9db4beb47 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -143,7 +143,7 @@ static void vhost_vsock_device_realize(DeviceState *dev, Error **errp)
     }
 
     if (vsock->conf.vhostfd) {
-        vhostfd = monitor_fd_param(cur_mon, vsock->conf.vhostfd, errp);
+        vhostfd = monitor_fd_param(monitor_cur(), vsock->conf.vhostfd, errp);
         if (vhostfd == -1) {
             error_prepend(errp, "vhost-vsock: unable to parse vhostfd: ");
             return;
diff --git a/migration/fd.c b/migration/fd.c
index 0a29ecdebf..6f2f50475f 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -26,7 +26,7 @@
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     QIOChannel *ioc;
-    int fd = monitor_get_fd(cur_mon, fdname, errp);
+    int fd = monitor_get_fd(monitor_cur(), fdname, errp);
     if (fd == -1) {
         return;
     }
@@ -55,7 +55,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 void fd_start_incoming_migration(const char *fdname, Error **errp)
 {
     QIOChannel *ioc;
-    int fd = monitor_fd_param(cur_mon, fdname, errp);
+    int fd = monitor_fd_param(monitor_cur(), fdname, errp);
     if (fd == -1) {
         return;
     }
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 4ecdefd705..e0cc9e65dd 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1300,12 +1300,11 @@ cleanup:
 
 static void monitor_read(void *opaque, const uint8_t *buf, int size)
 {
-    MonitorHMP *mon;
-    Monitor *old_mon = cur_mon;
+    MonitorHMP *mon = container_of(opaque, MonitorHMP, common);
+    Monitor *old_mon;
     int i;
 
-    cur_mon = opaque;
-    mon = container_of(cur_mon, MonitorHMP, common);
+    old_mon = monitor_set_cur(&mon->common);
 
     if (mon->rs) {
         for (i = 0; i < size; i++) {
@@ -1313,13 +1312,13 @@ static void monitor_read(void *opaque, const uint8_t *buf, int size)
         }
     } else {
         if (size == 0 || buf[size - 1] != 0) {
-            monitor_printf(cur_mon, "corrupted command\n");
+            monitor_printf(&mon->common, "corrupted command\n");
         } else {
             handle_hmp_command(mon, (char *)buf);
         }
     }
 
-    cur_mon = old_mon;
+    monitor_set_cur(old_mon);
 }
 
 static void monitor_event(void *opaque, QEMUChrEvent event)
diff --git a/monitor/misc.c b/monitor/misc.c
index 4ea575eea8..ee8db45094 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -125,13 +125,12 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
 
     monitor_data_init(&hmp.common, false, true, false);
 
-    old_mon = cur_mon;
-    cur_mon = &hmp.common;
+    old_mon = monitor_set_cur(&hmp.common);
 
     if (has_cpu_index) {
         int ret = monitor_set_cpu(&hmp.common, cpu_index);
         if (ret < 0) {
-            cur_mon = old_mon;
+            monitor_set_cur(old_mon);
             error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "cpu-index",
                        "a CPU number");
             goto out;
@@ -139,7 +138,7 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
     }
 
     handle_hmp_command(&hmp, command_line);
-    cur_mon = old_mon;
+    monitor_set_cur(old_mon);
 
     WITH_QEMU_LOCK_GUARD(&hmp.common.mon_lock) {
         if (qstring_get_length(hmp.common.outbuf) > 0) {
@@ -297,7 +296,7 @@ static CPUState *mon_get_cpu_sync(Monitor *mon, bool synchronize)
 
 CPUState *mon_get_cpu(void)
 {
-    return mon_get_cpu_sync(cur_mon, true);
+    return mon_get_cpu_sync(monitor_cur(), true);
 }
 
 CPUArchState *mon_get_cpu_env(void)
@@ -1232,6 +1231,7 @@ static void hmp_acl_remove(Monitor *mon, const QDict *qdict)
 
 void qmp_getfd(const char *fdname, Error **errp)
 {
+    Monitor *cur_mon = monitor_cur();
     mon_fd_t *monfd;
     int fd, tmp_fd;
 
@@ -1270,6 +1270,7 @@ void qmp_getfd(const char *fdname, Error **errp)
 
 void qmp_closefd(const char *fdname, Error **errp)
 {
+    Monitor *cur_mon = monitor_cur();
     mon_fd_t *monfd;
     int tmp_fd;
 
@@ -1356,7 +1357,7 @@ AddfdInfo *qmp_add_fd(bool has_fdset_id, int64_t fdset_id, bool has_opaque,
                       const char *opaque, Error **errp)
 {
     int fd;
-    Monitor *mon = cur_mon;
+    Monitor *mon = monitor_cur();
     AddfdInfo *fdinfo;
 
     fd = qemu_chr_fe_get_msgfd(&mon->chr);
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 0f32892ad4..099c164c6d 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -66,13 +66,31 @@ MonitorList mon_list;
 int mon_refcount;
 static bool monitor_destroyed;
 
-__thread Monitor *cur_mon;
+static __thread Monitor *cur_monitor;
+
+Monitor *monitor_cur(void)
+{
+    return cur_monitor;
+}
+
+/**
+ * Sets a new current monitor and returns the old one.
+ */
+Monitor *monitor_set_cur(Monitor *mon)
+{
+    Monitor *old_monitor = cur_monitor;
+
+    cur_monitor = mon;
+    return old_monitor;
+}
 
 /**
  * Is the current monitor, if any, a QMP monitor?
  */
 bool monitor_cur_is_qmp(void)
 {
+    Monitor *cur_mon = monitor_cur();
+
     return cur_mon && monitor_is_qmp(cur_mon);
 }
 
@@ -209,6 +227,8 @@ int monitor_printf(Monitor *mon, const char *fmt, ...)
  */
 int error_vprintf(const char *fmt, va_list ap)
 {
+    Monitor *cur_mon = monitor_cur();
+
     if (cur_mon && !monitor_cur_is_qmp()) {
         return monitor_vprintf(cur_mon, fmt, ap);
     }
@@ -217,6 +237,8 @@ int error_vprintf(const char *fmt, va_list ap)
 
 int error_vprintf_unless_qmp(const char *fmt, va_list ap)
 {
+    Monitor *cur_mon = monitor_cur();
+
     if (!cur_mon) {
         return vfprintf(stderr, fmt, ap);
     }
diff --git a/monitor/qmp-cmds-control.c b/monitor/qmp-cmds-control.c
index 8f04cfa6e6..a456762f6a 100644
--- a/monitor/qmp-cmds-control.c
+++ b/monitor/qmp-cmds-control.c
@@ -69,6 +69,7 @@ static bool qmp_caps_accept(MonitorQMP *mon, QMPCapabilityList *list,
 void qmp_qmp_capabilities(bool has_enable, QMPCapabilityList *enable,
                           Error **errp)
 {
+    Monitor *cur_mon = monitor_cur();
     MonitorQMP *mon;
 
     assert(monitor_is_qmp(cur_mon));
@@ -119,6 +120,7 @@ static void query_commands_cb(const QmpCommand *cmd, void *opaque)
 CommandInfoList *qmp_query_commands(Error **errp)
 {
     CommandInfoList *list = NULL;
+    Monitor *cur_mon = monitor_cur();
     MonitorQMP *mon;
 
     assert(monitor_is_qmp(cur_mon));
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 0ab5b78580..1abef70a89 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -328,7 +328,7 @@ void qmp_add_client(const char *protocol, const char *fdname,
     Chardev *s;
     int fd;
 
-    fd = monitor_get_fd(cur_mon, fdname, errp);
+    fd = monitor_get_fd(monitor_cur(), fdname, errp);
     if (fd < 0) {
         return;
     }
diff --git a/monitor/qmp.c b/monitor/qmp.c
index d433ceae5b..bb2d9d0cc7 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -139,12 +139,9 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
     QDict *rsp;
     QDict *error;
 
-    old_mon = cur_mon;
-    cur_mon = &mon->common;
-
+    old_mon = monitor_set_cur(&mon->common);
     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
-
-    cur_mon = old_mon;
+    monitor_set_cur(old_mon);
 
     if (mon->commands == &qmp_cap_negotiation_commands) {
         error = qdict_get_qdict(rsp, "error");
diff --git a/net/socket.c b/net/socket.c
index 2d21fddd9c..15b410e8d8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -727,7 +727,7 @@ int net_init_socket(const Netdev *netdev, const char *name,
     if (sock->has_fd) {
         int fd, ret;
 
-        fd = monitor_fd_param(cur_mon, sock->fd, errp);
+        fd = monitor_fd_param(monitor_cur(), sock->fd, errp);
         if (fd == -1) {
             return -1;
         }
diff --git a/net/tap.c b/net/tap.c
index 04ce72dd2f..c46ff66184 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -700,7 +700,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         if (vhostfdname) {
             int ret;
 
-            vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
+            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
             if (vhostfd == -1) {
                 if (tap->has_vhostforce && tap->vhostforce) {
                     error_propagate(errp, err);
@@ -808,7 +808,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
             return -1;
         }
 
-        fd = monitor_fd_param(cur_mon, tap->fd, errp);
+        fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
         if (fd == -1) {
             return -1;
         }
@@ -862,7 +862,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
         }
 
         for (i = 0; i < nfds; i++) {
-            fd = monitor_fd_param(cur_mon, fds[i], errp);
+            fd = monitor_fd_param(monitor_cur(), fds[i], errp);
             if (fd == -1) {
                 ret = -1;
                 goto free_fail;
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index e9d4a44fcc..47cceddd80 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -793,6 +793,6 @@ exit:
 
 void qmp_inject_nmi(Error **errp)
 {
-    nmi_monitor_handle(monitor_get_cpu_index(cur_mon), errp);
+    nmi_monitor_handle(monitor_get_cpu_index(monitor_cur()), errp);
 }
 
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index 6cff1c4e1d..0cd2d864b2 100644
--- a/stubs/monitor-core.c
+++ b/stubs/monitor-core.c
@@ -3,7 +3,10 @@
 #include "qemu-common.h"
 #include "qapi/qapi-emit-events.h"
 
-__thread Monitor *cur_mon;
+Monitor *monitor_cur(void)
+{
+    return NULL;
+}
 
 void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
 {
diff --git a/tests/test-util-sockets.c b/tests/test-util-sockets.c
index 1bbb16d9b1..f6336e0f91 100644
--- a/tests/test-util-sockets.c
+++ b/tests/test-util-sockets.c
@@ -52,6 +52,7 @@ static void test_fd_is_socket_good(void)
 
 static int mon_fd = -1;
 static const char *mon_fdname;
+__thread Monitor *cur_mon;
 
 int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
 {
@@ -65,15 +66,14 @@ int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
 }
 
 /*
- * Syms of stubs in libqemuutil.a are discarded at .o file granularity.
- * To replace monitor_get_fd() we must ensure everything in
- * stubs/monitor.c is defined, to make sure monitor.o is discarded
+ * Syms of stubs in libqemuutil.a are discarded at .o file
+ * granularity.  To replace monitor_get_fd() and monitor_cur(), we
+ * must ensure that we also replace any other symbol that is used in
+ * the binary and would be taken from the same stub object file,
  * otherwise we get duplicate syms at link time.
  */
-__thread Monitor *cur_mon;
+Monitor *monitor_cur(void) { return cur_mon; }
 int monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) { abort(); }
-void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp) {}
-void monitor_init_hmp(Chardev *chr, bool use_readline, Error **errp) {}
 
 #ifndef _WIN32
 static void test_socket_fd_pass_name_good(void)
diff --git a/trace/control.c b/trace/control.c
index c63a4de732..b35e512dce 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -176,7 +176,7 @@ void trace_enable_events(const char *line_buf)
 {
     if (is_help_option(line_buf)) {
         trace_list_events();
-        if (cur_mon == NULL) {
+        if (monitor_cur() == NULL) {
             exit(0);
         }
     } else {
diff --git a/util/qemu-error.c b/util/qemu-error.c
index 3ee41438e9..aa30f03564 100644
--- a/util/qemu-error.c
+++ b/util/qemu-error.c
@@ -171,7 +171,7 @@ static void print_loc(void)
     int i;
     const char *const *argp;
 
-    if (!cur_mon && progname) {
+    if (!monitor_cur() && progname) {
         fprintf(stderr, "%s:", progname);
         sep = " ";
     }
@@ -208,7 +208,7 @@ static void vreport(report_type type, const char *fmt, va_list ap)
     GTimeVal tv;
     gchar *timestr;
 
-    if (error_with_timestamp && !cur_mon) {
+    if (error_with_timestamp && !monitor_cur()) {
         g_get_current_time(&tv);
         timestr = g_time_val_to_iso8601(&tv);
         error_printf("%s ", timestr);
@@ -216,7 +216,7 @@ static void vreport(report_type type, const char *fmt, va_list ap)
     }
 
     /* Only prepend guest name if -msg guest-name and -name guest=... are set */
-    if (error_with_guestname && error_guest_name && !cur_mon) {
+    if (error_with_guestname && error_guest_name && !monitor_cur()) {
         error_printf("%s ", error_guest_name);
     }
 
diff --git a/util/qemu-print.c b/util/qemu-print.c
index e79d6b8396..69ba612f56 100644
--- a/util/qemu-print.c
+++ b/util/qemu-print.c
@@ -20,6 +20,7 @@
  */
 int qemu_vprintf(const char *fmt, va_list ap)
 {
+    Monitor *cur_mon = monitor_cur();
     if (cur_mon) {
         return monitor_vprintf(cur_mon, fmt, ap);
     }
@@ -48,7 +49,7 @@ int qemu_printf(const char *fmt, ...)
 int qemu_vfprintf(FILE *stream, const char *fmt, va_list ap)
 {
     if (!stream) {
-        return monitor_vprintf(cur_mon, fmt, ap);
+        return monitor_vprintf(monitor_cur(), fmt, ap);
     }
     return vfprintf(stream, fmt, ap);
 }
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index de4bf7616e..05e5c73f9d 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1092,6 +1092,7 @@ fail:
 
 static int socket_get_fd(const char *fdstr, int num, Error **errp)
 {
+    Monitor *cur_mon = monitor_cur();
     int fd;
     if (num != 1) {
         error_setg_errno(errp, EINVAL, "socket_get_fd: too many connections");
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 04/14] hmp: Update current monitor only in handle_hmp_command()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (2 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 03/14] monitor: Use getter/setter functions for cur_mon Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 05/14] qmp: Assert that no other monitor is active Markus Armbruster
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

The current monitor is updated relatively early in the command handling
code even though only the command handler actually needs it.

The current monitor will become coroutine-local later, so we can only
update it when we know in which coroutine the command will be exectued.
Move it to handle_hmp_command() where this information will be
available.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-5-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 monitor/hmp.c  | 10 +++++-----
 monitor/misc.c |  5 -----
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index e0cc9e65dd..560ec98e7b 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1061,6 +1061,7 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
     QDict *qdict;
     const HMPCommand *cmd;
     const char *cmd_start = cmdline;
+    Monitor *old_mon;
 
     trace_handle_hmp_command(mon, cmdline);
 
@@ -1079,7 +1080,11 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
         return;
     }
 
+    /* old_mon is non-NULL when called from qmp_human_monitor_command() */
+    old_mon = monitor_set_cur(&mon->common);
     cmd->cmd(&mon->common, qdict);
+    monitor_set_cur(old_mon);
+
     qobject_unref(qdict);
 }
 
@@ -1301,11 +1306,8 @@ cleanup:
 static void monitor_read(void *opaque, const uint8_t *buf, int size)
 {
     MonitorHMP *mon = container_of(opaque, MonitorHMP, common);
-    Monitor *old_mon;
     int i;
 
-    old_mon = monitor_set_cur(&mon->common);
-
     if (mon->rs) {
         for (i = 0; i < size; i++) {
             readline_handle_byte(mon->rs, buf[i]);
@@ -1317,8 +1319,6 @@ static void monitor_read(void *opaque, const uint8_t *buf, int size)
             handle_hmp_command(mon, (char *)buf);
         }
     }
-
-    monitor_set_cur(old_mon);
 }
 
 static void monitor_event(void *opaque, QEMUChrEvent event)
diff --git a/monitor/misc.c b/monitor/misc.c
index ee8db45094..4a859fb24a 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -120,17 +120,13 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
                                 int64_t cpu_index, Error **errp)
 {
     char *output = NULL;
-    Monitor *old_mon;
     MonitorHMP hmp = {};
 
     monitor_data_init(&hmp.common, false, true, false);
 
-    old_mon = monitor_set_cur(&hmp.common);
-
     if (has_cpu_index) {
         int ret = monitor_set_cpu(&hmp.common, cpu_index);
         if (ret < 0) {
-            monitor_set_cur(old_mon);
             error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "cpu-index",
                        "a CPU number");
             goto out;
@@ -138,7 +134,6 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
     }
 
     handle_hmp_command(&hmp, command_line);
-    monitor_set_cur(old_mon);
 
     WITH_QEMU_LOCK_GUARD(&hmp.common.mon_lock) {
         if (qstring_get_length(hmp.common.outbuf) > 0) {
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 05/14] qmp: Assert that no other monitor is active
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (3 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 04/14] hmp: Update current monitor only in handle_hmp_command() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 06/14] qmp: Call monitor_set_cur() only in qmp_dispatch() Markus Armbruster
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

monitor_qmp_dispatch() is never supposed to be called in the context of
another monitor, so assert that monitor_cur() is NULL instead of saving
and restoring it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-6-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 monitor/qmp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index bb2d9d0cc7..8469970c69 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -140,8 +140,11 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
     QDict *error;
 
     old_mon = monitor_set_cur(&mon->common);
+    assert(old_mon == NULL);
+
     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
-    monitor_set_cur(old_mon);
+
+    monitor_set_cur(NULL);
 
     if (mon->commands == &qmp_cap_negotiation_commands) {
         error = qdict_get_qdict(rsp, "error");
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 06/14] qmp: Call monitor_set_cur() only in qmp_dispatch()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (4 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 05/14] qmp: Assert that no other monitor is active Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 07/14] monitor: Make current monitor a per-coroutine property Markus Armbruster
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

The correct way to set the current monitor for a coroutine handler will
be different than for a blocking handler, so monitor_set_cur() needs to
be called in qmp_dispatch().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-7-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/dispatch.h | 3 ++-
 monitor/qmp.c               | 9 ++-------
 qapi/qmp-dispatch.c         | 8 +++++++-
 qga/main.c                  | 2 +-
 stubs/monitor-core.c        | 5 +++++
 tests/test-qmp-cmds.c       | 6 +++---
 6 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 5a9cf82472..0c2f467028 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -14,6 +14,7 @@
 #ifndef QAPI_QMP_DISPATCH_H
 #define QAPI_QMP_DISPATCH_H
 
+#include "monitor/monitor.h"
 #include "qemu/queue.h"
 
 typedef void (QmpCommandFunc)(QDict *, QObject **, Error **);
@@ -49,7 +50,7 @@ const char *qmp_command_name(const QmpCommand *cmd);
 bool qmp_has_success_response(const QmpCommand *cmd);
 QDict *qmp_error_response(Error *err);
 QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
-                    bool allow_oob);
+                    bool allow_oob, Monitor *cur_mon);
 bool qmp_is_oob(const QDict *dict);
 
 typedef void (*qmp_cmd_callback_fn)(const QmpCommand *cmd, void *opaque);
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 8469970c69..e746b3575d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -135,16 +135,11 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
 
 static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
 {
-    Monitor *old_mon;
     QDict *rsp;
     QDict *error;
 
-    old_mon = monitor_set_cur(&mon->common);
-    assert(old_mon == NULL);
-
-    rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
-
-    monitor_set_cur(NULL);
+    rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
+                       &mon->common);
 
     if (mon->commands == &qmp_cap_negotiation_commands) {
         error = qdict_get_qdict(rsp, "error");
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 79347e0864..2fdbc0fba4 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -89,7 +89,7 @@ bool qmp_is_oob(const QDict *dict)
 }
 
 QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
-                    bool allow_oob)
+                    bool allow_oob, Monitor *cur_mon)
 {
     Error *err = NULL;
     bool oob;
@@ -152,7 +152,13 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
         args = qdict_get_qdict(dict, "arguments");
         qobject_ref(args);
     }
+
+    assert(monitor_cur() == NULL);
+    monitor_set_cur(cur_mon);
+
     cmd->fn(args, &ret, &err);
+
+    monitor_set_cur(NULL);
     qobject_unref(args);
     if (err) {
         /* or assert(!ret) after reviewing all handlers: */
diff --git a/qga/main.c b/qga/main.c
index 740f5f7303..dea6a3aa64 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -579,7 +579,7 @@ static void process_event(void *opaque, QObject *obj, Error *err)
     }
 
     g_debug("processing command");
-    rsp = qmp_dispatch(&ga_commands, obj, false);
+    rsp = qmp_dispatch(&ga_commands, obj, false, NULL);
 
 end:
     ret = send_response(s, rsp);
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index 0cd2d864b2..dc1748bf13 100644
--- a/stubs/monitor-core.c
+++ b/stubs/monitor-core.c
@@ -8,6 +8,11 @@ Monitor *monitor_cur(void)
     return NULL;
 }
 
+Monitor *monitor_set_cur(Monitor *mon)
+{
+    return NULL;
+}
+
 void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
 {
 }
diff --git a/tests/test-qmp-cmds.c b/tests/test-qmp-cmds.c
index d12ff47e26..5f1b245e19 100644
--- a/tests/test-qmp-cmds.c
+++ b/tests/test-qmp-cmds.c
@@ -152,7 +152,7 @@ static QObject *do_qmp_dispatch(bool allow_oob, const char *template, ...)
     req = qdict_from_vjsonf_nofail(template, ap);
     va_end(ap);
 
-    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), allow_oob);
+    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), allow_oob, NULL);
     g_assert(resp);
     ret = qdict_get(resp, "return");
     g_assert(ret);
@@ -175,7 +175,7 @@ static void do_qmp_dispatch_error(bool allow_oob, ErrorClass cls,
     req = qdict_from_vjsonf_nofail(template, ap);
     va_end(ap);
 
-    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), allow_oob);
+    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), allow_oob, NULL);
     g_assert(resp);
     error = qdict_get_qdict(resp, "error");
     g_assert(error);
@@ -231,7 +231,7 @@ static void test_dispatch_cmd_success_response(void)
     QDict *resp;
 
     qdict_put_str(req, "execute", "cmd-success-response");
-    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), false);
+    resp = qmp_dispatch(&qmp_commands, QOBJECT(req), false, NULL);
     g_assert_null(resp);
     qobject_unref(req);
 }
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 07/14] monitor: Make current monitor a per-coroutine property
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (5 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 06/14] qmp: Call monitor_set_cur() only in qmp_dispatch() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 08/14] qapi: Add a 'coroutine' flag for commands Markus Armbruster
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

This way, a monitor command handler will still be able to access the
current monitor, but when it yields, all other code code will correctly
get NULL from monitor_cur().

This uses a hash table to map the coroutine pointer to the current
monitor of that coroutine.  Outside of coroutine context, we associate
the current monitor with the leader coroutine of the current thread.

Approaches to implement some form of coroutine local storage directly in
the coroutine core code have been considered and discarded because they
didn't end up being much more generic than the hash table and their
performance impact on coroutines not using coroutine local storage was
unclear. As the block layer uses a coroutine per I/O request, this is a
fast path and we have to be careful. It's safest to just stay out of
this path with code only used by the monitor.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-8-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/monitor/monitor.h |  2 +-
 monitor/hmp.c             |  4 ++--
 monitor/monitor.c         | 34 +++++++++++++++++++++++++++-------
 qapi/qmp-dispatch.c       |  4 ++--
 stubs/monitor-core.c      |  2 +-
 5 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 543eafcb76..348bfad3d5 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -13,7 +13,7 @@ typedef struct MonitorOptions MonitorOptions;
 extern QemuOptsList qemu_mon_opts;
 
 Monitor *monitor_cur(void);
-Monitor *monitor_set_cur(Monitor *mon);
+Monitor *monitor_set_cur(Coroutine *co, Monitor *mon);
 bool monitor_cur_is_qmp(void);
 
 void monitor_init_globals(void);
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 560ec98e7b..abaf939b2d 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1081,9 +1081,9 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
     }
 
     /* old_mon is non-NULL when called from qmp_human_monitor_command() */
-    old_mon = monitor_set_cur(&mon->common);
+    old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
     cmd->cmd(&mon->common, qdict);
-    monitor_set_cur(old_mon);
+    monitor_set_cur(qemu_coroutine_self(), old_mon);
 
     qobject_unref(qdict);
 }
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 099c164c6d..ef68ca9d21 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -58,29 +58,48 @@ IOThread *mon_iothread;
 /* Bottom half to dispatch the requests received from I/O thread */
 QEMUBH *qmp_dispatcher_bh;
 
-/* Protects mon_list, monitor_qapi_event_state, monitor_destroyed.  */
+/*
+ * Protects mon_list, monitor_qapi_event_state, coroutine_mon,
+ * monitor_destroyed.
+ */
 QemuMutex monitor_lock;
 static GHashTable *monitor_qapi_event_state;
+static GHashTable *coroutine_mon; /* Maps Coroutine* to Monitor* */
 
 MonitorList mon_list;
 int mon_refcount;
 static bool monitor_destroyed;
 
-static __thread Monitor *cur_monitor;
-
 Monitor *monitor_cur(void)
 {
-    return cur_monitor;
+    Monitor *mon;
+
+    qemu_mutex_lock(&monitor_lock);
+    mon = g_hash_table_lookup(coroutine_mon, qemu_coroutine_self());
+    qemu_mutex_unlock(&monitor_lock);
+
+    return mon;
 }
 
 /**
  * Sets a new current monitor and returns the old one.
+ *
+ * If a non-NULL monitor is set for a coroutine, another call
+ * resetting it to NULL is required before the coroutine terminates,
+ * otherwise a stale entry would remain in the hash table.
  */
-Monitor *monitor_set_cur(Monitor *mon)
+Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
 {
-    Monitor *old_monitor = cur_monitor;
+    Monitor *old_monitor = monitor_cur();
+
+    qemu_mutex_lock(&monitor_lock);
+    if (mon) {
+        g_hash_table_replace(coroutine_mon, co, mon);
+    } else {
+        g_hash_table_remove(coroutine_mon, co);
+    }
+    qemu_mutex_unlock(&monitor_lock);
 
-    cur_monitor = mon;
     return old_monitor;
 }
 
@@ -623,6 +642,7 @@ void monitor_init_globals_core(void)
 {
     monitor_qapi_event_init();
     qemu_mutex_init(&monitor_lock);
+    coroutine_mon = g_hash_table_new(NULL, NULL);
 
     /*
      * The dispatcher BH must run in the main loop thread, since we
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 2fdbc0fba4..5677ba92ca 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -154,11 +154,11 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
     }
 
     assert(monitor_cur() == NULL);
-    monitor_set_cur(cur_mon);
+    monitor_set_cur(qemu_coroutine_self(), cur_mon);
 
     cmd->fn(args, &ret, &err);
 
-    monitor_set_cur(NULL);
+    monitor_set_cur(qemu_coroutine_self(), NULL);
     qobject_unref(args);
     if (err) {
         /* or assert(!ret) after reviewing all handlers: */
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index dc1748bf13..d058a2a00d 100644
--- a/stubs/monitor-core.c
+++ b/stubs/monitor-core.c
@@ -8,7 +8,7 @@ Monitor *monitor_cur(void)
     return NULL;
 }
 
-Monitor *monitor_set_cur(Monitor *mon)
+Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
 {
     return NULL;
 }
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 08/14] qapi: Add a 'coroutine' flag for commands
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (6 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 07/14] monitor: Make current monitor a per-coroutine property Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

This patch adds a new 'coroutine' flag to QMP command definitions that
tells the QMP dispatcher that the command handler is safe to be run in a
coroutine.

The documentation of the new flag pretends that this flag is already
used as intended, which it isn't yet after this patch. We'll implement
this in another patch in this series.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-9-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/devel/qapi-code-gen.txt            | 29 +++++++++++++++++++++++++
 docs/sphinx/qapidoc.py                  |  2 +-
 include/qapi/qmp/dispatch.h             |  1 +
 tests/test-qmp-cmds.c                   |  4 ++++
 scripts/qapi/commands.py                | 10 ++++++---
 scripts/qapi/expr.py                    | 11 ++++++++--
 scripts/qapi/introspect.py              |  2 +-
 scripts/qapi/schema.py                  | 13 +++++++----
 tests/qapi-schema/meson.build           |  1 +
 tests/qapi-schema/oob-coroutine.err     |  2 ++
 tests/qapi-schema/oob-coroutine.json    |  2 ++
 tests/qapi-schema/oob-coroutine.out     |  0
 tests/qapi-schema/qapi-schema-test.json |  1 +
 tests/qapi-schema/qapi-schema-test.out  |  2 ++
 tests/qapi-schema/test-qapi.py          |  7 +++---
 15 files changed, 73 insertions(+), 14 deletions(-)
 create mode 100644 tests/qapi-schema/oob-coroutine.err
 create mode 100644 tests/qapi-schema/oob-coroutine.json
 create mode 100644 tests/qapi-schema/oob-coroutine.out

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index 5fc67c99cd..4a41e36a75 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -472,6 +472,7 @@ Syntax:
                 '*gen': false,
                 '*allow-oob': true,
                 '*allow-preconfig': true,
+                '*coroutine': true,
                 '*if': COND,
                 '*features': FEATURES }
 
@@ -596,6 +597,34 @@ before the machine is built.  It defaults to false.  For example:
 QMP is available before the machine is built only when QEMU was
 started with --preconfig.
 
+Member 'coroutine' tells the QMP dispatcher whether the command handler
+is safe to be run in a coroutine.  It defaults to false.  If it is true,
+the command handler is called from coroutine context and may yield while
+waiting for an external event (such as I/O completion) in order to avoid
+blocking the guest and other background operations.
+
+Coroutine safety can be hard to prove, similar to thread safety.  Common
+pitfalls are:
+
+- The global mutex isn't held across qemu_coroutine_yield(), so
+  operations that used to assume that they execute atomically may have
+  to be more careful to protect against changes in the global state.
+
+- Nested event loops (AIO_WAIT_WHILE() etc.) are problematic in
+  coroutine context and can easily lead to deadlocks.  They should be
+  replaced by yielding and reentering the coroutine when the condition
+  becomes false.
+
+Since the command handler may assume coroutine context, any callers
+other than the QMP dispatcher must also call it in coroutine context.
+In particular, HMP commands calling such a QMP command handler must
+enter coroutine context before calling the handler.
+
+It is an error to specify both 'coroutine': true and 'allow-oob': true
+for a command.  We don't currently have a use case for both together and
+without a use case, it's not entirely clear what the semantics should
+be.
+
 The optional 'if' member specifies a conditional.  See "Configuring
 the schema" below for more on this.
 
diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
index 6944ffa6aa..e03abcbb95 100644
--- a/docs/sphinx/qapidoc.py
+++ b/docs/sphinx/qapidoc.py
@@ -330,7 +330,7 @@ class QAPISchemaGenRSTVisitor(QAPISchemaVisitor):
 
     def visit_command(self, name, info, ifcond, features, arg_type,
                       ret_type, gen, success_response, boxed, allow_oob,
-                      allow_preconfig):
+                      allow_preconfig, coroutine):
         doc = self._cur_doc
         self._add_doc('Command',
                       self._nodes_for_arguments(doc,
diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 0c2f467028..9fd2b720a7 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -25,6 +25,7 @@ typedef enum QmpCommandOptions
     QCO_NO_SUCCESS_RESP       =  (1U << 0),
     QCO_ALLOW_OOB             =  (1U << 1),
     QCO_ALLOW_PRECONFIG       =  (1U << 2),
+    QCO_COROUTINE             =  (1U << 3),
 } QmpCommandOptions;
 
 typedef struct QmpCommand
diff --git a/tests/test-qmp-cmds.c b/tests/test-qmp-cmds.c
index 5f1b245e19..d3413bfef0 100644
--- a/tests/test-qmp-cmds.c
+++ b/tests/test-qmp-cmds.c
@@ -36,6 +36,10 @@ void qmp_cmd_success_response(Error **errp)
 {
 }
 
+void qmp_coroutine_cmd(Error **errp)
+{
+}
+
 Empty2 *qmp_user_def_cmd0(Error **errp)
 {
     return g_new0(Empty2, 1);
diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
index 3cf9e1110b..6e6fc94a14 100644
--- a/scripts/qapi/commands.py
+++ b/scripts/qapi/commands.py
@@ -176,7 +176,8 @@ out:
     return ret
 
 
-def gen_register_command(name, success_response, allow_oob, allow_preconfig):
+def gen_register_command(name, success_response, allow_oob, allow_preconfig,
+                         coroutine):
     options = []
 
     if not success_response:
@@ -185,6 +186,8 @@ def gen_register_command(name, success_response, allow_oob, allow_preconfig):
         options += ['QCO_ALLOW_OOB']
     if allow_preconfig:
         options += ['QCO_ALLOW_PRECONFIG']
+    if coroutine:
+        options += ['QCO_COROUTINE']
 
     if not options:
         options = ['QCO_NO_OPTIONS']
@@ -267,7 +270,7 @@ void %(c_prefix)sqmp_init_marshal(QmpCommandList *cmds);
 
     def visit_command(self, name, info, ifcond, features,
                       arg_type, ret_type, gen, success_response, boxed,
-                      allow_oob, allow_preconfig):
+                      allow_oob, allow_preconfig, coroutine):
         if not gen:
             return
         # FIXME: If T is a user-defined type, the user is responsible
@@ -285,7 +288,8 @@ void %(c_prefix)sqmp_init_marshal(QmpCommandList *cmds);
             self._genh.add(gen_marshal_decl(name))
             self._genc.add(gen_marshal(name, arg_type, boxed, ret_type))
             self._regy.add(gen_register_command(name, success_response,
-                                                allow_oob, allow_preconfig))
+                                                allow_oob, allow_preconfig,
+                                                coroutine))
 
 
 def gen_commands(schema, output_dir, prefix):
diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 2942520399..a15c1fb474 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -88,10 +88,17 @@ def check_flags(expr, info):
         if key in expr and expr[key] is not False:
             raise QAPISemError(
                 info, "flag '%s' may only use false value" % key)
-    for key in ['boxed', 'allow-oob', 'allow-preconfig']:
+    for key in ['boxed', 'allow-oob', 'allow-preconfig', 'coroutine']:
         if key in expr and expr[key] is not True:
             raise QAPISemError(
                 info, "flag '%s' may only use true value" % key)
+    if 'allow-oob' in expr and 'coroutine' in expr:
+        # This is not necessarily a fundamental incompatibility, but
+        # we don't have a use case and the desired semantics isn't
+        # obvious.  The simplest solution is to forbid it until we get
+        # a use case for it.
+        raise QAPISemError(info, "flags 'allow-oob' and 'coroutine' "
+                                 "are incompatible")
 
 
 def check_if(expr, info, source):
@@ -342,7 +349,7 @@ def check_exprs(exprs):
                        ['command'],
                        ['data', 'returns', 'boxed', 'if', 'features',
                         'gen', 'success-response', 'allow-oob',
-                        'allow-preconfig'])
+                        'allow-preconfig', 'coroutine'])
             normalize_members(expr.get('data'))
             check_command(expr, info)
         elif meta == 'event':
diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
index 23652be810..5907b09cd5 100644
--- a/scripts/qapi/introspect.py
+++ b/scripts/qapi/introspect.py
@@ -216,7 +216,7 @@ const QLitObject %(c_name)s = %(c_string)s;
 
     def visit_command(self, name, info, ifcond, features,
                       arg_type, ret_type, gen, success_response, boxed,
-                      allow_oob, allow_preconfig):
+                      allow_oob, allow_preconfig, coroutine):
         arg_type = arg_type or self._schema.the_empty_object_type
         ret_type = ret_type or self._schema.the_empty_object_type
         obj = {'arg-type': self._use_type(arg_type),
diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index 78309a00f0..d1307ec661 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -128,7 +128,7 @@ class QAPISchemaVisitor:
 
     def visit_command(self, name, info, ifcond, features,
                       arg_type, ret_type, gen, success_response, boxed,
-                      allow_oob, allow_preconfig):
+                      allow_oob, allow_preconfig, coroutine):
         pass
 
     def visit_event(self, name, info, ifcond, features, arg_type, boxed):
@@ -713,7 +713,8 @@ class QAPISchemaCommand(QAPISchemaEntity):
 
     def __init__(self, name, info, doc, ifcond, features,
                  arg_type, ret_type,
-                 gen, success_response, boxed, allow_oob, allow_preconfig):
+                 gen, success_response, boxed, allow_oob, allow_preconfig,
+                 coroutine):
         super().__init__(name, info, doc, ifcond, features)
         assert not arg_type or isinstance(arg_type, str)
         assert not ret_type or isinstance(ret_type, str)
@@ -726,6 +727,7 @@ class QAPISchemaCommand(QAPISchemaEntity):
         self.boxed = boxed
         self.allow_oob = allow_oob
         self.allow_preconfig = allow_preconfig
+        self.coroutine = coroutine
 
     def check(self, schema):
         super().check(schema)
@@ -768,7 +770,8 @@ class QAPISchemaCommand(QAPISchemaEntity):
         visitor.visit_command(
             self.name, self.info, self.ifcond, self.features,
             self.arg_type, self.ret_type, self.gen, self.success_response,
-            self.boxed, self.allow_oob, self.allow_preconfig)
+            self.boxed, self.allow_oob, self.allow_preconfig,
+            self.coroutine)
 
 
 class QAPISchemaEvent(QAPISchemaEntity):
@@ -1074,6 +1077,7 @@ class QAPISchema:
         boxed = expr.get('boxed', False)
         allow_oob = expr.get('allow-oob', False)
         allow_preconfig = expr.get('allow-preconfig', False)
+        coroutine = expr.get('coroutine', False)
         ifcond = expr.get('if')
         features = self._make_features(expr.get('features'), info)
         if isinstance(data, OrderedDict):
@@ -1086,7 +1090,8 @@ class QAPISchema:
         self._def_entity(QAPISchemaCommand(name, info, doc, ifcond, features,
                                            data, rets,
                                            gen, success_response,
-                                           boxed, allow_oob, allow_preconfig))
+                                           boxed, allow_oob, allow_preconfig,
+                                           coroutine))
 
     def _def_event(self, expr, info, doc):
         name = expr['event']
diff --git a/tests/qapi-schema/meson.build b/tests/qapi-schema/meson.build
index f08c902911..1f222a7a13 100644
--- a/tests/qapi-schema/meson.build
+++ b/tests/qapi-schema/meson.build
@@ -142,6 +142,7 @@ schemas = [
   'nested-struct-data.json',
   'nested-struct-data-invalid-dict.json',
   'non-objects.json',
+  'oob-coroutine.json',
   'oob-test.json',
   'allow-preconfig-test.json',
   'pragma-doc-required-crap.json',
diff --git a/tests/qapi-schema/oob-coroutine.err b/tests/qapi-schema/oob-coroutine.err
new file mode 100644
index 0000000000..c01a4992bd
--- /dev/null
+++ b/tests/qapi-schema/oob-coroutine.err
@@ -0,0 +1,2 @@
+oob-coroutine.json: In command 'oob-command-1':
+oob-coroutine.json:2: flags 'allow-oob' and 'coroutine' are incompatible
diff --git a/tests/qapi-schema/oob-coroutine.json b/tests/qapi-schema/oob-coroutine.json
new file mode 100644
index 0000000000..0f67663bcd
--- /dev/null
+++ b/tests/qapi-schema/oob-coroutine.json
@@ -0,0 +1,2 @@
+# Check that incompatible flags allow-oob and coroutine are rejected
+{ 'command': 'oob-command-1', 'allow-oob': true, 'coroutine': true }
diff --git a/tests/qapi-schema/oob-coroutine.out b/tests/qapi-schema/oob-coroutine.out
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
index 3a9f2cbb33..63f92adf68 100644
--- a/tests/qapi-schema/qapi-schema-test.json
+++ b/tests/qapi-schema/qapi-schema-test.json
@@ -148,6 +148,7 @@
   'returns': 'UserDefTwo' }
 
 { 'command': 'cmd-success-response', 'data': {}, 'success-response': false }
+{ 'command': 'coroutine-cmd', 'data': {}, 'coroutine': true }
 
 # Returning a non-dictionary requires a name from the whitelist
 { 'command': 'guest-get-time', 'data': {'a': 'int', '*b': 'int' },
diff --git a/tests/qapi-schema/qapi-schema-test.out b/tests/qapi-schema/qapi-schema-test.out
index 891b4101e0..8868ca0dca 100644
--- a/tests/qapi-schema/qapi-schema-test.out
+++ b/tests/qapi-schema/qapi-schema-test.out
@@ -203,6 +203,8 @@ command user_def_cmd2 q_obj_user_def_cmd2-arg -> UserDefTwo
     gen=True success_response=True boxed=False oob=False preconfig=False
 command cmd-success-response None -> None
     gen=True success_response=False boxed=False oob=False preconfig=False
+command coroutine-cmd None -> None
+    gen=True success_response=True boxed=False oob=False preconfig=False coroutine=True
 object q_obj_guest-get-time-arg
     member a: int optional=False
     member b: int optional=True
diff --git a/tests/qapi-schema/test-qapi.py b/tests/qapi-schema/test-qapi.py
index f396b471eb..e8db9d09d9 100755
--- a/tests/qapi-schema/test-qapi.py
+++ b/tests/qapi-schema/test-qapi.py
@@ -68,12 +68,13 @@ class QAPISchemaTestVisitor(QAPISchemaVisitor):
 
     def visit_command(self, name, info, ifcond, features,
                       arg_type, ret_type, gen, success_response, boxed,
-                      allow_oob, allow_preconfig):
+                      allow_oob, allow_preconfig, coroutine):
         print('command %s %s -> %s'
               % (name, arg_type and arg_type.name,
                  ret_type and ret_type.name))
-        print('    gen=%s success_response=%s boxed=%s oob=%s preconfig=%s'
-              % (gen, success_response, boxed, allow_oob, allow_preconfig))
+        print('    gen=%s success_response=%s boxed=%s oob=%s preconfig=%s%s'
+              % (gen, success_response, boxed, allow_oob, allow_preconfig,
+                 " coroutine=True" if coroutine else ""))
         self._print_if(ifcond)
         self._print_features(features)
 
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (7 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 08/14] qapi: Add a 'coroutine' flag for commands Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-12 10:47   ` Alex Bennée
  2020-10-17  8:15   ` Volker Rümelin
  2020-10-09  6:34 ` [PULL 10/14] hmp: Add support for coroutine command handlers Markus Armbruster
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

This moves the QMP dispatcher to a coroutine and runs all QMP command
handlers that declare 'coroutine': true in coroutine context so they
can avoid blocking the main loop while doing I/O or waiting for other
events.

For commands that are not declared safe to run in a coroutine, the
dispatcher drops out of coroutine context by calling the QMP command
handler from a bottom half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201005155855.256490-10-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/dispatch.h |   1 +
 monitor/monitor-internal.h  |   6 +-
 monitor/monitor.c           |  55 +++++++++++++---
 monitor/qmp.c               | 122 +++++++++++++++++++++++++++---------
 qapi/qmp-dispatch.c         |  63 +++++++++++++++++--
 qapi/qmp-registry.c         |   3 +
 util/aio-posix.c            |   8 ++-
 7 files changed, 213 insertions(+), 45 deletions(-)

diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 9fd2b720a7..af8d96c570 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -31,6 +31,7 @@ typedef enum QmpCommandOptions
 typedef struct QmpCommand
 {
     const char *name;
+    /* Runs in coroutine context if QCO_COROUTINE is set */
     QmpCommandFunc *fn;
     QmpCommandOptions options;
     QTAILQ_ENTRY(QmpCommand) node;
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index b39e03b744..b55d6df07f 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -155,7 +155,9 @@ static inline bool monitor_is_qmp(const Monitor *mon)
 
 typedef QTAILQ_HEAD(MonitorList, Monitor) MonitorList;
 extern IOThread *mon_iothread;
-extern QEMUBH *qmp_dispatcher_bh;
+extern Coroutine *qmp_dispatcher_co;
+extern bool qmp_dispatcher_co_shutdown;
+extern bool qmp_dispatcher_co_busy;
 extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
 extern QemuMutex monitor_lock;
 extern MonitorList mon_list;
@@ -173,7 +175,7 @@ void monitor_fdsets_cleanup(void);
 
 void qmp_send_response(MonitorQMP *mon, const QDict *rsp);
 void monitor_data_destroy_qmp(MonitorQMP *mon);
-void monitor_qmp_bh_dispatcher(void *data);
+void coroutine_fn monitor_qmp_dispatcher_co(void *data);
 
 int get_monitor_def(int64_t *pval, const char *name);
 void help_cmd(Monitor *mon, const char *name);
diff --git a/monitor/monitor.c b/monitor/monitor.c
index ef68ca9d21..ceffe1a83b 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -55,8 +55,32 @@ typedef struct {
 /* Shared monitor I/O thread */
 IOThread *mon_iothread;
 
-/* Bottom half to dispatch the requests received from I/O thread */
-QEMUBH *qmp_dispatcher_bh;
+/* Coroutine to dispatch the requests received from I/O thread */
+Coroutine *qmp_dispatcher_co;
+
+/* Set to true when the dispatcher coroutine should terminate */
+bool qmp_dispatcher_co_shutdown;
+
+/*
+ * qmp_dispatcher_co_busy is used for synchronisation between the
+ * monitor thread and the main thread to ensure that the dispatcher
+ * coroutine never gets scheduled a second time when it's already
+ * scheduled (scheduling the same coroutine twice is forbidden).
+ *
+ * It is true if the coroutine is active and processing requests.
+ * Additional requests may then be pushed onto mon->qmp_requests,
+ * and @qmp_dispatcher_co_shutdown may be set without further ado.
+ * @qmp_dispatcher_co_busy must not be woken up in this case.
+ *
+ * If false, you also have to set @qmp_dispatcher_co_busy to true and
+ * wake up @qmp_dispatcher_co after pushing the new requests.
+ *
+ * The coroutine will automatically change this variable back to false
+ * before it yields.  Nobody else may set the variable to false.
+ *
+ * Access must be atomic for thread safety.
+ */
+bool qmp_dispatcher_co_busy;
 
 /*
  * Protects mon_list, monitor_qapi_event_state, coroutine_mon,
@@ -623,9 +647,24 @@ void monitor_cleanup(void)
     }
     qemu_mutex_unlock(&monitor_lock);
 
-    /* QEMUBHs needs to be deleted before destroying the I/O thread */
-    qemu_bh_delete(qmp_dispatcher_bh);
-    qmp_dispatcher_bh = NULL;
+    /*
+     * The dispatcher needs to stop before destroying the I/O thread.
+     *
+     * We need to poll both qemu_aio_context and iohandler_ctx to make
+     * sure that the dispatcher coroutine keeps making progress and
+     * eventually terminates.  qemu_aio_context is automatically
+     * polled by calling AIO_WAIT_WHILE on it, but we must poll
+     * iohandler_ctx manually.
+     */
+    qmp_dispatcher_co_shutdown = true;
+    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
+        aio_co_wake(qmp_dispatcher_co);
+    }
+
+    AIO_WAIT_WHILE(qemu_get_aio_context(),
+                   (aio_poll(iohandler_get_aio_context(), false),
+                    qatomic_mb_read(&qmp_dispatcher_co_busy)));
+
     if (mon_iothread) {
         iothread_destroy(mon_iothread);
         mon_iothread = NULL;
@@ -649,9 +688,9 @@ void monitor_init_globals_core(void)
      * have commands assuming that context.  It would be nice to get
      * rid of those assumptions.
      */
-    qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
-                                   monitor_qmp_bh_dispatcher,
-                                   NULL);
+    qmp_dispatcher_co = qemu_coroutine_create(monitor_qmp_dispatcher_co, NULL);
+    qatomic_mb_set(&qmp_dispatcher_co_busy, true);
+    aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
 }
 
 int monitor_init(MonitorOptions *opts, bool allow_hmp, Error **errp)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index e746b3575d..b42f8c6af3 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -133,6 +133,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
     }
 }
 
+/*
+ * Runs outside of coroutine context for OOB commands, but in
+ * coroutine context for everything else.
+ */
 static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
 {
     QDict *rsp;
@@ -206,43 +210,99 @@ static QMPRequest *monitor_qmp_requests_pop_any_with_lock(void)
     return req_obj;
 }
 
-void monitor_qmp_bh_dispatcher(void *data)
+void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 {
-    QMPRequest *req_obj = monitor_qmp_requests_pop_any_with_lock();
+    QMPRequest *req_obj = NULL;
     QDict *rsp;
     bool need_resume;
     MonitorQMP *mon;
 
-    if (!req_obj) {
-        return;
-    }
+    while (true) {
+        assert(qatomic_mb_read(&qmp_dispatcher_co_busy) == true);
 
-    mon = req_obj->mon;
-    /*  qmp_oob_enabled() might change after "qmp_capabilities" */
-    need_resume = !qmp_oob_enabled(mon) ||
-        mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
-    qemu_mutex_unlock(&mon->qmp_queue_lock);
-    if (req_obj->req) {
-        QDict *qdict = qobject_to(QDict, req_obj->req);
-        QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
-        trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
-        monitor_qmp_dispatch(mon, req_obj->req);
-    } else {
-        assert(req_obj->err);
-        rsp = qmp_error_response(req_obj->err);
-        req_obj->err = NULL;
-        monitor_qmp_respond(mon, rsp);
-        qobject_unref(rsp);
-    }
+        /*
+         * Mark the dispatcher as not busy already here so that we
+         * don't miss any new requests coming in the middle of our
+         * processing.
+         */
+        qatomic_mb_set(&qmp_dispatcher_co_busy, false);
 
-    if (need_resume) {
-        /* Pairs with the monitor_suspend() in handle_qmp_command() */
-        monitor_resume(&mon->common);
-    }
-    qmp_request_free(req_obj);
+        while (!(req_obj = monitor_qmp_requests_pop_any_with_lock())) {
+            /*
+             * No more requests to process.  Wait to be reentered from
+             * handle_qmp_command() when it pushes more requests, or
+             * from monitor_cleanup() when it requests shutdown.
+             */
+            if (!qmp_dispatcher_co_shutdown) {
+                qemu_coroutine_yield();
 
-    /* Reschedule instead of looping so the main loop stays responsive */
-    qemu_bh_schedule(qmp_dispatcher_bh);
+                /*
+                 * busy must be set to true again by whoever
+                 * rescheduled us to avoid double scheduling
+                 */
+                assert(qatomic_xchg(&qmp_dispatcher_co_busy, false) == true);
+            }
+
+            /*
+             * qmp_dispatcher_co_shutdown may have changed if we
+             * yielded and were reentered from monitor_cleanup()
+             */
+            if (qmp_dispatcher_co_shutdown) {
+                return;
+            }
+        }
+
+        if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
+            /*
+             * Someone rescheduled us (probably because a new requests
+             * came in), but we didn't actually yield. Do that now,
+             * only to be immediately reentered and removed from the
+             * list of scheduled coroutines.
+             */
+            qemu_coroutine_yield();
+        }
+
+        /*
+         * Move the coroutine from iohandler_ctx to qemu_aio_context for
+         * executing the command handler so that it can make progress if it
+         * involves an AIO_WAIT_WHILE().
+         */
+        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
+        qemu_coroutine_yield();
+
+        mon = req_obj->mon;
+        /* qmp_oob_enabled() might change after "qmp_capabilities" */
+        need_resume = !qmp_oob_enabled(mon) ||
+            mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
+        qemu_mutex_unlock(&mon->qmp_queue_lock);
+        if (req_obj->req) {
+            QDict *qdict = qobject_to(QDict, req_obj->req);
+            QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
+            trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
+            monitor_qmp_dispatch(mon, req_obj->req);
+        } else {
+            assert(req_obj->err);
+            rsp = qmp_error_response(req_obj->err);
+            req_obj->err = NULL;
+            monitor_qmp_respond(mon, rsp);
+            qobject_unref(rsp);
+        }
+
+        if (need_resume) {
+            /* Pairs with the monitor_suspend() in handle_qmp_command() */
+            monitor_resume(&mon->common);
+        }
+        qmp_request_free(req_obj);
+
+        /*
+         * Yield and reschedule so the main loop stays responsive.
+         *
+         * Move back to iohandler_ctx so that nested event loops for
+         * qemu_aio_context don't start new monitor commands.
+         */
+        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
+        qemu_coroutine_yield();
+    }
 }
 
 static void handle_qmp_command(void *opaque, QObject *req, Error *err)
@@ -303,7 +363,9 @@ static void handle_qmp_command(void *opaque, QObject *req, Error *err)
     qemu_mutex_unlock(&mon->qmp_queue_lock);
 
     /* Kick the dispatcher routine */
-    qemu_bh_schedule(qmp_dispatcher_bh);
+    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
+        aio_co_wake(qmp_dispatcher_co);
+    }
 }
 
 static void monitor_qmp_read(void *opaque, const uint8_t *buf, int size)
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 5677ba92ca..9a2d7dd29a 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -12,12 +12,16 @@
  */
 
 #include "qemu/osdep.h"
+
+#include "block/aio.h"
 #include "qapi/error.h"
 #include "qapi/qmp/dispatch.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "sysemu/runstate.h"
 #include "qapi/qmp/qbool.h"
+#include "qemu/coroutine.h"
+#include "qemu/main-loop.h"
 
 static QDict *qmp_dispatch_check_obj(QDict *dict, bool allow_oob,
                                      Error **errp)
@@ -88,6 +92,30 @@ bool qmp_is_oob(const QDict *dict)
         && !qdict_haskey(dict, "execute");
 }
 
+typedef struct QmpDispatchBH {
+    const QmpCommand *cmd;
+    Monitor *cur_mon;
+    QDict *args;
+    QObject **ret;
+    Error **errp;
+    Coroutine *co;
+} QmpDispatchBH;
+
+static void do_qmp_dispatch_bh(void *opaque)
+{
+    QmpDispatchBH *data = opaque;
+
+    assert(monitor_cur() == NULL);
+    monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
+    data->cmd->fn(data->args, data->ret, data->errp);
+    monitor_set_cur(qemu_coroutine_self(), NULL);
+    aio_co_wake(data->co);
+}
+
+/*
+ * Runs outside of coroutine context for OOB commands, but in coroutine
+ * context for everything else.
+ */
 QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
                     bool allow_oob, Monitor *cur_mon)
 {
@@ -153,12 +181,39 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
         qobject_ref(args);
     }
 
+    assert(!(oob && qemu_in_coroutine()));
     assert(monitor_cur() == NULL);
-    monitor_set_cur(qemu_coroutine_self(), cur_mon);
+    if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+        monitor_set_cur(qemu_coroutine_self(), cur_mon);
+        cmd->fn(args, &ret, &err);
+        monitor_set_cur(qemu_coroutine_self(), NULL);
+    } else {
+       /*
+        * Actual context doesn't match the one the command needs.
+        *
+        * Case 1: we are in coroutine context, but command does not
+        * have QCO_COROUTINE.  We need to drop out of coroutine
+        * context for executing it.
+        *
+        * Case 2: we are outside coroutine context, but command has
+        * QCO_COROUTINE.  Can't actually happen, because we get here
+        * outside coroutine context only when executing a command
+        * out of band, and OOB commands never have QCO_COROUTINE.
+        */
+        assert(!oob && qemu_in_coroutine() && !(cmd->options & QCO_COROUTINE));
 
-    cmd->fn(args, &ret, &err);
-
-    monitor_set_cur(qemu_coroutine_self(), NULL);
+        QmpDispatchBH data = {
+            .cur_mon    = cur_mon,
+            .cmd        = cmd,
+            .args       = args,
+            .ret        = &ret,
+            .errp       = &err,
+            .co         = qemu_coroutine_self(),
+        };
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
+                                &data);
+        qemu_coroutine_yield();
+    }
     qobject_unref(args);
     if (err) {
         /* or assert(!ret) after reviewing all handlers: */
diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c
index d0f9a1d3e3..58c65b5052 100644
--- a/qapi/qmp-registry.c
+++ b/qapi/qmp-registry.c
@@ -20,6 +20,9 @@ void qmp_register_command(QmpCommandList *cmds, const char *name,
 {
     QmpCommand *cmd = g_malloc0(sizeof(*cmd));
 
+    /* QCO_COROUTINE and QCO_ALLOW_OOB are incompatible for now */
+    assert(!((options & QCO_COROUTINE) && (options & QCO_ALLOW_OOB)));
+
     cmd->name = name;
     cmd->fn = fn;
     cmd->enabled = true;
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 280f27bb99..30f5354b1e 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include "block/block.h"
+#include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/sockets.h"
@@ -558,8 +559,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * There cannot be two concurrent aio_poll calls for the same AioContext (or
      * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
      * We rely on this below to avoid slow locked accesses to ctx->notify_me.
+     *
+     * aio_poll() may only be called in the AioContext's thread. iohandler_ctx
+     * is special in that it runs in the main thread, but that thread's context
+     * is qemu_aio_context.
      */
-    assert(in_aio_context_home_thread(ctx));
+    assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
+                                      qemu_get_aio_context() : ctx));
 
     qemu_lockcnt_inc(&ctx->list_lock);
 
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 10/14] hmp: Add support for coroutine command handlers
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (8 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 11/14] util/async: Add aio_co_reschedule_self() Markus Armbruster
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dr . David Alan Gilbert, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Often, QMP command handlers are not only called to handle QMP commands,
but also from a corresponding HMP command handler. In order to give them
a consistent environment, optionally run HMP command handlers in a
coroutine, too.

The implementation is a lot simpler than in QMP because for HMP, we
still block the VM while the coroutine is running.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201005155855.256490-11-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/devel/qapi-code-gen.txt |  4 ++--
 monitor/monitor-internal.h   |  1 +
 monitor/hmp.c                | 37 +++++++++++++++++++++++++++++++-----
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index 4a41e36a75..c6438c6aa9 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -617,8 +617,8 @@ pitfalls are:
 
 Since the command handler may assume coroutine context, any callers
 other than the QMP dispatcher must also call it in coroutine context.
-In particular, HMP commands calling such a QMP command handler must
-enter coroutine context before calling the handler.
+In particular, HMP commands calling such a QMP command handler must be
+marked .coroutine = true in hmp-commands.hx.
 
 It is an error to specify both 'coroutine': true and 'allow-oob': true
 for a command.  We don't currently have a use case for both together and
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index b55d6df07f..ad2e64be13 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -74,6 +74,7 @@ typedef struct HMPCommand {
     const char *help;
     const char *flags; /* p=preconfig */
     void (*cmd)(Monitor *mon, const QDict *qdict);
+    bool coroutine;
     /*
      * @sub_table is a list of 2nd level of commands. If it does not exist,
      * cmd should be used. If it exists, sub_table[?].cmd should be
diff --git a/monitor/hmp.c b/monitor/hmp.c
index abaf939b2d..c5cd9d372b 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1056,12 +1056,26 @@ fail:
     return NULL;
 }
 
+typedef struct HandleHmpCommandCo {
+    Monitor *mon;
+    const HMPCommand *cmd;
+    QDict *qdict;
+    bool done;
+} HandleHmpCommandCo;
+
+static void handle_hmp_command_co(void *opaque)
+{
+    HandleHmpCommandCo *data = opaque;
+    data->cmd->cmd(data->mon, data->qdict);
+    monitor_set_cur(qemu_coroutine_self(), NULL);
+    data->done = true;
+}
+
 void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
 {
     QDict *qdict;
     const HMPCommand *cmd;
     const char *cmd_start = cmdline;
-    Monitor *old_mon;
 
     trace_handle_hmp_command(mon, cmdline);
 
@@ -1080,10 +1094,23 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
         return;
     }
 
-    /* old_mon is non-NULL when called from qmp_human_monitor_command() */
-    old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
-    cmd->cmd(&mon->common, qdict);
-    monitor_set_cur(qemu_coroutine_self(), old_mon);
+    if (!cmd->coroutine) {
+        /* old_mon is non-NULL when called from qmp_human_monitor_command() */
+        Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
+        cmd->cmd(&mon->common, qdict);
+        monitor_set_cur(qemu_coroutine_self(), old_mon);
+    } else {
+        HandleHmpCommandCo data = {
+            .mon = &mon->common,
+            .cmd = cmd,
+            .qdict = qdict,
+            .done = false,
+        };
+        Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
+        monitor_set_cur(co, &mon->common);
+        aio_co_enter(qemu_get_aio_context(), co);
+        AIO_WAIT_WHILE(qemu_get_aio_context(), !data.done);
+    }
 
     qobject_unref(qdict);
 }
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 11/14] util/async: Add aio_co_reschedule_self()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (9 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 10/14] hmp: Add support for coroutine command handlers Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 12/14] block: Add bdrv_co_enter()/leave() Markus Armbruster
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Add a function that can be used to move the currently running coroutine
to a different AioContext (and therefore potentially a different
thread).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201005155855.256490-12-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/block/aio.h | 10 ++++++++++
 util/async.c        | 30 ++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index ec8c5af642..5f342267d5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -17,6 +17,7 @@
 #ifdef CONFIG_LINUX_IO_URING
 #include <liburing.h>
 #endif
+#include "qemu/coroutine.h"
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
@@ -654,6 +655,15 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
  */
 void aio_co_schedule(AioContext *ctx, struct Coroutine *co);
 
+/**
+ * aio_co_reschedule_self:
+ * @new_ctx: the new context
+ *
+ * Move the currently running coroutine to new_ctx. If the coroutine is already
+ * running in new_ctx, do nothing.
+ */
+void coroutine_fn aio_co_reschedule_self(AioContext *new_ctx);
+
 /**
  * aio_co_wake:
  * @co: the coroutine
diff --git a/util/async.c b/util/async.c
index f758354c6a..674dbefb7c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -569,6 +569,36 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
     aio_context_unref(ctx);
 }
 
+typedef struct AioCoRescheduleSelf {
+    Coroutine *co;
+    AioContext *new_ctx;
+} AioCoRescheduleSelf;
+
+static void aio_co_reschedule_self_bh(void *opaque)
+{
+    AioCoRescheduleSelf *data = opaque;
+    aio_co_schedule(data->new_ctx, data->co);
+}
+
+void coroutine_fn aio_co_reschedule_self(AioContext *new_ctx)
+{
+    AioContext *old_ctx = qemu_get_current_aio_context();
+
+    if (old_ctx != new_ctx) {
+        AioCoRescheduleSelf data = {
+            .co = qemu_coroutine_self(),
+            .new_ctx = new_ctx,
+        };
+        /*
+         * We can't directly schedule the coroutine in the target context
+         * because this would be racy: The other thread could try to enter the
+         * coroutine before it has yielded in this one.
+         */
+        aio_bh_schedule_oneshot(old_ctx, aio_co_reschedule_self_bh, &data);
+        qemu_coroutine_yield();
+    }
+}
+
 void aio_co_wake(struct Coroutine *co)
 {
     AioContext *ctx;
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 12/14] block: Add bdrv_co_enter()/leave()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (10 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 11/14] util/async: Add aio_co_reschedule_self() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 13/14] block: Add bdrv_lock()/unlock() Markus Armbruster
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Add a pair of functions to temporarily move the current coroutine to the
AioContext of a given BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/block/block.h | 17 +++++++++++++++++
 block.c               | 23 +++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index ce2ac39299..1027c58a41 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -640,6 +640,23 @@ bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
  */
 AioContext *bdrv_get_aio_context(BlockDriverState *bs);
 
+/**
+ * Move the current coroutine to the AioContext of @bs and return the old
+ * AioContext of the coroutine. Increase bs->in_flight so that draining @bs
+ * will wait for the operation to proceed until the corresponding
+ * bdrv_co_leave().
+ *
+ * Consequently, you can't call drain inside a bdrv_co_enter/leave() section as
+ * this will deadlock.
+ */
+AioContext *coroutine_fn bdrv_co_enter(BlockDriverState *bs);
+
+/**
+ * Ends a section started by bdrv_co_enter(). Move the current coroutine back
+ * to old_ctx and decrease bs->in_flight again.
+ */
+void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
+
 /**
  * Transfer control to @co in the aio context of @bs
  */
diff --git a/block.c b/block.c
index 52b2e2709f..8d9b9017d3 100644
--- a/block.c
+++ b/block.c
@@ -6303,6 +6303,29 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
     return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
+AioContext *coroutine_fn bdrv_co_enter(BlockDriverState *bs)
+{
+    Coroutine *self = qemu_coroutine_self();
+    AioContext *old_ctx = qemu_coroutine_get_aio_context(self);
+    AioContext *new_ctx;
+
+    /*
+     * Increase bs->in_flight to ensure that this operation is completed before
+     * moving the node to a different AioContext. Read new_ctx only afterwards.
+     */
+    bdrv_inc_in_flight(bs);
+
+    new_ctx = bdrv_get_aio_context(bs);
+    aio_co_reschedule_self(new_ctx);
+    return old_ctx;
+}
+
+void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
+{
+    aio_co_reschedule_self(old_ctx);
+    bdrv_dec_in_flight(bs);
+}
+
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
 {
     aio_co_enter(bdrv_get_aio_context(bs), co);
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 13/14] block: Add bdrv_lock()/unlock()
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (11 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 12/14] block: Add bdrv_co_enter()/leave() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09  6:34 ` [PULL 14/14] block: Convert 'block_resize' to coroutine Markus Armbruster
  2020-10-09 13:47 ` [PULL 00/14] Monitor patches for 2020-10-09 Peter Maydell
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

Inside of coroutine context, we can't directly use aio_context_acquire()
for the AioContext of a block node because we already own the lock of
the current AioContext and we need to avoid double locking to prevent
deadlocks.

This provides helper functions to lock the AioContext of a node only if
it's not the same as the current AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/block/block.h | 14 ++++++++++++++
 block.c               | 27 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 1027c58a41..d16c401cb4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -657,6 +657,20 @@ AioContext *coroutine_fn bdrv_co_enter(BlockDriverState *bs);
  */
 void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
 
+/**
+ * Locks the AioContext of @bs if it's not the current AioContext. This avoids
+ * double locking which could lead to deadlocks: This is a coroutine_fn, so we
+ * know we already own the lock of the current AioContext.
+ *
+ * May only be called in the main thread.
+ */
+void coroutine_fn bdrv_co_lock(BlockDriverState *bs);
+
+/**
+ * Unlocks the AioContext of @bs if it's not the current AioContext.
+ */
+void coroutine_fn bdrv_co_unlock(BlockDriverState *bs);
+
 /**
  * Transfer control to @co in the aio context of @bs
  */
diff --git a/block.c b/block.c
index 8d9b9017d3..430edf79bb 100644
--- a/block.c
+++ b/block.c
@@ -6326,6 +6326,33 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
     bdrv_dec_in_flight(bs);
 }
 
+void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
+{
+    AioContext *ctx = bdrv_get_aio_context(bs);
+
+    /* In the main thread, bs->aio_context won't change concurrently */
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+    /*
+     * We're in coroutine context, so we already hold the lock of the main
+     * loop AioContext. Don't lock it twice to avoid deadlocks.
+     */
+    assert(qemu_in_coroutine());
+    if (ctx != qemu_get_aio_context()) {
+        aio_context_acquire(ctx);
+    }
+}
+
+void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
+{
+    AioContext *ctx = bdrv_get_aio_context(bs);
+
+    assert(qemu_in_coroutine());
+    if (ctx != qemu_get_aio_context()) {
+        aio_context_release(ctx);
+    }
+}
+
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
 {
     aio_co_enter(bdrv_get_aio_context(bs), co);
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PULL 14/14] block: Convert 'block_resize' to coroutine
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (12 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 13/14] block: Add bdrv_lock()/unlock() Markus Armbruster
@ 2020-10-09  6:34 ` Markus Armbruster
  2020-10-09 13:47 ` [PULL 00/14] Monitor patches for 2020-10-09 Peter Maydell
  14 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2020-10-09  6:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Stefan Hajnoczi

From: Kevin Wolf <kwolf@redhat.com>

block_resize performs some I/O that could potentially take quite some
time, so use it as an example for the new 'coroutine': true annotation
in the QAPI schema.

bdrv_truncate() requires that we're already in the right AioContext for
the BlockDriverState if called in coroutine context. So instead of just
taking the AioContext lock, move the QMP handler coroutine to the
context.

Call blk_unref() only after switching back because blk_unref() may only
be called in the main thread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-15-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qapi/block-core.json |  3 ++-
 blockdev.c           | 16 ++++++++--------
 hmp-commands.hx      |  1 +
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3758ea9912..ee5ebef7f2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1310,7 +1310,8 @@
 { 'command': 'block_resize',
   'data': { '*device': 'str',
             '*node-name': 'str',
-            'size': 'int' } }
+            'size': 'int' },
+  'coroutine': true }
 
 ##
 # @NewImageMode:
diff --git a/blockdev.c b/blockdev.c
index a6ae475dac..fe6fb5dc1d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2449,14 +2449,14 @@ BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
     return ret;
 }
 
-void qmp_block_resize(bool has_device, const char *device,
-                      bool has_node_name, const char *node_name,
-                      int64_t size, Error **errp)
+void coroutine_fn qmp_block_resize(bool has_device, const char *device,
+                                   bool has_node_name, const char *node_name,
+                                   int64_t size, Error **errp)
 {
     Error *local_err = NULL;
     BlockBackend *blk = NULL;
     BlockDriverState *bs;
-    AioContext *aio_context;
+    AioContext *old_ctx;
 
     bs = bdrv_lookup_bs(has_device ? device : NULL,
                         has_node_name ? node_name : NULL,
@@ -2466,9 +2466,6 @@ void qmp_block_resize(bool has_device, const char *device,
         return;
     }
 
-    aio_context = bdrv_get_aio_context(bs);
-    aio_context_acquire(aio_context);
-
     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
         goto out;
@@ -2485,12 +2482,15 @@ void qmp_block_resize(bool has_device, const char *device,
     }
 
     bdrv_drained_begin(bs);
+    old_ctx = bdrv_co_enter(bs);
     blk_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
+    bdrv_co_leave(bs, old_ctx);
     bdrv_drained_end(bs);
 
 out:
+    bdrv_co_lock(bs);
     blk_unref(blk);
-    aio_context_release(aio_context);
+    bdrv_co_unlock(bs);
 }
 
 void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e43ce600b8..cd068389de 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -76,6 +76,7 @@ ERST
         .params     = "device size",
         .help       = "resize a block image",
         .cmd        = hmp_block_resize,
+        .coroutine  = true,
     },
 
 SRST
-- 
2.26.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 00/14] Monitor patches for 2020-10-09
  2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
                   ` (13 preceding siblings ...)
  2020-10-09  6:34 ` [PULL 14/14] block: Convert 'block_resize' to coroutine Markus Armbruster
@ 2020-10-09 13:47 ` Peter Maydell
  14 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-09 13:47 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, QEMU Developers

On Fri, 9 Oct 2020 at 07:37, Markus Armbruster <armbru@redhat.com> wrote:
>
> The following changes since commit 497d415d76b9f59fcae27f22df1ca2c3fa4df64e:
>
>   Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201008-1' into staging (2020-10-08 21:41:20 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2020-10-09
>
> for you to fetch changes up to eb94b81a94bce112e6b206df846c1551aaf6cab6:
>
>   block: Convert 'block_resize' to coroutine (2020-10-09 07:08:20 +0200)
>
> ----------------------------------------------------------------
> Monitor patches for 2020-10-09
>
> * QMP command block-resize and HMP command block_resize no longer
>   block the main loop.


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-09  6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
@ 2020-10-12 10:47   ` Alex Bennée
  2020-10-12 11:25     ` Kevin Wolf
  2020-10-17  8:15   ` Volker Rümelin
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2020-10-12 10:47 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: kwolf, Pavel Dovgalyuk, qemu-devel, Stefan Hajnoczi


Markus Armbruster <armbru@redhat.com> writes:

> From: Kevin Wolf <kwolf@redhat.com>
>
> This moves the QMP dispatcher to a coroutine and runs all QMP command
> handlers that declare 'coroutine': true in coroutine context so they
> can avoid blocking the main loop while doing I/O or waiting for other
> events.

This subtly changes the replay behaviour leading to a hang in:

  10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
  Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
  JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
  JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
   (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
  RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 27.77 s

Looking at the log:

  2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
  2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
  2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
  2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
  2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
  2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
  2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
  2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
  2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
  2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
  2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
  2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
  2020-10-12 11:40:31,734 stacktrace       L0039 ERROR|
  2020-10-12 11:40:31,734 stacktrace       L0042 ERROR| Reproduced traceback from: /home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:767
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR| Traceback (most recent call last):
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 435, in _do_shutdown
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._soft_shutdown(timeout, has_quit)
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 415, in _soft_shutdown
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._qmp.cmd('quit')
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 266, in cmd
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     return self.cmd_obj(qmp_cmd)
  2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 249, in cmd_obj
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     raise QMPConnectError("Unexpected empty reply from server")
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| qemu.qmp.QMPConnectError: Unexpected empty reply from server
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| The above exception was the direct cause of the following exception:
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| Traceback (most recent call last):
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 128, in test_arm_virt
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 71, in run_rr
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     True, shift, args, replay_path)
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 57, in run_vm
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     vm.shutdown()
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 465, in shutdown
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self._do_shutdown(timeout, has_quit)
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 439, in _do_shutdown
  2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     from exc
  2020-10-12 11:40:31,737 stacktrace       L0045 ERROR| qemu.machine.AbnormalShutdown: Could not perform graceful shutdown

The commit before:

  2020-10-12 11:44:02,803 __init__         L0085 DEBUG| [    3.897837] md: ... autorun DONE.
  2020-10-12 11:44:02,806 __init__         L0085 DEBUG| [    3.897958] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
  2020-10-12 11:44:02,806 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
  2020-10-12 11:44:02,814 qmp              L0250 DEBUG| <<< {'return': {}}
  2020-10-12 11:44:02,879 replay_kernel    L0059 INFO | finished the recording with log size 237596 bytes
  2020-10-12 11:44:02,879 replay_kernel    L0064 INFO | elapsed time 24.35 sec
  2020-10-12 11:44:02,888 replay_kernel    L0044 INFO | replaying the execution...
  2020-10-12 11:44:02,889 machine          L0362 DEBUG| VM launch command: './qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpihqswp6_/qemu-7747-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpihqswp6_/qemu-7747-console.sock,server,nowait -serial chardev:console
  -icount shift=1,rr=replay,rrfile=/var/tmp/avocado_o3pfy2he/avocado_job_nf30vjqg/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot'
  2020-10-12 11:44:03,001 qmp              L0245 DEBUG| >>> {'execute': 'qmp_capabilities'}
  2020-10-12 11:44:03,172 qmp              L0250 DEBUG| <<< {'return': {}}
  2020-10-12 11:44:04,899 __init__         L0085 DEBUG| [    0.000000] Booting Linux on physical CPU 0x0
  2020-10-12 11:44:04,904 __init__         L0085 DEBUG| [    0.000000] Linux version 4.18.16-300.fc29.armv7hl (mockbuild@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018


>
> For commands that are not declared safe to run in a coroutine, the
> dispatcher drops out of coroutine context by calling the QMP command
> handler from a bottom half.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> Message-Id: <20201005155855.256490-10-kwolf@redhat.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  include/qapi/qmp/dispatch.h |   1 +
>  monitor/monitor-internal.h  |   6 +-
>  monitor/monitor.c           |  55 +++++++++++++---
>  monitor/qmp.c               | 122 +++++++++++++++++++++++++++---------
>  qapi/qmp-dispatch.c         |  63 +++++++++++++++++--
>  qapi/qmp-registry.c         |   3 +
>  util/aio-posix.c            |   8 ++-
>  7 files changed, 213 insertions(+), 45 deletions(-)
>
> diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
> index 9fd2b720a7..af8d96c570 100644
> --- a/include/qapi/qmp/dispatch.h
> +++ b/include/qapi/qmp/dispatch.h
> @@ -31,6 +31,7 @@ typedef enum QmpCommandOptions
>  typedef struct QmpCommand
>  {
>      const char *name;
> +    /* Runs in coroutine context if QCO_COROUTINE is set */
>      QmpCommandFunc *fn;
>      QmpCommandOptions options;
>      QTAILQ_ENTRY(QmpCommand) node;
> diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
> index b39e03b744..b55d6df07f 100644
> --- a/monitor/monitor-internal.h
> +++ b/monitor/monitor-internal.h
> @@ -155,7 +155,9 @@ static inline bool monitor_is_qmp(const Monitor *mon)
>  
>  typedef QTAILQ_HEAD(MonitorList, Monitor) MonitorList;
>  extern IOThread *mon_iothread;
> -extern QEMUBH *qmp_dispatcher_bh;
> +extern Coroutine *qmp_dispatcher_co;
> +extern bool qmp_dispatcher_co_shutdown;
> +extern bool qmp_dispatcher_co_busy;
>  extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
>  extern QemuMutex monitor_lock;
>  extern MonitorList mon_list;
> @@ -173,7 +175,7 @@ void monitor_fdsets_cleanup(void);
>  
>  void qmp_send_response(MonitorQMP *mon, const QDict *rsp);
>  void monitor_data_destroy_qmp(MonitorQMP *mon);
> -void monitor_qmp_bh_dispatcher(void *data);
> +void coroutine_fn monitor_qmp_dispatcher_co(void *data);
>  
>  int get_monitor_def(int64_t *pval, const char *name);
>  void help_cmd(Monitor *mon, const char *name);
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index ef68ca9d21..ceffe1a83b 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -55,8 +55,32 @@ typedef struct {
>  /* Shared monitor I/O thread */
>  IOThread *mon_iothread;
>  
> -/* Bottom half to dispatch the requests received from I/O thread */
> -QEMUBH *qmp_dispatcher_bh;
> +/* Coroutine to dispatch the requests received from I/O thread */
> +Coroutine *qmp_dispatcher_co;
> +
> +/* Set to true when the dispatcher coroutine should terminate */
> +bool qmp_dispatcher_co_shutdown;
> +
> +/*
> + * qmp_dispatcher_co_busy is used for synchronisation between the
> + * monitor thread and the main thread to ensure that the dispatcher
> + * coroutine never gets scheduled a second time when it's already
> + * scheduled (scheduling the same coroutine twice is forbidden).
> + *
> + * It is true if the coroutine is active and processing requests.
> + * Additional requests may then be pushed onto mon->qmp_requests,
> + * and @qmp_dispatcher_co_shutdown may be set without further ado.
> + * @qmp_dispatcher_co_busy must not be woken up in this case.
> + *
> + * If false, you also have to set @qmp_dispatcher_co_busy to true and
> + * wake up @qmp_dispatcher_co after pushing the new requests.
> + *
> + * The coroutine will automatically change this variable back to false
> + * before it yields.  Nobody else may set the variable to false.
> + *
> + * Access must be atomic for thread safety.
> + */
> +bool qmp_dispatcher_co_busy;
>  
>  /*
>   * Protects mon_list, monitor_qapi_event_state, coroutine_mon,
> @@ -623,9 +647,24 @@ void monitor_cleanup(void)
>      }
>      qemu_mutex_unlock(&monitor_lock);
>  
> -    /* QEMUBHs needs to be deleted before destroying the I/O thread */
> -    qemu_bh_delete(qmp_dispatcher_bh);
> -    qmp_dispatcher_bh = NULL;
> +    /*
> +     * The dispatcher needs to stop before destroying the I/O thread.
> +     *
> +     * We need to poll both qemu_aio_context and iohandler_ctx to make
> +     * sure that the dispatcher coroutine keeps making progress and
> +     * eventually terminates.  qemu_aio_context is automatically
> +     * polled by calling AIO_WAIT_WHILE on it, but we must poll
> +     * iohandler_ctx manually.
> +     */
> +    qmp_dispatcher_co_shutdown = true;
> +    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
> +        aio_co_wake(qmp_dispatcher_co);
> +    }
> +
> +    AIO_WAIT_WHILE(qemu_get_aio_context(),
> +                   (aio_poll(iohandler_get_aio_context(), false),
> +                    qatomic_mb_read(&qmp_dispatcher_co_busy)));
> +
>      if (mon_iothread) {
>          iothread_destroy(mon_iothread);
>          mon_iothread = NULL;
> @@ -649,9 +688,9 @@ void monitor_init_globals_core(void)
>       * have commands assuming that context.  It would be nice to get
>       * rid of those assumptions.
>       */
> -    qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
> -                                   monitor_qmp_bh_dispatcher,
> -                                   NULL);
> +    qmp_dispatcher_co = qemu_coroutine_create(monitor_qmp_dispatcher_co, NULL);
> +    qatomic_mb_set(&qmp_dispatcher_co_busy, true);
> +    aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
>  }
>  
>  int monitor_init(MonitorOptions *opts, bool allow_hmp, Error **errp)
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index e746b3575d..b42f8c6af3 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -133,6 +133,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
>      }
>  }
>  
> +/*
> + * Runs outside of coroutine context for OOB commands, but in
> + * coroutine context for everything else.
> + */
>  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
>  {
>      QDict *rsp;
> @@ -206,43 +210,99 @@ static QMPRequest *monitor_qmp_requests_pop_any_with_lock(void)
>      return req_obj;
>  }
>  
> -void monitor_qmp_bh_dispatcher(void *data)
> +void coroutine_fn monitor_qmp_dispatcher_co(void *data)
>  {
> -    QMPRequest *req_obj = monitor_qmp_requests_pop_any_with_lock();
> +    QMPRequest *req_obj = NULL;
>      QDict *rsp;
>      bool need_resume;
>      MonitorQMP *mon;
>  
> -    if (!req_obj) {
> -        return;
> -    }
> +    while (true) {
> +        assert(qatomic_mb_read(&qmp_dispatcher_co_busy) == true);
>  
> -    mon = req_obj->mon;
> -    /*  qmp_oob_enabled() might change after "qmp_capabilities" */
> -    need_resume = !qmp_oob_enabled(mon) ||
> -        mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> -    qemu_mutex_unlock(&mon->qmp_queue_lock);
> -    if (req_obj->req) {
> -        QDict *qdict = qobject_to(QDict, req_obj->req);
> -        QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
> -        trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
> -        monitor_qmp_dispatch(mon, req_obj->req);
> -    } else {
> -        assert(req_obj->err);
> -        rsp = qmp_error_response(req_obj->err);
> -        req_obj->err = NULL;
> -        monitor_qmp_respond(mon, rsp);
> -        qobject_unref(rsp);
> -    }
> +        /*
> +         * Mark the dispatcher as not busy already here so that we
> +         * don't miss any new requests coming in the middle of our
> +         * processing.
> +         */
> +        qatomic_mb_set(&qmp_dispatcher_co_busy, false);
>  
> -    if (need_resume) {
> -        /* Pairs with the monitor_suspend() in handle_qmp_command() */
> -        monitor_resume(&mon->common);
> -    }
> -    qmp_request_free(req_obj);
> +        while (!(req_obj = monitor_qmp_requests_pop_any_with_lock())) {
> +            /*
> +             * No more requests to process.  Wait to be reentered from
> +             * handle_qmp_command() when it pushes more requests, or
> +             * from monitor_cleanup() when it requests shutdown.
> +             */
> +            if (!qmp_dispatcher_co_shutdown) {
> +                qemu_coroutine_yield();
>  
> -    /* Reschedule instead of looping so the main loop stays responsive */
> -    qemu_bh_schedule(qmp_dispatcher_bh);
> +                /*
> +                 * busy must be set to true again by whoever
> +                 * rescheduled us to avoid double scheduling
> +                 */
> +                assert(qatomic_xchg(&qmp_dispatcher_co_busy, false) == true);
> +            }
> +
> +            /*
> +             * qmp_dispatcher_co_shutdown may have changed if we
> +             * yielded and were reentered from monitor_cleanup()
> +             */
> +            if (qmp_dispatcher_co_shutdown) {
> +                return;
> +            }
> +        }
> +
> +        if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
> +            /*
> +             * Someone rescheduled us (probably because a new requests
> +             * came in), but we didn't actually yield. Do that now,
> +             * only to be immediately reentered and removed from the
> +             * list of scheduled coroutines.
> +             */
> +            qemu_coroutine_yield();
> +        }
> +
> +        /*
> +         * Move the coroutine from iohandler_ctx to qemu_aio_context for
> +         * executing the command handler so that it can make progress if it
> +         * involves an AIO_WAIT_WHILE().
> +         */
> +        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
> +        qemu_coroutine_yield();
> +
> +        mon = req_obj->mon;
> +        /* qmp_oob_enabled() might change after "qmp_capabilities" */
> +        need_resume = !qmp_oob_enabled(mon) ||
> +            mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> +        qemu_mutex_unlock(&mon->qmp_queue_lock);
> +        if (req_obj->req) {
> +            QDict *qdict = qobject_to(QDict, req_obj->req);
> +            QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
> +            trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
> +            monitor_qmp_dispatch(mon, req_obj->req);
> +        } else {
> +            assert(req_obj->err);
> +            rsp = qmp_error_response(req_obj->err);
> +            req_obj->err = NULL;
> +            monitor_qmp_respond(mon, rsp);
> +            qobject_unref(rsp);
> +        }
> +
> +        if (need_resume) {
> +            /* Pairs with the monitor_suspend() in handle_qmp_command() */
> +            monitor_resume(&mon->common);
> +        }
> +        qmp_request_free(req_obj);
> +
> +        /*
> +         * Yield and reschedule so the main loop stays responsive.
> +         *
> +         * Move back to iohandler_ctx so that nested event loops for
> +         * qemu_aio_context don't start new monitor commands.
> +         */
> +        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
> +        qemu_coroutine_yield();
> +    }
>  }
>  
>  static void handle_qmp_command(void *opaque, QObject *req, Error *err)
> @@ -303,7 +363,9 @@ static void handle_qmp_command(void *opaque, QObject *req, Error *err)
>      qemu_mutex_unlock(&mon->qmp_queue_lock);
>  
>      /* Kick the dispatcher routine */
> -    qemu_bh_schedule(qmp_dispatcher_bh);
> +    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
> +        aio_co_wake(qmp_dispatcher_co);
> +    }
>  }
>  
>  static void monitor_qmp_read(void *opaque, const uint8_t *buf, int size)
> diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> index 5677ba92ca..9a2d7dd29a 100644
> --- a/qapi/qmp-dispatch.c
> +++ b/qapi/qmp-dispatch.c
> @@ -12,12 +12,16 @@
>   */
>  
>  #include "qemu/osdep.h"
> +
> +#include "block/aio.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/dispatch.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qjson.h"
>  #include "sysemu/runstate.h"
>  #include "qapi/qmp/qbool.h"
> +#include "qemu/coroutine.h"
> +#include "qemu/main-loop.h"
>  
>  static QDict *qmp_dispatch_check_obj(QDict *dict, bool allow_oob,
>                                       Error **errp)
> @@ -88,6 +92,30 @@ bool qmp_is_oob(const QDict *dict)
>          && !qdict_haskey(dict, "execute");
>  }
>  
> +typedef struct QmpDispatchBH {
> +    const QmpCommand *cmd;
> +    Monitor *cur_mon;
> +    QDict *args;
> +    QObject **ret;
> +    Error **errp;
> +    Coroutine *co;
> +} QmpDispatchBH;
> +
> +static void do_qmp_dispatch_bh(void *opaque)
> +{
> +    QmpDispatchBH *data = opaque;
> +
> +    assert(monitor_cur() == NULL);
> +    monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
> +    data->cmd->fn(data->args, data->ret, data->errp);
> +    monitor_set_cur(qemu_coroutine_self(), NULL);
> +    aio_co_wake(data->co);
> +}
> +
> +/*
> + * Runs outside of coroutine context for OOB commands, but in coroutine
> + * context for everything else.
> + */
>  QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
>                      bool allow_oob, Monitor *cur_mon)
>  {
> @@ -153,12 +181,39 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
>          qobject_ref(args);
>      }
>  
> +    assert(!(oob && qemu_in_coroutine()));
>      assert(monitor_cur() == NULL);
> -    monitor_set_cur(qemu_coroutine_self(), cur_mon);
> +    if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
> +        monitor_set_cur(qemu_coroutine_self(), cur_mon);
> +        cmd->fn(args, &ret, &err);
> +        monitor_set_cur(qemu_coroutine_self(), NULL);
> +    } else {
> +       /*
> +        * Actual context doesn't match the one the command needs.
> +        *
> +        * Case 1: we are in coroutine context, but command does not
> +        * have QCO_COROUTINE.  We need to drop out of coroutine
> +        * context for executing it.
> +        *
> +        * Case 2: we are outside coroutine context, but command has
> +        * QCO_COROUTINE.  Can't actually happen, because we get here
> +        * outside coroutine context only when executing a command
> +        * out of band, and OOB commands never have QCO_COROUTINE.
> +        */
> +        assert(!oob && qemu_in_coroutine() && !(cmd->options & QCO_COROUTINE));
>  
> -    cmd->fn(args, &ret, &err);
> -
> -    monitor_set_cur(qemu_coroutine_self(), NULL);
> +        QmpDispatchBH data = {
> +            .cur_mon    = cur_mon,
> +            .cmd        = cmd,
> +            .args       = args,
> +            .ret        = &ret,
> +            .errp       = &err,
> +            .co         = qemu_coroutine_self(),
> +        };
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
> +                                &data);
> +        qemu_coroutine_yield();
> +    }
>      qobject_unref(args);
>      if (err) {
>          /* or assert(!ret) after reviewing all handlers: */
> diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c
> index d0f9a1d3e3..58c65b5052 100644
> --- a/qapi/qmp-registry.c
> +++ b/qapi/qmp-registry.c
> @@ -20,6 +20,9 @@ void qmp_register_command(QmpCommandList *cmds, const char *name,
>  {
>      QmpCommand *cmd = g_malloc0(sizeof(*cmd));
>  
> +    /* QCO_COROUTINE and QCO_ALLOW_OOB are incompatible for now */
> +    assert(!((options & QCO_COROUTINE) && (options & QCO_ALLOW_OOB)));
> +
>      cmd->name = name;
>      cmd->fn = fn;
>      cmd->enabled = true;
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 280f27bb99..30f5354b1e 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -15,6 +15,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "block/block.h"
> +#include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "qemu/rcu_queue.h"
>  #include "qemu/sockets.h"
> @@ -558,8 +559,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
>       * There cannot be two concurrent aio_poll calls for the same AioContext (or
>       * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
>       * We rely on this below to avoid slow locked accesses to ctx->notify_me.
> +     *
> +     * aio_poll() may only be called in the AioContext's thread. iohandler_ctx
> +     * is special in that it runs in the main thread, but that thread's context
> +     * is qemu_aio_context.
>       */
> -    assert(in_aio_context_home_thread(ctx));
> +    assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
> +                                      qemu_get_aio_context() : ctx));
>  
>      qemu_lockcnt_inc(&ctx->list_lock);


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 10:47   ` Alex Bennée
@ 2020-10-12 11:25     ` Kevin Wolf
  2020-10-12 11:53       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-10-12 11:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Pavel Dovgalyuk, Markus Armbruster, Stefan Hajnoczi, qemu-devel

Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
> 
> Markus Armbruster <armbru@redhat.com> writes:
> 
> > From: Kevin Wolf <kwolf@redhat.com>
> >
> > This moves the QMP dispatcher to a coroutine and runs all QMP command
> > handlers that declare 'coroutine': true in coroutine context so they
> > can avoid blocking the main loop while doing I/O or waiting for other
> > events.
> 
> This subtly changes the replay behaviour leading to a hang in:
> 
>   10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>   Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>   JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
>   JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
>    (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
>   RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>   JOB TIME   : 27.77 s
> 
> Looking at the log:
> 
>   2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
>   2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
>   2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
>   2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
>   2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
>   2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
>   2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
>   2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
>   2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>   2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
>   2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
>   2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"

This looks like a crash (SIGABRT) rather than a hang. Do you have a
stack trace for the crashed process?

Kevin

>   2020-10-12 11:40:31,734 stacktrace       L0039 ERROR|
>   2020-10-12 11:40:31,734 stacktrace       L0042 ERROR| Reproduced traceback from: /home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:767
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR| Traceback (most recent call last):
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 435, in _do_shutdown
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._soft_shutdown(timeout, has_quit)
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 415, in _soft_shutdown
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._qmp.cmd('quit')
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 266, in cmd
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     return self.cmd_obj(qmp_cmd)
>   2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 249, in cmd_obj
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     raise QMPConnectError("Unexpected empty reply from server")
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| qemu.qmp.QMPConnectError: Unexpected empty reply from server
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| The above exception was the direct cause of the following exception:
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| Traceback (most recent call last):
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 128, in test_arm_virt
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 71, in run_rr
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     True, shift, args, replay_path)
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 57, in run_vm
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     vm.shutdown()
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 465, in shutdown
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self._do_shutdown(timeout, has_quit)
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 439, in _do_shutdown
>   2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     from exc
>   2020-10-12 11:40:31,737 stacktrace       L0045 ERROR| qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
> 
> The commit before:
> 
>   2020-10-12 11:44:02,803 __init__         L0085 DEBUG| [    3.897837] md: ... autorun DONE.
>   2020-10-12 11:44:02,806 __init__         L0085 DEBUG| [    3.897958] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>   2020-10-12 11:44:02,806 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
>   2020-10-12 11:44:02,814 qmp              L0250 DEBUG| <<< {'return': {}}
>   2020-10-12 11:44:02,879 replay_kernel    L0059 INFO | finished the recording with log size 237596 bytes
>   2020-10-12 11:44:02,879 replay_kernel    L0064 INFO | elapsed time 24.35 sec
>   2020-10-12 11:44:02,888 replay_kernel    L0044 INFO | replaying the execution...
>   2020-10-12 11:44:02,889 machine          L0362 DEBUG| VM launch command: './qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpihqswp6_/qemu-7747-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpihqswp6_/qemu-7747-console.sock,server,nowait -serial chardev:console
>   -icount shift=1,rr=replay,rrfile=/var/tmp/avocado_o3pfy2he/avocado_job_nf30vjqg/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot'
>   2020-10-12 11:44:03,001 qmp              L0245 DEBUG| >>> {'execute': 'qmp_capabilities'}
>   2020-10-12 11:44:03,172 qmp              L0250 DEBUG| <<< {'return': {}}
>   2020-10-12 11:44:04,899 __init__         L0085 DEBUG| [    0.000000] Booting Linux on physical CPU 0x0
>   2020-10-12 11:44:04,904 __init__         L0085 DEBUG| [    0.000000] Linux version 4.18.16-300.fc29.armv7hl (mockbuild@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018
> 
> 
> >
> > For commands that are not declared safe to run in a coroutine, the
> > dispatcher drops out of coroutine context by calling the QMP command
> > handler from a bottom half.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Markus Armbruster <armbru@redhat.com>
> > Message-Id: <20201005155855.256490-10-kwolf@redhat.com>
> > Reviewed-by: Markus Armbruster <armbru@redhat.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > ---
> >  include/qapi/qmp/dispatch.h |   1 +
> >  monitor/monitor-internal.h  |   6 +-
> >  monitor/monitor.c           |  55 +++++++++++++---
> >  monitor/qmp.c               | 122 +++++++++++++++++++++++++++---------
> >  qapi/qmp-dispatch.c         |  63 +++++++++++++++++--
> >  qapi/qmp-registry.c         |   3 +
> >  util/aio-posix.c            |   8 ++-
> >  7 files changed, 213 insertions(+), 45 deletions(-)
> >
> > diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
> > index 9fd2b720a7..af8d96c570 100644
> > --- a/include/qapi/qmp/dispatch.h
> > +++ b/include/qapi/qmp/dispatch.h
> > @@ -31,6 +31,7 @@ typedef enum QmpCommandOptions
> >  typedef struct QmpCommand
> >  {
> >      const char *name;
> > +    /* Runs in coroutine context if QCO_COROUTINE is set */
> >      QmpCommandFunc *fn;
> >      QmpCommandOptions options;
> >      QTAILQ_ENTRY(QmpCommand) node;
> > diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
> > index b39e03b744..b55d6df07f 100644
> > --- a/monitor/monitor-internal.h
> > +++ b/monitor/monitor-internal.h
> > @@ -155,7 +155,9 @@ static inline bool monitor_is_qmp(const Monitor *mon)
> >  
> >  typedef QTAILQ_HEAD(MonitorList, Monitor) MonitorList;
> >  extern IOThread *mon_iothread;
> > -extern QEMUBH *qmp_dispatcher_bh;
> > +extern Coroutine *qmp_dispatcher_co;
> > +extern bool qmp_dispatcher_co_shutdown;
> > +extern bool qmp_dispatcher_co_busy;
> >  extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
> >  extern QemuMutex monitor_lock;
> >  extern MonitorList mon_list;
> > @@ -173,7 +175,7 @@ void monitor_fdsets_cleanup(void);
> >  
> >  void qmp_send_response(MonitorQMP *mon, const QDict *rsp);
> >  void monitor_data_destroy_qmp(MonitorQMP *mon);
> > -void monitor_qmp_bh_dispatcher(void *data);
> > +void coroutine_fn monitor_qmp_dispatcher_co(void *data);
> >  
> >  int get_monitor_def(int64_t *pval, const char *name);
> >  void help_cmd(Monitor *mon, const char *name);
> > diff --git a/monitor/monitor.c b/monitor/monitor.c
> > index ef68ca9d21..ceffe1a83b 100644
> > --- a/monitor/monitor.c
> > +++ b/monitor/monitor.c
> > @@ -55,8 +55,32 @@ typedef struct {
> >  /* Shared monitor I/O thread */
> >  IOThread *mon_iothread;
> >  
> > -/* Bottom half to dispatch the requests received from I/O thread */
> > -QEMUBH *qmp_dispatcher_bh;
> > +/* Coroutine to dispatch the requests received from I/O thread */
> > +Coroutine *qmp_dispatcher_co;
> > +
> > +/* Set to true when the dispatcher coroutine should terminate */
> > +bool qmp_dispatcher_co_shutdown;
> > +
> > +/*
> > + * qmp_dispatcher_co_busy is used for synchronisation between the
> > + * monitor thread and the main thread to ensure that the dispatcher
> > + * coroutine never gets scheduled a second time when it's already
> > + * scheduled (scheduling the same coroutine twice is forbidden).
> > + *
> > + * It is true if the coroutine is active and processing requests.
> > + * Additional requests may then be pushed onto mon->qmp_requests,
> > + * and @qmp_dispatcher_co_shutdown may be set without further ado.
> > + * @qmp_dispatcher_co_busy must not be woken up in this case.
> > + *
> > + * If false, you also have to set @qmp_dispatcher_co_busy to true and
> > + * wake up @qmp_dispatcher_co after pushing the new requests.
> > + *
> > + * The coroutine will automatically change this variable back to false
> > + * before it yields.  Nobody else may set the variable to false.
> > + *
> > + * Access must be atomic for thread safety.
> > + */
> > +bool qmp_dispatcher_co_busy;
> >  
> >  /*
> >   * Protects mon_list, monitor_qapi_event_state, coroutine_mon,
> > @@ -623,9 +647,24 @@ void monitor_cleanup(void)
> >      }
> >      qemu_mutex_unlock(&monitor_lock);
> >  
> > -    /* QEMUBHs needs to be deleted before destroying the I/O thread */
> > -    qemu_bh_delete(qmp_dispatcher_bh);
> > -    qmp_dispatcher_bh = NULL;
> > +    /*
> > +     * The dispatcher needs to stop before destroying the I/O thread.
> > +     *
> > +     * We need to poll both qemu_aio_context and iohandler_ctx to make
> > +     * sure that the dispatcher coroutine keeps making progress and
> > +     * eventually terminates.  qemu_aio_context is automatically
> > +     * polled by calling AIO_WAIT_WHILE on it, but we must poll
> > +     * iohandler_ctx manually.
> > +     */
> > +    qmp_dispatcher_co_shutdown = true;
> > +    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
> > +        aio_co_wake(qmp_dispatcher_co);
> > +    }
> > +
> > +    AIO_WAIT_WHILE(qemu_get_aio_context(),
> > +                   (aio_poll(iohandler_get_aio_context(), false),
> > +                    qatomic_mb_read(&qmp_dispatcher_co_busy)));
> > +
> >      if (mon_iothread) {
> >          iothread_destroy(mon_iothread);
> >          mon_iothread = NULL;
> > @@ -649,9 +688,9 @@ void monitor_init_globals_core(void)
> >       * have commands assuming that context.  It would be nice to get
> >       * rid of those assumptions.
> >       */
> > -    qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
> > -                                   monitor_qmp_bh_dispatcher,
> > -                                   NULL);
> > +    qmp_dispatcher_co = qemu_coroutine_create(monitor_qmp_dispatcher_co, NULL);
> > +    qatomic_mb_set(&qmp_dispatcher_co_busy, true);
> > +    aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
> >  }
> >  
> >  int monitor_init(MonitorOptions *opts, bool allow_hmp, Error **errp)
> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > index e746b3575d..b42f8c6af3 100644
> > --- a/monitor/qmp.c
> > +++ b/monitor/qmp.c
> > @@ -133,6 +133,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
> >      }
> >  }
> >  
> > +/*
> > + * Runs outside of coroutine context for OOB commands, but in
> > + * coroutine context for everything else.
> > + */
> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
> >  {
> >      QDict *rsp;
> > @@ -206,43 +210,99 @@ static QMPRequest *monitor_qmp_requests_pop_any_with_lock(void)
> >      return req_obj;
> >  }
> >  
> > -void monitor_qmp_bh_dispatcher(void *data)
> > +void coroutine_fn monitor_qmp_dispatcher_co(void *data)
> >  {
> > -    QMPRequest *req_obj = monitor_qmp_requests_pop_any_with_lock();
> > +    QMPRequest *req_obj = NULL;
> >      QDict *rsp;
> >      bool need_resume;
> >      MonitorQMP *mon;
> >  
> > -    if (!req_obj) {
> > -        return;
> > -    }
> > +    while (true) {
> > +        assert(qatomic_mb_read(&qmp_dispatcher_co_busy) == true);
> >  
> > -    mon = req_obj->mon;
> > -    /*  qmp_oob_enabled() might change after "qmp_capabilities" */
> > -    need_resume = !qmp_oob_enabled(mon) ||
> > -        mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> > -    qemu_mutex_unlock(&mon->qmp_queue_lock);
> > -    if (req_obj->req) {
> > -        QDict *qdict = qobject_to(QDict, req_obj->req);
> > -        QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
> > -        trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
> > -        monitor_qmp_dispatch(mon, req_obj->req);
> > -    } else {
> > -        assert(req_obj->err);
> > -        rsp = qmp_error_response(req_obj->err);
> > -        req_obj->err = NULL;
> > -        monitor_qmp_respond(mon, rsp);
> > -        qobject_unref(rsp);
> > -    }
> > +        /*
> > +         * Mark the dispatcher as not busy already here so that we
> > +         * don't miss any new requests coming in the middle of our
> > +         * processing.
> > +         */
> > +        qatomic_mb_set(&qmp_dispatcher_co_busy, false);
> >  
> > -    if (need_resume) {
> > -        /* Pairs with the monitor_suspend() in handle_qmp_command() */
> > -        monitor_resume(&mon->common);
> > -    }
> > -    qmp_request_free(req_obj);
> > +        while (!(req_obj = monitor_qmp_requests_pop_any_with_lock())) {
> > +            /*
> > +             * No more requests to process.  Wait to be reentered from
> > +             * handle_qmp_command() when it pushes more requests, or
> > +             * from monitor_cleanup() when it requests shutdown.
> > +             */
> > +            if (!qmp_dispatcher_co_shutdown) {
> > +                qemu_coroutine_yield();
> >  
> > -    /* Reschedule instead of looping so the main loop stays responsive */
> > -    qemu_bh_schedule(qmp_dispatcher_bh);
> > +                /*
> > +                 * busy must be set to true again by whoever
> > +                 * rescheduled us to avoid double scheduling
> > +                 */
> > +                assert(qatomic_xchg(&qmp_dispatcher_co_busy, false) == true);
> > +            }
> > +
> > +            /*
> > +             * qmp_dispatcher_co_shutdown may have changed if we
> > +             * yielded and were reentered from monitor_cleanup()
> > +             */
> > +            if (qmp_dispatcher_co_shutdown) {
> > +                return;
> > +            }
> > +        }
> > +
> > +        if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
> > +            /*
> > +             * Someone rescheduled us (probably because a new requests
> > +             * came in), but we didn't actually yield. Do that now,
> > +             * only to be immediately reentered and removed from the
> > +             * list of scheduled coroutines.
> > +             */
> > +            qemu_coroutine_yield();
> > +        }
> > +
> > +        /*
> > +         * Move the coroutine from iohandler_ctx to qemu_aio_context for
> > +         * executing the command handler so that it can make progress if it
> > +         * involves an AIO_WAIT_WHILE().
> > +         */
> > +        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
> > +        qemu_coroutine_yield();
> > +
> > +        mon = req_obj->mon;
> > +        /* qmp_oob_enabled() might change after "qmp_capabilities" */
> > +        need_resume = !qmp_oob_enabled(mon) ||
> > +            mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> > +        qemu_mutex_unlock(&mon->qmp_queue_lock);
> > +        if (req_obj->req) {
> > +            QDict *qdict = qobject_to(QDict, req_obj->req);
> > +            QObject *id = qdict ? qdict_get(qdict, "id") : NULL;
> > +            trace_monitor_qmp_cmd_in_band(qobject_get_try_str(id) ?: "");
> > +            monitor_qmp_dispatch(mon, req_obj->req);
> > +        } else {
> > +            assert(req_obj->err);
> > +            rsp = qmp_error_response(req_obj->err);
> > +            req_obj->err = NULL;
> > +            monitor_qmp_respond(mon, rsp);
> > +            qobject_unref(rsp);
> > +        }
> > +
> > +        if (need_resume) {
> > +            /* Pairs with the monitor_suspend() in handle_qmp_command() */
> > +            monitor_resume(&mon->common);
> > +        }
> > +        qmp_request_free(req_obj);
> > +
> > +        /*
> > +         * Yield and reschedule so the main loop stays responsive.
> > +         *
> > +         * Move back to iohandler_ctx so that nested event loops for
> > +         * qemu_aio_context don't start new monitor commands.
> > +         */
> > +        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
> > +        qemu_coroutine_yield();
> > +    }
> >  }
> >  
> >  static void handle_qmp_command(void *opaque, QObject *req, Error *err)
> > @@ -303,7 +363,9 @@ static void handle_qmp_command(void *opaque, QObject *req, Error *err)
> >      qemu_mutex_unlock(&mon->qmp_queue_lock);
> >  
> >      /* Kick the dispatcher routine */
> > -    qemu_bh_schedule(qmp_dispatcher_bh);
> > +    if (!qatomic_xchg(&qmp_dispatcher_co_busy, true)) {
> > +        aio_co_wake(qmp_dispatcher_co);
> > +    }
> >  }
> >  
> >  static void monitor_qmp_read(void *opaque, const uint8_t *buf, int size)
> > diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> > index 5677ba92ca..9a2d7dd29a 100644
> > --- a/qapi/qmp-dispatch.c
> > +++ b/qapi/qmp-dispatch.c
> > @@ -12,12 +12,16 @@
> >   */
> >  
> >  #include "qemu/osdep.h"
> > +
> > +#include "block/aio.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/dispatch.h"
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qjson.h"
> >  #include "sysemu/runstate.h"
> >  #include "qapi/qmp/qbool.h"
> > +#include "qemu/coroutine.h"
> > +#include "qemu/main-loop.h"
> >  
> >  static QDict *qmp_dispatch_check_obj(QDict *dict, bool allow_oob,
> >                                       Error **errp)
> > @@ -88,6 +92,30 @@ bool qmp_is_oob(const QDict *dict)
> >          && !qdict_haskey(dict, "execute");
> >  }
> >  
> > +typedef struct QmpDispatchBH {
> > +    const QmpCommand *cmd;
> > +    Monitor *cur_mon;
> > +    QDict *args;
> > +    QObject **ret;
> > +    Error **errp;
> > +    Coroutine *co;
> > +} QmpDispatchBH;
> > +
> > +static void do_qmp_dispatch_bh(void *opaque)
> > +{
> > +    QmpDispatchBH *data = opaque;
> > +
> > +    assert(monitor_cur() == NULL);
> > +    monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
> > +    data->cmd->fn(data->args, data->ret, data->errp);
> > +    monitor_set_cur(qemu_coroutine_self(), NULL);
> > +    aio_co_wake(data->co);
> > +}
> > +
> > +/*
> > + * Runs outside of coroutine context for OOB commands, but in coroutine
> > + * context for everything else.
> > + */
> >  QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
> >                      bool allow_oob, Monitor *cur_mon)
> >  {
> > @@ -153,12 +181,39 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
> >          qobject_ref(args);
> >      }
> >  
> > +    assert(!(oob && qemu_in_coroutine()));
> >      assert(monitor_cur() == NULL);
> > -    monitor_set_cur(qemu_coroutine_self(), cur_mon);
> > +    if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
> > +        monitor_set_cur(qemu_coroutine_self(), cur_mon);
> > +        cmd->fn(args, &ret, &err);
> > +        monitor_set_cur(qemu_coroutine_self(), NULL);
> > +    } else {
> > +       /*
> > +        * Actual context doesn't match the one the command needs.
> > +        *
> > +        * Case 1: we are in coroutine context, but command does not
> > +        * have QCO_COROUTINE.  We need to drop out of coroutine
> > +        * context for executing it.
> > +        *
> > +        * Case 2: we are outside coroutine context, but command has
> > +        * QCO_COROUTINE.  Can't actually happen, because we get here
> > +        * outside coroutine context only when executing a command
> > +        * out of band, and OOB commands never have QCO_COROUTINE.
> > +        */
> > +        assert(!oob && qemu_in_coroutine() && !(cmd->options & QCO_COROUTINE));
> >  
> > -    cmd->fn(args, &ret, &err);
> > -
> > -    monitor_set_cur(qemu_coroutine_self(), NULL);
> > +        QmpDispatchBH data = {
> > +            .cur_mon    = cur_mon,
> > +            .cmd        = cmd,
> > +            .args       = args,
> > +            .ret        = &ret,
> > +            .errp       = &err,
> > +            .co         = qemu_coroutine_self(),
> > +        };
> > +        aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
> > +                                &data);
> > +        qemu_coroutine_yield();
> > +    }
> >      qobject_unref(args);
> >      if (err) {
> >          /* or assert(!ret) after reviewing all handlers: */
> > diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c
> > index d0f9a1d3e3..58c65b5052 100644
> > --- a/qapi/qmp-registry.c
> > +++ b/qapi/qmp-registry.c
> > @@ -20,6 +20,9 @@ void qmp_register_command(QmpCommandList *cmds, const char *name,
> >  {
> >      QmpCommand *cmd = g_malloc0(sizeof(*cmd));
> >  
> > +    /* QCO_COROUTINE and QCO_ALLOW_OOB are incompatible for now */
> > +    assert(!((options & QCO_COROUTINE) && (options & QCO_ALLOW_OOB)));
> > +
> >      cmd->name = name;
> >      cmd->fn = fn;
> >      cmd->enabled = true;
> > diff --git a/util/aio-posix.c b/util/aio-posix.c
> > index 280f27bb99..30f5354b1e 100644
> > --- a/util/aio-posix.c
> > +++ b/util/aio-posix.c
> > @@ -15,6 +15,7 @@
> >  
> >  #include "qemu/osdep.h"
> >  #include "block/block.h"
> > +#include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "qemu/rcu_queue.h"
> >  #include "qemu/sockets.h"
> > @@ -558,8 +559,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
> >       * There cannot be two concurrent aio_poll calls for the same AioContext (or
> >       * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
> >       * We rely on this below to avoid slow locked accesses to ctx->notify_me.
> > +     *
> > +     * aio_poll() may only be called in the AioContext's thread. iohandler_ctx
> > +     * is special in that it runs in the main thread, but that thread's context
> > +     * is qemu_aio_context.
> >       */
> > -    assert(in_aio_context_home_thread(ctx));
> > +    assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
> > +                                      qemu_get_aio_context() : ctx));
> >  
> >      qemu_lockcnt_inc(&ctx->list_lock);
> 
> 
> -- 
> Alex Bennée
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 11:25     ` Kevin Wolf
@ 2020-10-12 11:53       ` Philippe Mathieu-Daudé
  2020-10-12 12:47         ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-12 11:53 UTC (permalink / raw)
  To: Kevin Wolf, Alex Bennée
  Cc: Pavel Dovgalyuk, Markus Armbruster, Stefan Hajnoczi, qemu-devel

On 10/12/20 1:25 PM, Kevin Wolf wrote:
> Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
>>
>> Markus Armbruster <armbru@redhat.com> writes:
>>
>>> From: Kevin Wolf <kwolf@redhat.com>
>>>
>>> This moves the QMP dispatcher to a coroutine and runs all QMP command
>>> handlers that declare 'coroutine': true in coroutine context so they
>>> can avoid blocking the main loop while doing I/O or waiting for other
>>> events.
>>
>> This subtly changes the replay behaviour leading to a hang in:
>>
>>    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>>    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>>    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
>>    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
>>     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
>>    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>    JOB TIME   : 27.77 s
>>
>> Looking at the log:
>>
>>    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
>>    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
>>    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
>>    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
>>    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
>>    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
>>    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
>>    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
>>    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>>    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
>>    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
>>    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
> 
> This looks like a crash (SIGABRT) rather than a hang. Do you have a
> stack trace for the crashed process?

No crash, exit(0):

VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
Please append a correct "root=" boot option; here are the available 
partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(0,0)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
Hardware name: Generic DT based system
[<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
[<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
[<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
[<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
[<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
[<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
[<c1101ac4>] (prepare_namespace) from [<c11012e8>] 
(kernel_init_freeable+0x2c0/0x370)
[<c11012e8>] (kernel_init_freeable) from [<c0b63914>] 
(kernel_init+0x18/0x128)
[<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc790bfb0 to 0xc790bff8)
bfa0:                                     00000000 00000000 00000000 
00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
bfe0: 00000000 00000000 00000000 00000000 00000013 00000000

-> PSCI call
    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
       -> SHUTDOWN_CAUSE_GUEST_RESET
          -> exit(0)

> 
> Kevin
> 
>>    2020-10-12 11:40:31,734 stacktrace       L0039 ERROR|
>>    2020-10-12 11:40:31,734 stacktrace       L0042 ERROR| Reproduced traceback from: /home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:767
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR| Traceback (most recent call last):
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 435, in _do_shutdown
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._soft_shutdown(timeout, has_quit)
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 415, in _soft_shutdown
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._qmp.cmd('quit')
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 266, in cmd
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     return self.cmd_obj(qmp_cmd)
>>    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 249, in cmd_obj
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     raise QMPConnectError("Unexpected empty reply from server")
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| qemu.qmp.QMPConnectError: Unexpected empty reply from server
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| The above exception was the direct cause of the following exception:
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| Traceback (most recent call last):
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 128, in test_arm_virt
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 71, in run_rr
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     True, shift, args, replay_path)
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 57, in run_vm
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     vm.shutdown()
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 465, in shutdown
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self._do_shutdown(timeout, has_quit)
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 439, in _do_shutdown
>>    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     from exc
>>    2020-10-12 11:40:31,737 stacktrace       L0045 ERROR| qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
>>
>> The commit before:
>>
>>    2020-10-12 11:44:02,803 __init__         L0085 DEBUG| [    3.897837] md: ... autorun DONE.
>>    2020-10-12 11:44:02,806 __init__         L0085 DEBUG| [    3.897958] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>>    2020-10-12 11:44:02,806 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
>>    2020-10-12 11:44:02,814 qmp              L0250 DEBUG| <<< {'return': {}}
>>    2020-10-12 11:44:02,879 replay_kernel    L0059 INFO | finished the recording with log size 237596 bytes
>>    2020-10-12 11:44:02,879 replay_kernel    L0064 INFO | elapsed time 24.35 sec
>>    2020-10-12 11:44:02,888 replay_kernel    L0044 INFO | replaying the execution...
>>    2020-10-12 11:44:02,889 machine          L0362 DEBUG| VM launch command: './qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpihqswp6_/qemu-7747-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpihqswp6_/qemu-7747-console.sock,server,nowait -serial chardev:console
>>    -icount shift=1,rr=replay,rrfile=/var/tmp/avocado_o3pfy2he/avocado_job_nf30vjqg/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot'
>>    2020-10-12 11:44:03,001 qmp              L0245 DEBUG| >>> {'execute': 'qmp_capabilities'}
>>    2020-10-12 11:44:03,172 qmp              L0250 DEBUG| <<< {'return': {}}
>>    2020-10-12 11:44:04,899 __init__         L0085 DEBUG| [    0.000000] Booting Linux on physical CPU 0x0
>>    2020-10-12 11:44:04,904 __init__         L0085 DEBUG| [    0.000000] Linux version 4.18.16-300.fc29.armv7hl (mockbuild@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018
[...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 11:53       ` Philippe Mathieu-Daudé
@ 2020-10-12 12:47         ` Kevin Wolf
  2020-10-12 15:02           ` Alex Bennée
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-10-12 12:47 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Pavel Dovgalyuk, Alex Bennée, Markus Armbruster,
	Stefan Hajnoczi, qemu-devel

Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
> On 10/12/20 1:25 PM, Kevin Wolf wrote:
> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
> > > 
> > > Markus Armbruster <armbru@redhat.com> writes:
> > > 
> > > > From: Kevin Wolf <kwolf@redhat.com>
> > > > 
> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
> > > > handlers that declare 'coroutine': true in coroutine context so they
> > > > can avoid blocking the main loop while doing I/O or waiting for other
> > > > events.
> > > 
> > > This subtly changes the replay behaviour leading to a hang in:
> > > 
> > >    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > >    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > >    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
> > >    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
> > >     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
> > >    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> > >    JOB TIME   : 27.77 s
> > > 
> > > Looking at the log:
> > > 
> > >    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
> > >    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
> > >    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
> > >    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
> > >    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
> > >    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
> > >    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
> > >    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
> > >    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > >    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
> > >    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
> > >    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
> > 
> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
> > stack trace for the crashed process?
> 
> No crash, exit(0):

Why does the log say "qemu received signal 6" then?

> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6

Alex has this error in the logs before this commit, so I assume this is
expected. All of the following is then probably expected, too, because
it follows directly from this error:

> Please append a correct "root=" boot option; here are the available
> partitions:
> Kernel panic - not syncing: VFS: Unable to mount root fs on
> unknown-block(0,0)
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
> Hardware name: Generic DT based system
> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
> (kernel_init_freeable+0x2c0/0x370)
> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
> (kernel_init+0x18/0x128)
> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xc790bfb0 to 0xc790bff8)
> bfa0:                                     00000000 00000000 00000000
> 00000000
> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> 
> -> PSCI call
>    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
>       -> SHUTDOWN_CAUSE_GUEST_RESET
>          -> exit(0)

Hm... So you're saying that the test sends a 'quit' QMP command, but
before it could be processed, the guest causes QEMU to exit, so the test
will never receive a reply to its request?

If commit 9ce44e2ce2 changes anything about this, it seems to me that it
would be that more QMP commands are processed during monitor_cleanup()
because it doesn't just delete the dispatcher BH, but waits until it's
not busy any more.

Looking at this code again, however, the order in monitor_cleanup() is
probably wrong. We should first shut down the dispatcher (which may
still be using Monitor objects) and then destroy the monitors. This
could possibly explain a crash, but probably not wrong results with a
successful shutdown.

Kevin

> > >    2020-10-12 11:40:31,734 stacktrace       L0039 ERROR|
> > >    2020-10-12 11:40:31,734 stacktrace       L0042 ERROR| Reproduced traceback from: /home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:767
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR| Traceback (most recent call last):
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 435, in _do_shutdown
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._soft_shutdown(timeout, has_quit)
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 415, in _soft_shutdown
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     self._qmp.cmd('quit')
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 266, in cmd
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|     return self.cmd_obj(qmp_cmd)
> > >    2020-10-12 11:40:31,735 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/qmp.py", line 249, in cmd_obj
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     raise QMPConnectError("Unexpected empty reply from server")
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| qemu.qmp.QMPConnectError: Unexpected empty reply from server
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| The above exception was the direct cause of the following exception:
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR| Traceback (most recent call last):
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 128, in test_arm_virt
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 71, in run_rr
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     True, shift, args, replay_path)
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/builds/bisect/tests/acceptance/replay_kernel.py", line 57, in run_vm
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     vm.shutdown()
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 465, in shutdown
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     self._do_shutdown(timeout, has_quit)
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|   File "/home/alex.bennee/lsrc/qemu.git/python/qemu/machine.py", line 439, in _do_shutdown
> > >    2020-10-12 11:40:31,736 stacktrace       L0045 ERROR|     from exc
> > >    2020-10-12 11:40:31,737 stacktrace       L0045 ERROR| qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
> > > 
> > > The commit before:
> > > 
> > >    2020-10-12 11:44:02,803 __init__         L0085 DEBUG| [    3.897837] md: ... autorun DONE.
> > >    2020-10-12 11:44:02,806 __init__         L0085 DEBUG| [    3.897958] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > >    2020-10-12 11:44:02,806 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
> > >    2020-10-12 11:44:02,814 qmp              L0250 DEBUG| <<< {'return': {}}
> > >    2020-10-12 11:44:02,879 replay_kernel    L0059 INFO | finished the recording with log size 237596 bytes
> > >    2020-10-12 11:44:02,879 replay_kernel    L0064 INFO | elapsed time 24.35 sec
> > >    2020-10-12 11:44:02,888 replay_kernel    L0044 INFO | replaying the execution...
> > >    2020-10-12 11:44:02,889 machine          L0362 DEBUG| VM launch command: './qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpihqswp6_/qemu-7747-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpihqswp6_/qemu-7747-console.sock,server,nowait -serial chardev:console
> > >    -icount shift=1,rr=replay,rrfile=/var/tmp/avocado_o3pfy2he/avocado_job_nf30vjqg/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot'
> > >    2020-10-12 11:44:03,001 qmp              L0245 DEBUG| >>> {'execute': 'qmp_capabilities'}
> > >    2020-10-12 11:44:03,172 qmp              L0250 DEBUG| <<< {'return': {}}
> > >    2020-10-12 11:44:04,899 __init__         L0085 DEBUG| [    0.000000] Booting Linux on physical CPU 0x0
> > >    2020-10-12 11:44:04,904 __init__         L0085 DEBUG| [    0.000000] Linux version 4.18.16-300.fc29.armv7hl (mockbuild@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018
> [...]
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 12:47         ` Kevin Wolf
@ 2020-10-12 15:02           ` Alex Bennée
  2020-10-12 18:49             ` Ben Widawsky
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2020-10-12 15:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Markus Armbruster, qemu-devel, Philippe Mathieu-Daudé,
	Pavel Dovgalyuk, Stefan Hajnoczi, Cleber Rosa


Kevin Wolf <kwolf@redhat.com> writes:

> Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
>> On 10/12/20 1:25 PM, Kevin Wolf wrote:
>> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
>> > > 
>> > > Markus Armbruster <armbru@redhat.com> writes:
>> > > 
>> > > > From: Kevin Wolf <kwolf@redhat.com>
>> > > > 
>> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
>> > > > handlers that declare 'coroutine': true in coroutine context so they
>> > > > can avoid blocking the main loop while doing I/O or waiting for other
>> > > > events.
>> > > 
>> > > This subtly changes the replay behaviour leading to a hang in:
>> > > 
>> > >    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>> > >    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>> > >    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
>> > >    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
>> > >     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
>> > >    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> > >    JOB TIME   : 27.77 s
>> > > 
>> > > Looking at the log:
>> > > 
>> > >    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
>> > >    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
>> > >    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
>> > >    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
>> > >    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
>> > >    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
>> > >    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
>> > >    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
>> > >    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>> > >    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
>> > >    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
>> > >    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
>> > 
>> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
>> > stack trace for the crashed process?
>> 
>> No crash, exit(0):
>
> Why does the log say "qemu received signal 6" then?
>
>> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>
> Alex has this error in the logs before this commit, so I assume this is
> expected. All of the following is then probably expected, too, because
> it follows directly from this error:
>
>> Please append a correct "root=" boot option; here are the available
>> partitions:
>> Kernel panic - not syncing: VFS: Unable to mount root fs on
>> unknown-block(0,0)
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
>> Hardware name: Generic DT based system
>> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
>> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
>> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
>> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
>> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
>> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
>> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
>> (kernel_init_freeable+0x2c0/0x370)
>> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
>> (kernel_init+0x18/0x128)
>> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
>> Exception stack(0xc790bfb0 to 0xc790bff8)
>> bfa0:                                     00000000 00000000 00000000
>> 00000000
>> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000
>> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> 
>> -> PSCI call
>>    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
>>       -> SHUTDOWN_CAUSE_GUEST_RESET
>>          -> exit(0)

Yes - the test is recording the kernel up until the point it gives up.

> Hm... So you're saying that the test sends a 'quit' QMP command, but
> before it could be processed, the guest causes QEMU to exit, so the test
> will never receive a reply to its request?
>
> If commit 9ce44e2ce2 changes anything about this, it seems to me that it
> would be that more QMP commands are processed during monitor_cleanup()
> because it doesn't just delete the dispatcher BH, but waits until it's
> not busy any more.
>
> Looking at this code again, however, the order in monitor_cleanup() is
> probably wrong. We should first shut down the dispatcher (which may
> still be using Monitor objects) and then destroy the monitors. This
> could possibly explain a crash, but probably not wrong results with a
> successful shutdown.

I suspect this is a race between QEMU shutting down because the guest
shut it down and the acceptance test trying to shut things down via QMP.
I think the proper fix is either to:

  - s/panic=-1/panic=0/ in the command line (and probably drop --no-reboot)

which would let the acceptance test cleanly shutdown via QMP. Or:

  - modify the test to declare that qemu will shutdown itself and
    therefor no "quit" needs to be sent

WDYT?

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 15:02           ` Alex Bennée
@ 2020-10-12 18:49             ` Ben Widawsky
  2020-10-13  7:56               ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Widawsky @ 2020-10-12 18:49 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Kevin Wolf, qemu-devel, Philippe Mathieu-Daudé,
	Markus Armbruster, Pavel Dovgalyuk, Stefan Hajnoczi, Cleber Rosa

On 20-10-12 16:02:34, Alex Bennée wrote:
> 
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
> >> On 10/12/20 1:25 PM, Kevin Wolf wrote:
> >> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
> >> > > 
> >> > > Markus Armbruster <armbru@redhat.com> writes:
> >> > > 
> >> > > > From: Kevin Wolf <kwolf@redhat.com>
> >> > > > 
> >> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
> >> > > > handlers that declare 'coroutine': true in coroutine context so they
> >> > > > can avoid blocking the main loop while doing I/O or waiting for other
> >> > > > events.
> >> > > 
> >> > > This subtly changes the replay behaviour leading to a hang in:
> >> > > 
> >> > >    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> >> > >    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> >> > >    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
> >> > >    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
> >> > >     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
> >> > >    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> >> > >    JOB TIME   : 27.77 s
> >> > > 
> >> > > Looking at the log:
> >> > > 
> >> > >    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
> >> > >    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
> >> > >    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
> >> > >    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
> >> > >    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
> >> > >    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
> >> > >    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
> >> > >    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
> >> > >    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> >> > >    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
> >> > >    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
> >> > >    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
> >> > 
> >> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
> >> > stack trace for the crashed process?
> >> 
> >> No crash, exit(0):
> >
> > Why does the log say "qemu received signal 6" then?
> >
> >> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> >
> > Alex has this error in the logs before this commit, so I assume this is
> > expected. All of the following is then probably expected, too, because
> > it follows directly from this error:
> >
> >> Please append a correct "root=" boot option; here are the available
> >> partitions:
> >> Kernel panic - not syncing: VFS: Unable to mount root fs on
> >> unknown-block(0,0)
> >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
> >> Hardware name: Generic DT based system
> >> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
> >> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
> >> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
> >> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
> >> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
> >> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
> >> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
> >> (kernel_init_freeable+0x2c0/0x370)
> >> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
> >> (kernel_init+0x18/0x128)
> >> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> >> Exception stack(0xc790bfb0 to 0xc790bff8)
> >> bfa0:                                     00000000 00000000 00000000
> >> 00000000
> >> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >> 00000000
> >> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >> 
> >> -> PSCI call
> >>    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
> >>       -> SHUTDOWN_CAUSE_GUEST_RESET
> >>          -> exit(0)
> 
> Yes - the test is recording the kernel up until the point it gives up.
> 
> > Hm... So you're saying that the test sends a 'quit' QMP command, but
> > before it could be processed, the guest causes QEMU to exit, so the test
> > will never receive a reply to its request?
> >
> > If commit 9ce44e2ce2 changes anything about this, it seems to me that it
> > would be that more QMP commands are processed during monitor_cleanup()
> > because it doesn't just delete the dispatcher BH, but waits until it's
> > not busy any more.
> >
> > Looking at this code again, however, the order in monitor_cleanup() is
> > probably wrong. We should first shut down the dispatcher (which may
> > still be using Monitor objects) and then destroy the monitors. This
> > could possibly explain a crash, but probably not wrong results with a
> > successful shutdown.
> 
> I suspect this is a race between QEMU shutting down because the guest
> shut it down and the acceptance test trying to shut things down via QMP.
> I think the proper fix is either to:

I'm not sure my problem is the same, but I do have the same symptom.

> 
>   - s/panic=-1/panic=0/ in the command line (and probably drop --no-reboot)
> 
> which would let the acceptance test cleanly shutdown via QMP.

I tried this, which works well on some of the platforms which were failing.
-    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-0 '

     def run_vm(self, kernel_path, kernel_command_line, console_pattern,
                record, shift, args, replay_path):
@@ -47,8 +47,8 @@ class ReplayKernel(LinuxKernelTest):
                     (shift, mode, replay_path),
                     '-kernel', kernel_path,
                     '-append', kernel_command_line,
-                    '-net', 'none',
-                    '-no-reboot')
+                    '-net', 'none'
+                    )
         if args:
             vm.add_args(*args)
         vm.launch()
@@ -154,7 +154,7 @@ class ReplayKernel(LinuxKernelTest):
         kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
                                'console=ttyS0,115200 '
                                'usbcore.nousb '
-                               'panic=-1 noreboot')
+                               'panic=-0')
         console_pattern = 'Boot successful.'

> 
>   - modify the test to declare that qemu will shutdown itself and
>     therefor no "quit" needs to be sent
> 
> WDYT?
> 
> -- 
> Alex Bennée
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-12 18:49             ` Ben Widawsky
@ 2020-10-13  7:56               ` Philippe Mathieu-Daudé
  2020-10-13 11:29                 ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-13  7:56 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Kevin Wolf, Markus Armbruster, QEMU Developers, Pavel Dovgalyuk,
	Stefan Hajnoczi, Cleber Rosa, Alex Bennée,
	Philippe Mathieu-Daudé

On Mon, Oct 12, 2020 at 8:57 PM Ben Widawsky <ben@bwidawsk.net> wrote:
>
> On 20-10-12 16:02:34, Alex Bennée wrote:
> >
> > Kevin Wolf <kwolf@redhat.com> writes:
> >
> > > Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
> > >> On 10/12/20 1:25 PM, Kevin Wolf wrote:
> > >> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
> > >> > >
> > >> > > Markus Armbruster <armbru@redhat.com> writes:
> > >> > >
> > >> > > > From: Kevin Wolf <kwolf@redhat.com>
> > >> > > >
> > >> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
> > >> > > > handlers that declare 'coroutine': true in coroutine context so they
> > >> > > > can avoid blocking the main loop while doing I/O or waiting for other
> > >> > > > events.
> > >> > >
> > >> > > This subtly changes the replay behaviour leading to a hang in:
> > >> > >
> > >> > >    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > >> > >    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > >> > >    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
> > >> > >    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
> > >> > >     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
> > >> > >    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> > >> > >    JOB TIME   : 27.77 s
> > >> > >
> > >> > > Looking at the log:
> > >> > >
> > >> > >    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
> > >> > >    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
> > >> > >    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
> > >> > >    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
> > >> > >    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
> > >> > >    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
> > >> > >    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
> > >> > >    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
> > >> > >    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > >> > >    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
> > >> > >    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
> > >> > >    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
> > >> >
> > >> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
> > >> > stack trace for the crashed process?
> > >>
> > >> No crash, exit(0):
> > >
> > > Why does the log say "qemu received signal 6" then?
> > >
> > >> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > >
> > > Alex has this error in the logs before this commit, so I assume this is
> > > expected. All of the following is then probably expected, too, because
> > > it follows directly from this error:
> > >
> > >> Please append a correct "root=" boot option; here are the available
> > >> partitions:
> > >> Kernel panic - not syncing: VFS: Unable to mount root fs on
> > >> unknown-block(0,0)
> > >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
> > >> Hardware name: Generic DT based system
> > >> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
> > >> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
> > >> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
> > >> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
> > >> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
> > >> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
> > >> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
> > >> (kernel_init_freeable+0x2c0/0x370)
> > >> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
> > >> (kernel_init+0x18/0x128)
> > >> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> > >> Exception stack(0xc790bfb0 to 0xc790bff8)
> > >> bfa0:                                     00000000 00000000 00000000
> > >> 00000000
> > >> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > >> 00000000
> > >> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > >>
> > >> -> PSCI call
> > >>    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
> > >>       -> SHUTDOWN_CAUSE_GUEST_RESET
> > >>          -> exit(0)
> >
> > Yes - the test is recording the kernel up until the point it gives up.
> >
> > > Hm... So you're saying that the test sends a 'quit' QMP command, but
> > > before it could be processed, the guest causes QEMU to exit, so the test
> > > will never receive a reply to its request?
> > >
> > > If commit 9ce44e2ce2 changes anything about this, it seems to me that it
> > > would be that more QMP commands are processed during monitor_cleanup()
> > > because it doesn't just delete the dispatcher BH, but waits until it's
> > > not busy any more.
> > >
> > > Looking at this code again, however, the order in monitor_cleanup() is
> > > probably wrong. We should first shut down the dispatcher (which may
> > > still be using Monitor objects) and then destroy the monitors. This
> > > could possibly explain a crash, but probably not wrong results with a
> > > successful shutdown.
> >
> > I suspect this is a race between QEMU shutting down because the guest
> > shut it down and the acceptance test trying to shut things down via QMP.
> > I think the proper fix is either to:
>
> I'm not sure my problem is the same, but I do have the same symptom.
>
> >
> >   - s/panic=-1/panic=0/ in the command line (and probably drop --no-reboot)
> >
> > which would let the acceptance test cleanly shutdown via QMP.
>
> I tried this, which works well on some of the platforms which were failing.
> -    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-0 '
>
>      def run_vm(self, kernel_path, kernel_command_line, console_pattern,
>                 record, shift, args, replay_path):
> @@ -47,8 +47,8 @@ class ReplayKernel(LinuxKernelTest):
>                      (shift, mode, replay_path),
>                      '-kernel', kernel_path,
>                      '-append', kernel_command_line,
> -                    '-net', 'none',
> -                    '-no-reboot')
> +                    '-net', 'none'
> +                    )
>          if args:
>              vm.add_args(*args)
>          vm.launch()
> @@ -154,7 +154,7 @@ class ReplayKernel(LinuxKernelTest):
>          kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
>                                 'console=ttyS0,115200 '
>                                 'usbcore.nousb '
> -                               'panic=-1 noreboot')
> +                               'panic=-0')
>          console_pattern = 'Boot successful.'
>
> >
> >   - modify the test to declare that qemu will shutdown itself and
> >     therefor no "quit" needs to be sent
> >
> > WDYT?

Can you send it as a formal patch please? :)

> >
> > --
> > Alex Bennée
> >
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-13  7:56               ` Philippe Mathieu-Daudé
@ 2020-10-13 11:29                 ` Kevin Wolf
  0 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-10-13 11:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Ben Widawsky, Markus Armbruster, QEMU Developers,
	Pavel Dovgalyuk, Stefan Hajnoczi, Cleber Rosa, Alex Bennée,
	Philippe Mathieu-Daudé

Am 13.10.2020 um 09:56 hat Philippe Mathieu-Daudé geschrieben:
> On Mon, Oct 12, 2020 at 8:57 PM Ben Widawsky <ben@bwidawsk.net> wrote:
> >
> > On 20-10-12 16:02:34, Alex Bennée wrote:
> > >
> > > Kevin Wolf <kwolf@redhat.com> writes:
> > >
> > > > Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
> > > >> On 10/12/20 1:25 PM, Kevin Wolf wrote:
> > > >> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
> > > >> > >
> > > >> > > Markus Armbruster <armbru@redhat.com> writes:
> > > >> > >
> > > >> > > > From: Kevin Wolf <kwolf@redhat.com>
> > > >> > > >
> > > >> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
> > > >> > > > handlers that declare 'coroutine': true in coroutine context so they
> > > >> > > > can avoid blocking the main loop while doing I/O or waiting for other
> > > >> > > > events.
> > > >> > >
> > > >> > > This subtly changes the replay behaviour leading to a hang in:
> > > >> > >
> > > >> > >    10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > > >> > >    Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
> > > >> > >    JOB ID     : ec11fd2544f06e6c0d421f16afa757b49f7ed734
> > > >> > >    JOB LOG    : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
> > > >> > >     (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
> > > >> > >    RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> > > >> > >    JOB TIME   : 27.77 s
> > > >> > >
> > > >> > > Looking at the log:
> > > >> > >
> > > >> > >    2020-10-12 11:40:31,426 __init__         L0085 DEBUG| [    3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
> > > >> > >    2020-10-12 11:40:31,428 __init__         L0085 DEBUG| [    3.887431] sr_init: No PMIC hook to init smartreflex
> > > >> > >    2020-10-12 11:40:31,447 __init__         L0085 DEBUG| [    3.897193] uart-pl011 9000000.pl011: no DMA platform data
> > > >> > >    2020-10-12 11:40:31,460 __init__         L0085 DEBUG| [    3.897242] md: Waiting for all devices to be available before autodetect
> > > >> > >    2020-10-12 11:40:31,462 __init__         L0085 DEBUG| [    3.897259] md: If you don't use raid, use raid=noautodetect
> > > >> > >    2020-10-12 11:40:31,475 __init__         L0085 DEBUG| [    3.897819] md: Autodetecting RAID arrays.
> > > >> > >    2020-10-12 11:40:31,476 __init__         L0085 DEBUG| [    3.897832] md: autorun ...
> > > >> > >    2020-10-12 11:40:31,477 __init__         L0085 DEBUG| [    3.897842] md: ... autorun DONE.
> > > >> > >    2020-10-12 11:40:31,483 __init__         L0085 DEBUG| [    3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > > >> > >    2020-10-12 11:40:31,483 qmp              L0245 DEBUG| >>> {'execute': 'quit'}
> > > >> > >    2020-10-12 11:40:31,495 qmp              L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
> > > >> > >    2020-10-12 11:40:31,733 machine          L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
> > > >> >
> > > >> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
> > > >> > stack trace for the crashed process?
> > > >>
> > > >> No crash, exit(0):
> > > >
> > > > Why does the log say "qemu received signal 6" then?
> > > >
> > > >> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
> > > >
> > > > Alex has this error in the logs before this commit, so I assume this is
> > > > expected. All of the following is then probably expected, too, because
> > > > it follows directly from this error:
> > > >
> > > >> Please append a correct "root=" boot option; here are the available
> > > >> partitions:
> > > >> Kernel panic - not syncing: VFS: Unable to mount root fs on
> > > >> unknown-block(0,0)
> > > >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
> > > >> Hardware name: Generic DT based system
> > > >> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
> > > >> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
> > > >> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
> > > >> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
> > > >> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
> > > >> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
> > > >> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
> > > >> (kernel_init_freeable+0x2c0/0x370)
> > > >> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
> > > >> (kernel_init+0x18/0x128)
> > > >> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> > > >> Exception stack(0xc790bfb0 to 0xc790bff8)
> > > >> bfa0:                                     00000000 00000000 00000000
> > > >> 00000000
> > > >> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > >> 00000000
> > > >> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > >>
> > > >> -> PSCI call
> > > >>    -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
> > > >>       -> SHUTDOWN_CAUSE_GUEST_RESET
> > > >>          -> exit(0)
> > >
> > > Yes - the test is recording the kernel up until the point it gives up.
> > >
> > > > Hm... So you're saying that the test sends a 'quit' QMP command, but
> > > > before it could be processed, the guest causes QEMU to exit, so the test
> > > > will never receive a reply to its request?
> > > >
> > > > If commit 9ce44e2ce2 changes anything about this, it seems to me that it
> > > > would be that more QMP commands are processed during monitor_cleanup()
> > > > because it doesn't just delete the dispatcher BH, but waits until it's
> > > > not busy any more.
> > > >
> > > > Looking at this code again, however, the order in monitor_cleanup() is
> > > > probably wrong. We should first shut down the dispatcher (which may
> > > > still be using Monitor objects) and then destroy the monitors. This
> > > > could possibly explain a crash, but probably not wrong results with a
> > > > successful shutdown.
> > >
> > > I suspect this is a race between QEMU shutting down because the guest
> > > shut it down and the acceptance test trying to shut things down via QMP.
> > > I think the proper fix is either to:
> >
> > I'm not sure my problem is the same, but I do have the same symptom.
> >
> > >
> > >   - s/panic=-1/panic=0/ in the command line (and probably drop --no-reboot)
> > >
> > > which would let the acceptance test cleanly shutdown via QMP.
> >
> > I tried this, which works well on some of the platforms which were failing.
> > -    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
> > +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-0 '
> >
> >      def run_vm(self, kernel_path, kernel_command_line, console_pattern,
> >                 record, shift, args, replay_path):
> > @@ -47,8 +47,8 @@ class ReplayKernel(LinuxKernelTest):
> >                      (shift, mode, replay_path),
> >                      '-kernel', kernel_path,
> >                      '-append', kernel_command_line,
> > -                    '-net', 'none',
> > -                    '-no-reboot')
> > +                    '-net', 'none'
> > +                    )
> >          if args:
> >              vm.add_args(*args)
> >          vm.launch()
> > @@ -154,7 +154,7 @@ class ReplayKernel(LinuxKernelTest):
> >          kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> >                                 'console=ttyS0,115200 '
> >                                 'usbcore.nousb '
> > -                               'panic=-1 noreboot')
> > +                               'panic=-0')
> >          console_pattern = 'Boot successful.'
> >
> > >
> > >   - modify the test to declare that qemu will shutdown itself and
> > >     therefor no "quit" needs to be sent
> > >
> > > WDYT?
> 
> Can you send it as a formal patch please? :)

Please don't "fix" the test case when the problem is a QEMU bug.


I reproduced the bug myself now to fill in the missing information and
this is how it crashes:

(gdb) bt
#0  0x00007fe541cf4bc5 in raise () at /lib64/libc.so.6
#1  0x00007fe541cdd8a4 in abort () at /lib64/libc.so.6
#2  0x000055c24e965327 in error_exit (err=16, msg=0x55c24eead3a0 <__func__.33> "qemu_mutex_destroy") at ../util/qemu-thread-posix.c:37
#3  0x000055c24e9654c3 in qemu_mutex_destroy (mutex=0x55c25133e0f0) at ../util/qemu-thread-posix.c:70
#4  0x000055c24e7cfaf1 in monitor_data_destroy_qmp (mon=0x55c25133dfd0) at ../monitor/qmp.c:439
#5  0x000055c24e7d23bc in monitor_data_destroy (mon=0x55c25133dfd0) at ../monitor/monitor.c:615
#6  0x000055c24e7d253a in monitor_cleanup () at ../monitor/monitor.c:644
#7  0x000055c24e6cb002 in qemu_cleanup () at ../softmmu/vl.c:4549
#8  0x000055c24e0d259b in main (argc=24, argv=0x7ffff66b0d58, envp=0x7ffff66b0e20) at ../softmmu/main.c:51

The reason is that qemu_mutex_destroy(&mon->qmp_queue_lock) is called
while mon->qmp_queue_lock is still held by the dispatcher coroutine.
This is fixed by correcting the order in monitor_cleanup() as I had
already noticed above.


For the sake of completeness, this is where the dispatcher coroutine has
yielded when the process crashes:

(gdb) l *0x000055c24e7cf3da
0x55c24e7cf3da is in monitor_qmp_dispatcher_co (../monitor/qmp.c:273).
268              * involves an AIO_WAIT_WHILE().
269              */
270             aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
271 ===>        qemu_coroutine_yield();
272
273             mon = req_obj->mon;
274             /* qmp_oob_enabled() might change after "qmp_capabilities" */
275             need_resume = !qmp_oob_enabled(mon) ||
276                 mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
277             qemu_mutex_unlock(&mon->qmp_queue_lock);


Please try the following patch. It fixes the problem for me.

Kevin


diff --git a/monitor/monitor.c b/monitor/monitor.c
index ceffe1a83b..84222cd130 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -632,23 +632,9 @@ void monitor_cleanup(void)
         iothread_stop(mon_iothread);
     }
 
-    /* Flush output buffers and destroy monitors */
-    qemu_mutex_lock(&monitor_lock);
-    monitor_destroyed = true;
-    while (!QTAILQ_EMPTY(&mon_list)) {
-        Monitor *mon = QTAILQ_FIRST(&mon_list);
-        QTAILQ_REMOVE(&mon_list, mon, entry);
-        /* Permit QAPI event emission from character frontend release */
-        qemu_mutex_unlock(&monitor_lock);
-        monitor_flush(mon);
-        monitor_data_destroy(mon);
-        qemu_mutex_lock(&monitor_lock);
-        g_free(mon);
-    }
-    qemu_mutex_unlock(&monitor_lock);
-
     /*
-     * The dispatcher needs to stop before destroying the I/O thread.
+     * The dispatcher needs to stop before destroying the monitor and
+     * the I/O thread.
      *
      * We need to poll both qemu_aio_context and iohandler_ctx to make
      * sure that the dispatcher coroutine keeps making progress and
@@ -665,6 +651,21 @@ void monitor_cleanup(void)
                    (aio_poll(iohandler_get_aio_context(), false),
                     qatomic_mb_read(&qmp_dispatcher_co_busy)));
 
+    /* Flush output buffers and destroy monitors */
+    qemu_mutex_lock(&monitor_lock);
+    monitor_destroyed = true;
+    while (!QTAILQ_EMPTY(&mon_list)) {
+        Monitor *mon = QTAILQ_FIRST(&mon_list);
+        QTAILQ_REMOVE(&mon_list, mon, entry);
+        /* Permit QAPI event emission from character frontend release */
+        qemu_mutex_unlock(&monitor_lock);
+        monitor_flush(mon);
+        monitor_data_destroy(mon);
+        qemu_mutex_lock(&monitor_lock);
+        g_free(mon);
+    }
+    qemu_mutex_unlock(&monitor_lock);
+
     if (mon_iothread) {
         iothread_destroy(mon_iothread);
         mon_iothread = NULL;



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-09  6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
  2020-10-12 10:47   ` Alex Bennée
@ 2020-10-17  8:15   ` Volker Rümelin
  2020-10-19  8:57     ` Kevin Wolf
  1 sibling, 1 reply; 27+ messages in thread
From: Volker Rümelin @ 2020-10-17  8:15 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: Markus Armbruster, Stefan Hajnoczi

> From: Kevin Wolf <kwolf@redhat.com>
>
> This moves the QMP dispatcher to a coroutine and runs all QMP command
> handlers that declare 'coroutine': true in coroutine context so they
> can avoid blocking the main loop while doing I/O or waiting for other
> events.
>
> For commands that are not declared safe to run in a coroutine, the
> dispatcher drops out of coroutine context by calling the QMP command
> handler from a bottom half.

Hi Kevin,

since commit 9ce44e2ce2 "qmp: Move dispatcher to a coroutine" I see the following error on Windows whenever I close the QEMU window or shut down the guest.

$ ./qemu-system-x86_64.exe -machine pc,accel=tcg -display gtk
**
ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))
Bail out! ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))

I wonder if you forgot to apply the changes to util/aio-posix.c to util/aio-win32.c too? This solves the problem on my Windows host. But I have to admit I don't know the code here.

With best regards,
Volker

> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 280f27bb99..30f5354b1e 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -15,6 +15,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "block/block.h"
> +#include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "qemu/rcu_queue.h"
>  #include "qemu/sockets.h"
> @@ -558,8 +559,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
>       * There cannot be two concurrent aio_poll calls for the same AioContext (or
>       * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
>       * We rely on this below to avoid slow locked accesses to ctx->notify_me.
> +     *
> +     * aio_poll() may only be called in the AioContext's thread. iohandler_ctx
> +     * is special in that it runs in the main thread, but that thread's context
> +     * is qemu_aio_context.
>       */
> -    assert(in_aio_context_home_thread(ctx));
> +    assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
> +                                      qemu_get_aio_context() : ctx));
>  
>      qemu_lockcnt_inc(&ctx->list_lock);
>  



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-17  8:15   ` Volker Rümelin
@ 2020-10-19  8:57     ` Kevin Wolf
  2020-10-19 18:24       ` Volker Rümelin
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-10-19  8:57 UTC (permalink / raw)
  To: Volker Rümelin; +Cc: qemu-devel, Stefan Hajnoczi, Markus Armbruster

Am 17.10.2020 um 10:15 hat Volker Rümelin geschrieben:
> > From: Kevin Wolf <kwolf@redhat.com>
> >
> > This moves the QMP dispatcher to a coroutine and runs all QMP command
> > handlers that declare 'coroutine': true in coroutine context so they
> > can avoid blocking the main loop while doing I/O or waiting for other
> > events.
> >
> > For commands that are not declared safe to run in a coroutine, the
> > dispatcher drops out of coroutine context by calling the QMP command
> > handler from a bottom half.
> 
> Hi Kevin,
> 
> since commit 9ce44e2ce2 "qmp: Move dispatcher to a coroutine" I see
> the following error on Windows whenever I close the QEMU window or
> shut down the guest.
> 
> $ ./qemu-system-x86_64.exe -machine pc,accel=tcg -display gtk
> **
> ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))
> Bail out! ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))
> 
> I wonder if you forgot to apply the changes to util/aio-posix.c to
> util/aio-win32.c too? This solves the problem on my Windows host. But
> I have to admit I don't know the code here.

Hi Volker,

yes, you're right. The assertion in the Windows code was added only in
commit 5710a3e09f9 after I had posted some versions of the patch series,
so while I did check this initially, I missed during the rebase for
later versions of the series that I would have to update the patches for
Windows.

Would you like to send a patch for this? I could send one myself if you
prefer, but I can only compile-test Windows patches, so I'd have to rely
on your testing anyway.

Kevin

> > diff --git a/util/aio-posix.c b/util/aio-posix.c
> > index 280f27bb99..30f5354b1e 100644
> > --- a/util/aio-posix.c
> > +++ b/util/aio-posix.c
> > @@ -15,6 +15,7 @@
> >  
> >  #include "qemu/osdep.h"
> >  #include "block/block.h"
> > +#include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "qemu/rcu_queue.h"
> >  #include "qemu/sockets.h"
> > @@ -558,8 +559,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
> >       * There cannot be two concurrent aio_poll calls for the same AioContext (or
> >       * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
> >       * We rely on this below to avoid slow locked accesses to ctx->notify_me.
> > +     *
> > +     * aio_poll() may only be called in the AioContext's thread. iohandler_ctx
> > +     * is special in that it runs in the main thread, but that thread's context
> > +     * is qemu_aio_context.
> >       */
> > -    assert(in_aio_context_home_thread(ctx));
> > +    assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
> > +                                      qemu_get_aio_context() : ctx));
> >  
> >      qemu_lockcnt_inc(&ctx->list_lock);
> >  
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
  2020-10-19  8:57     ` Kevin Wolf
@ 2020-10-19 18:24       ` Volker Rümelin
  0 siblings, 0 replies; 27+ messages in thread
From: Volker Rümelin @ 2020-10-19 18:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Markus Armbruster

>>> Hi Kevin,
>>>
>>> since commit 9ce44e2ce2 "qmp: Move dispatcher to a coroutine" I see
>>> the following error on Windows whenever I close the QEMU window or
>>> shut down the guest.
>>>
>>> $ ./qemu-system-x86_64.exe -machine pc,accel=tcg -display gtk
>>> **
>>> ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))
>>> Bail out! ERROR:../qemu/util/aio-win32.c:337:aio_poll: assertion failed: (in_aio_context_home_thread(ctx))
>>>
>>> I wonder if you forgot to apply the changes to util/aio-posix.c to
>>> util/aio-win32.c too? This solves the problem on my Windows host. But
>>> I have to admit I don't know the code here.
> Hi Volker,
>
> yes, you're right. The assertion in the Windows code was added only in
> commit 5710a3e09f9 after I had posted some versions of the patch series,
> so while I did check this initially, I missed during the rebase for
> later versions of the series that I would have to update the patches for
> Windows.
>
> Would you like to send a patch for this? I could send one myself if you
> prefer, but I can only compile-test Windows patches, so I'd have to rely
> on your testing anyway.
>
> Kevin

Thank you for that information. I'll send a patch.

With best regards,
Volker


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, back to index

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-09  6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
2020-10-09  6:34 ` [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu() Markus Armbruster
2020-10-09  6:34 ` [PULL 02/14] monitor: Add Monitor parameter to monitor_get_cpu_index() Markus Armbruster
2020-10-09  6:34 ` [PULL 03/14] monitor: Use getter/setter functions for cur_mon Markus Armbruster
2020-10-09  6:34 ` [PULL 04/14] hmp: Update current monitor only in handle_hmp_command() Markus Armbruster
2020-10-09  6:34 ` [PULL 05/14] qmp: Assert that no other monitor is active Markus Armbruster
2020-10-09  6:34 ` [PULL 06/14] qmp: Call monitor_set_cur() only in qmp_dispatch() Markus Armbruster
2020-10-09  6:34 ` [PULL 07/14] monitor: Make current monitor a per-coroutine property Markus Armbruster
2020-10-09  6:34 ` [PULL 08/14] qapi: Add a 'coroutine' flag for commands Markus Armbruster
2020-10-09  6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
2020-10-12 10:47   ` Alex Bennée
2020-10-12 11:25     ` Kevin Wolf
2020-10-12 11:53       ` Philippe Mathieu-Daudé
2020-10-12 12:47         ` Kevin Wolf
2020-10-12 15:02           ` Alex Bennée
2020-10-12 18:49             ` Ben Widawsky
2020-10-13  7:56               ` Philippe Mathieu-Daudé
2020-10-13 11:29                 ` Kevin Wolf
2020-10-17  8:15   ` Volker Rümelin
2020-10-19  8:57     ` Kevin Wolf
2020-10-19 18:24       ` Volker Rümelin
2020-10-09  6:34 ` [PULL 10/14] hmp: Add support for coroutine command handlers Markus Armbruster
2020-10-09  6:34 ` [PULL 11/14] util/async: Add aio_co_reschedule_self() Markus Armbruster
2020-10-09  6:34 ` [PULL 12/14] block: Add bdrv_co_enter()/leave() Markus Armbruster
2020-10-09  6:34 ` [PULL 13/14] block: Add bdrv_lock()/unlock() Markus Armbruster
2020-10-09  6:34 ` [PULL 14/14] block: Convert 'block_resize' to coroutine Markus Armbruster
2020-10-09 13:47 ` [PULL 00/14] Monitor patches for 2020-10-09 Peter Maydell

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git