All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
@ 2020-10-28 18:45 Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 3545 bytes --]

Hello Everyone,
I finally found time again to work on this, so here is v9 with the new qmp api.
We still need ACKs from NBD and chardev maintainers.

Changes:

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
  -minor change to migration
  -convert to meson
 -change "Since:" to 5.2
 -varios code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before  creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_aqcuire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 +
 block/nbd.c                   | 154 ++++++++++++++----------
 chardev/char-socket.c         |  35 ++++++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  95 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 +++
 migration/migration.c         |  25 ++++
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   6 +
 qapi/misc.json                | 106 +++++++++++++++++
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 213 ++++++++++++++++++++++++++++++++++
 15 files changed, 619 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v9 1/8] Introduce yank feature
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-29 16:36   ` Markus Armbruster
  2020-10-28 18:45 ` [PATCH v9 2/8] block/nbd.c: Add " Lukas Straub
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 12655 bytes --]

The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/yank.h |  95 ++++++++++++++++++++
 qapi/misc.json      | 106 ++++++++++++++++++++++
 util/meson.build    |   1 +
 util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 415 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 0000000000..89755e62af
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,95 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-misc.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_BLOCKDEV, \
+        .u.blockdev.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_CHARDEV, \
+        .u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 40df513856..3b7de02a4d 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -568,3 +568,109 @@
  'data': { '*option': 'str' },
  'returns': ['CommandLineOptionInfo'],
  'allow-preconfig': true }
+
+##
+# @YankInstanceType:
+#
+# An enumeration of yank instance types. See "YankInstance" for more
+# information.
+#
+# Since: 5.2
+##
+{ 'enum': 'YankInstanceType',
+  'data': [ 'blockdev', 'chardev', 'migration' ] }
+
+##
+# @YankInstanceBlockdev:
+#
+# Specifies which blockdev to yank. See "YankInstance" for more information.
+#
+# @node-name: the blockdev's node-name
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstanceBlockdev',
+  'data': { 'node-name': 'str' } }
+
+##
+# @YankInstanceChardev:
+#
+# Specifies which chardev to yank. See "YankInstance" for more information.
+#
+# @id: the chardev's ID
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstanceChardev',
+  'data': { 'id': 'str' } }
+
+##
+# @YankInstance:
+#
+# A yank instance can be yanked with the "yank" qmp command to recover from a
+# hanging qemu.
+#
+# Currently implemented yank instances:
+#  -nbd block device:
+#   Yanking it will shutdown the connection to the nbd server without
+#   attempting to reconnect.
+#  -socket chardev:
+#   Yanking it will shutdown the connected socket.
+#  -migration:
+#   Yanking it will shutdown all migration connections.
+#
+# Since: 5.2
+##
+{ 'union': 'YankInstance',
+  'base': { 'type': 'YankInstanceType' },
+  'discriminator': 'type',
+  'data': {
+      'blockdev': 'YankInstanceBlockdev',
+      'chardev': 'YankInstanceChardev' } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances. See
+# "YankInstance" for more information.
+#
+# Takes a list of @YankInstance as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank",
+#      "arguments": {
+#          "instances": [
+#               { "type": "block-node",
+#                 "node-name": "nbd0" }
+#          ] } }
+# <- { "return": {} }
+#
+# Since: 5.2
+##
+{ 'command': 'yank',
+  'data': { 'instances': ['YankInstance'] },
+  'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances. See "YankInstance" for more information.
+#
+# Returns: list of @YankInstance
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": [
+#          { "type": "block-node",
+#            "node-name": "nbd0" }
+#      ] }
+#
+# Since: 5.2
+##
+{ 'command': 'query-yank',
+  'returns': ['YankInstance'],
+  'allow-oob': true }
diff --git a/util/meson.build b/util/meson.build
index c5159ad79d..dbda9d9123 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -50,6 +50,7 @@ endif

 if have_system
   util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
+  util_ss.add(files('yank.c'))
 endif

 if have_block
diff --git a/util/yank.c b/util/yank.c
new file mode 100644
index 0000000000..0b3a816706
--- /dev/null
+++ b/util/yank.c
@@ -0,0 +1,213 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-visit-misc.h"
+#include "qapi/clone-visitor.h"
+#include "io/channel.h"
+#include "qemu/yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstanceEntry {
+    YankInstance *instance;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstanceEntry) next;
+};
+
+typedef struct YankFuncAndParam YankFuncAndParam;
+typedef struct YankInstanceEntry YankInstanceEntry;
+
+/*
+ * This lock protects the yank_instance_list below.
+ */
+static QemuMutex yank_lock;
+
+static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
+    = QLIST_HEAD_INITIALIZER(yank_instance_list);
+
+static int yank_compare_instances(const YankInstance *a, const YankInstance *b)
+{
+    if (a->type != b->type) {
+        return 0;
+    }
+
+    switch (a->type) {
+    case YANK_INSTANCE_TYPE_BLOCKDEV:
+        return !strcmp(a->u.blockdev.node_name, b->u.blockdev.node_name);
+    break;
+
+    case YANK_INSTANCE_TYPE_CHARDEV:
+        return !strcmp(a->u.chardev.id, b->u.chardev.id);
+    break;
+
+    case YANK_INSTANCE_TYPE_MIGRATION:
+        return 1;
+    break;
+
+    default:
+        abort();
+    }
+}
+
+static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
+{
+    YankInstanceEntry *entry;
+
+    QLIST_FOREACH(entry, &yank_instance_list, next) {
+        if (yank_compare_instances(entry->instance, instance)) {
+            return entry;
+        }
+    }
+    return NULL;
+}
+
+void yank_register_instance(const YankInstance *instance, Error **errp)
+{
+    YankInstanceEntry *entry;
+
+    qemu_mutex_lock(&yank_lock);
+
+    if (yank_find_entry(instance)) {
+        error_setg(errp, "duplicate yank instance");
+        qemu_mutex_unlock(&yank_lock);
+        return;
+    }
+
+    entry = g_slice_new(YankInstanceEntry);
+    entry->instance = QAPI_CLONE(YankInstance, instance);
+    QLIST_INIT(&entry->yankfns);
+    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_instance(const YankInstance *instance)
+{
+    YankInstanceEntry *entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    assert(QLIST_EMPTY(&entry->yankfns));
+    QLIST_REMOVE(entry, next);
+    qapi_free_YankInstance(entry->instance);
+    g_slice_free(YankInstanceEntry, entry);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque)
+{
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    func_entry = g_slice_new(YankFuncAndParam);
+    func_entry->func = func;
+    func_entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque)
+{
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
+        if (func_entry->func == func && func_entry->opaque == opaque) {
+            QLIST_REMOVE(func_entry, next);
+            g_slice_free(YankFuncAndParam, func_entry);
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(YankInstanceList *instances,
+              Error **errp)
+{
+    YankInstanceList *tail;
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    for (tail = instances; tail; tail = tail->next) {
+        entry = yank_find_entry(tail->value);
+        if (!entry) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+    for (tail = instances; tail; tail = tail->next) {
+        entry = yank_find_entry(tail->value);
+        assert(entry);
+        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
+            func_entry->func(func_entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&yank_lock);
+}
+
+YankInstanceList *qmp_query_yank(Error **errp)
+{
+    YankInstanceEntry *entry;
+    YankInstanceList *ret;
+
+    ret = NULL;
+
+    qemu_mutex_lock(&yank_lock);
+    QLIST_FOREACH(entry, &yank_instance_list, next) {
+        YankInstanceList *new_entry;
+        new_entry = g_new0(YankInstanceList, 1);
+        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
+        new_entry->next = ret;
+        ret = new_entry;
+    }
+    qemu_mutex_unlock(&yank_lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&yank_lock);
+}
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 2/8] block/nbd.c: Add yank feature
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 3/8] chardev/char-socket.c: " Lukas Straub
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 17440 bytes --]

Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nbd.c | 154 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 93 insertions(+), 61 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4548046cd7..d66c84ee40 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -44,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -140,14 +143,13 @@ typedef struct BDRVNBDState {
     NBDConnectThread *connect_thread;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
-                                                     Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
                                                bool detach);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -165,12 +167,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -203,7 +205,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
     BDRVNBDState *s = opaque;

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         while (qemu_co_enter_next(&s->free_sema, NULL)) {
             /* Resume all queued requests */
@@ -215,7 +217,7 @@ static void reconnect_delay_timer_cb(void *opaque)

 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 {
-    if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
         return;
     }

@@ -260,7 +262,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -286,7 +288,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)

     reconnect_delay_timer_del(s);

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
     }
@@ -337,13 +339,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = qatomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static void connect_bh(void *opaque)
@@ -423,12 +426,12 @@ static void *connect_thread_func(void *opaque)
     return NULL;
 }

-static QIOChannelSocket *coroutine_fn
+static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
+    int ret;
     QemuThread thread;
     BDRVNBDState *s = bs->opaque;
-    QIOChannelSocket *res;
     NBDConnectThread *thr = s->connect_thread;

     qemu_mutex_lock(&thr->mutex);
@@ -445,10 +448,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     case CONNECT_THREAD_SUCCESS:
         /* Previous attempt finally succeeded in background */
         thr->state = CONNECT_THREAD_NONE;
-        res = thr->sioc;
+        s->sioc = thr->sioc;
         thr->sioc = NULL;
+        yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                               nbd_yank, bs);
         qemu_mutex_unlock(&thr->mutex);
-        return res;
+        return 0;
     case CONNECT_THREAD_RUNNING:
         /* Already running, will wait */
         break;
@@ -480,8 +485,13 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
         thr->state = CONNECT_THREAD_NONE;
         error_propagate(errp, thr->err);
         thr->err = NULL;
-        res = thr->sioc;
+        s->sioc = thr->sioc;
         thr->sioc = NULL;
+        if (s->sioc) {
+            yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                   nbd_yank, bs);
+        }
+        ret = (s->sioc ? 0 : -1);
         break;
     case CONNECT_THREAD_RUNNING:
     case CONNECT_THREAD_RUNNING_DETACHED:
@@ -490,7 +500,7 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
          * failed. Still connect thread is executing in background, and its
          * result may be used for next connection attempt.
          */
-        res = NULL;
+        ret = -1;
         error_setg(errp, "Connection attempt cancelled by other operation");
         break;

@@ -507,7 +517,7 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)

     qemu_mutex_unlock(&thr->mutex);

-    return res;
+    return ret;
 }

 /*
@@ -560,7 +570,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
     Error *local_err = NULL;
-    QIOChannelSocket *sioc;

     if (!nbd_client_connecting(s)) {
         return;
@@ -593,21 +602,22 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     /* Finalize previous connection if any */
     if (s->ioc) {
         qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+                                 nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
         s->ioc = NULL;
     }

-    sioc = nbd_co_establish_connection(s->bs, &local_err);
-    if (!sioc) {
+    if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
         ret = -ECONNREFUSED;
         goto out;
     }

     bdrv_dec_in_flight(s->bs);

-    ret = nbd_client_handshake(s->bs, sioc, &local_err);
+    ret = nbd_client_handshake(s->bs, &local_err);

     if (s->drained) {
         s->wait_drained_end = true;
@@ -639,7 +649,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
     uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         reconnect_delay_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
                                    s->reconnect_delay * NANOSECONDS_PER_SECOND);
     }
@@ -682,7 +692,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;

-    while (s->state != NBD_CLIENT_QUIT) {
+    while (qatomic_load_acquire(&s->state) != NBD_CLIENT_QUIT) {
         /*
          * The NBD client can only really be considered idle when it has
          * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -697,7 +707,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             nbd_co_reconnect_loop(s);
         }

-        if (s->state != NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
             continue;
         }

@@ -752,6 +762,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     s->connection_co = NULL;
     if (s->ioc) {
         qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+                                 nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -776,7 +788,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }

-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         rc = -EIO;
         goto err;
     }
@@ -803,7 +815,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED &&
+            rc >= 0) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -1118,7 +1131,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -1277,7 +1290,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -1302,7 +1315,8 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     }

     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
+    if (nbd_reply_is_simple(reply) ||
+        qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }

@@ -1734,6 +1748,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
     return 0;
 }

+static void nbd_yank(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    qatomic_store_release(&s->state, NBD_CLIENT_QUIT);
+    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
 static void nbd_client_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -1746,52 +1769,53 @@ static void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
+static int nbd_establish_connection(BlockDriverState *bs,
+                                    SocketAddress *saddr,
+                                    Error **errp)
 {
     ERRP_GUARD();
-    QIOChannelSocket *sioc;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+    s->sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(s->sioc), "nbd-client");

-    qio_channel_socket_connect_sync(sioc, saddr, errp);
+    qio_channel_socket_connect_sync(s->sioc, saddr, errp);
     if (*errp) {
-        object_unref(OBJECT(sioc));
-        return NULL;
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        return -1;
     }

-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+    yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), nbd_yank, bs);
+    qio_channel_set_delay(QIO_CHANNEL(s->sioc), false);

-    return sioc;
+    return 0;
 }

-/* nbd_client_handshake takes ownership on sioc. On failure it is unref'ed. */
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp)
+/* nbd_client_handshake takes ownership on s->sioc. On failure it's unref'ed. */
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     AioContext *aio_context = bdrv_get_aio_context(bs);
     int ret;

     trace_nbd_client_handshake(s->export);
-
-    s->sioc = sioc;
-
-    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
+    qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+    qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);

     s->info.request_sizes = true;
     s->info.structured_reply = true;
     s->info.base_allocation = true;
     s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
     s->info.name = g_strdup(s->export ?: "");
-    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
+    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(s->sioc), s->tlscreds,
                                 s->hostname, &s->ioc, &s->info, errp);
     g_free(s->info.x_dirty_bitmap);
     g_free(s->info.name);
     if (ret < 0) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                 nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         return ret;
     }
@@ -1819,7 +1843,7 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     }

     if (!s->ioc) {
-        s->ioc = QIO_CHANNEL(sioc);
+        s->ioc = QIO_CHANNEL(s->sioc);
         object_ref(OBJECT(s->ioc));
     }

@@ -1835,9 +1859,11 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     {
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
+        nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);

-        object_unref(OBJECT(sioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                 nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;

         return ret;
@@ -2229,7 +2255,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 {
     int ret;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-    QIOChannelSocket *sioc;

     ret = nbd_process_options(bs, options, errp);
     if (ret < 0) {
@@ -2240,17 +2265,23 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_queue_init(&s->free_sema);

+    yank_register_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name), errp);
+    if (*errp) {
+        return -EEXIST;
+    }
+
     /*
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    sioc = nbd_establish_connection(s->saddr, errp);
-    if (!sioc) {
+    if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
+        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
         return -ECONNREFUSED;
     }

-    ret = nbd_client_handshake(bs, sioc, errp);
+    ret = nbd_client_handshake(bs, errp);
     if (ret < 0) {
+        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
         nbd_clear_bdrvstate(s);
         return ret;
     }
@@ -2310,6 +2341,7 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;

     nbd_client_close(bs);
+    yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
     nbd_clear_bdrvstate(s);
 }

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 3/8] chardev/char-socket.c: Add yank feature
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 2/8] block/nbd.c: Add " Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 4/8] migration: " Lukas Straub
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 4819 bytes --]

Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 chardev/char-socket.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 95e45812d5..5947cbe8bb 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"
 #include "qom/object.h"
@@ -70,6 +71,7 @@ struct SocketChardev {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    bool registered_yank;

     SocketAddress *addr;
     bool is_listen;
@@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -918,6 +926,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -932,6 +943,9 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -947,6 +961,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -962,6 +979,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1072,6 +1092,9 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->registered_yank) {
+        yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label));
+    }

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1087,6 +1110,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1120,6 +1146,9 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1362,6 +1391,12 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp);
+    if (*errp) {
+        return;
+    }
+    s->registered_yank = true;
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 4/8] migration: Add yank feature
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (2 preceding siblings ...)
  2020-10-28 18:45 ` [PATCH v9 3/8] chardev/char-socket.c: " Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 7928 bytes --]

Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/channel.c           | 13 +++++++++++++
 migration/migration.c         | 25 +++++++++++++++++++++++++
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  7 +++++++
 migration/savevm.c            |  6 ++++++
 5 files changed, 61 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 8a783baa0b..35fe234e9c 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function(MIGRATION_YANK_INSTANCE,
+                                   yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 0575ecb379..e2c1123a90 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #define MAX_THROTTLE  (128 << 20)      /* Migration transfer speed throttling */

@@ -244,6 +245,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_generate_event(int new_state)
@@ -390,8 +393,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;

+    yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+    if (*errp) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (!strcmp(uri, "defer")) {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p) ||
                strstart(uri, "unix:", NULL) ||
@@ -406,6 +415,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1698,6 +1708,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -1972,6 +1983,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2109,6 +2121,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }

+    if (!(has_resume && resume)) {
+        yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+        if (*errp) {
+            return;
+        }
+    }
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
@@ -2122,6 +2141,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2131,6 +2153,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }

     if (local_err) {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         migrate_fd_error(s, local_err);
         error_propagate(errp, local_err);
         return;
diff --git a/migration/multifd.c b/migration/multifd.c
index 68b171fb61..620b55f5d9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -25,6 +25,9 @@
 #include "trace.h"
 #include "multifd.h"

+#include "qemu/yank.h"
+#include "io/channel-socket.h"
+
 /* Multiple fd's */

 #define MULTIFD_MAGIC 0x11223344U
@@ -962,6 +965,13 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];

+        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
+            && OBJECT(p->c)->ref == 1) {
+            yank_unregister_function(MIGRATION_YANK_INSTANCE,
+                                     yank_generic_iochannel,
+                                     QIO_CHANNEL(p->c));
+        }
+
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index d2ce32f4b9..afc3a7f642 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
+#include "qemu/yank.h"


 static ssize_t channel_writev_buffer(void *opaque,
@@ -104,6 +105,12 @@ static int channel_close(void *opaque, Error **errp)
     int ret;
     QIOChannel *ioc = QIO_CHANNEL(opaque);
     ret = qio_channel_close(ioc, errp);
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)
+        && OBJECT(ioc)->ref == 1) {
+        yank_unregister_function(MIGRATION_YANK_INSTANCE,
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(ioc));
+    }
     object_unref(OBJECT(ioc));
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index ff33e210eb..33da4692c4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -62,6 +62,7 @@
 #include "migration/colo.h"
 #include "qemu/bitmap.h"
 #include "net/announce.h"
+#include "qemu/yank.h"

 const unsigned int postcopy_ram_discard_version = 0;

@@ -2940,6 +2941,11 @@ int load_snapshot(const char *name, Error **errp)
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
     mis->from_src_file = f;

+    yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+    if (*errp) {
+        ret = -EINVAL;
+        goto err_drain;
+    }
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
     migration_incoming_state_destroy();
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (3 preceding siblings ...)
  2020-10-28 18:45 ` [PATCH v9 4/8] migration: " Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..10d0bf59aa 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                     return QIO_CHANNEL_ERR_BLOCK;
                 }
             } else if (errno == ECONNABORTED &&
-                       (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                       (qatomic_load_acquire(&tioc->shutdown) &
+                        QIO_CHANNEL_SHUTDOWN_READ)) {
                 return 0;
             }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (4 preceding siblings ...)
  2020-10-28 18:45 ` [PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Migration and yank code assume that qio_channel_shutdown is thread
-safe and can be called from qmp oob handler. Document this after
checking the code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3c04f0edda..e0b9fc615d 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (5 preceding siblings ...)
  2020-10-28 18:45 ` [PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  2020-10-28 18:45 ` [PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

I'll maintain this for now as the colo usecase is the first user
of this functionality.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef6f5c7399..5921e565df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2669,6 +2669,12 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub <lukasstraub2@web.de>
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+
 COLO Framework
 M: zhanghailiang <zhang.zhanghailiang@huawei.com>
 S: Maintained
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (6 preceding siblings ...)
  2020-10-28 18:45 ` [PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
@ 2020-10-28 18:45 ` Lukas Straub
  7 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-28 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Paolo Bonzini,
	Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 9196e566e9..aedb5c9eda 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/8] Introduce yank feature
  2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
@ 2020-10-29 16:36   ` Markus Armbruster
  2020-10-30 10:32     ` Lukas Straub
  0 siblings, 1 reply; 13+ messages in thread
From: Markus Armbruster @ 2020-10-29 16:36 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

Nothing major, looks almost ready to me.

Lukas Straub <lukasstraub2@web.de> writes:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/yank.h |  95 ++++++++++++++++++++
>  qapi/misc.json      | 106 ++++++++++++++++++++++
>  util/meson.build    |   1 +
>  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++

checkpatch.pl warns:

    WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?

Can we find a maintainer for the two new files?

>  4 files changed, 415 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
>
> diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> new file mode 100644
> index 0000000000..89755e62af
> --- /dev/null
> +++ b/include/qemu/yank.h
> @@ -0,0 +1,95 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef YANK_H
> +#define YANK_H
> +
> +#include "qapi/qapi-types-misc.h"
> +
> +typedef void (YankFn)(void *opaque);
> +
> +/**
> + * yank_register_instance: Register a new instance.
> + *
> + * This registers a new instance for yanking. Must be called before any yank
> + * function is registered for this instance.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @errp: Error object.
> + */
> +void yank_register_instance(const YankInstance *instance, Error **errp);
> +
> +/**
> + * yank_unregister_instance: Unregister a instance.
> + *
> + * This unregisters a instance. Must be called only after every yank function
> + * of the instance has been unregistered.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + */
> +void yank_unregister_instance(const YankInstance *instance);
> +
> +/**
> + * yank_register_function: Register a yank function
> + *
> + * This registers a yank function. All limitations of qmp oob commands apply
> + * to the yank function as well. See docs/devel/qapi-code-gen.txt under
> + * "An OOB-capable command handler must satisfy the following conditions".
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @func: The yank function.
> + * @opaque: Will be passed to the yank function.
> + */
> +void yank_register_function(const YankInstance *instance,
> +                            YankFn *func,
> +                            void *opaque);
> +
> +/**
> + * yank_unregister_function: Unregister a yank function
> + *
> + * This unregisters a yank function.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @func: func that was passed to yank_register_function.
> + * @opaque: opaque that was passed to yank_register_function.
> + */
> +void yank_unregister_function(const YankInstance *instance,
> +                              YankFn *func,
> +                              void *opaque);
> +
> +/**
> + * yank_generic_iochannel: Generic yank function for iochannel
> + *
> + * This is a generic yank function which will call qio_channel_shutdown on the
> + * provided QIOChannel.
> + *
> + * @opaque: QIOChannel to shutdown
> + */
> +void yank_generic_iochannel(void *opaque);
> +
> +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_BLOCKDEV, \
> +        .u.blockdev.node_name = (the_node_name) })
> +
> +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_CHARDEV, \
> +        .u.chardev.id = (the_id) })
> +
> +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_MIGRATION })
> +
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 40df513856..3b7de02a4d 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -568,3 +568,109 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @YankInstanceType:
> +#
> +# An enumeration of yank instance types. See "YankInstance" for more

Please write cross-references as @YankInstance.  This gives us a chance
to turn them into links (seems not to be implemented, yet).  More of the
same below.

> +# information.
> +#
> +# Since: 5.2
> +##
> +{ 'enum': 'YankInstanceType',
> +  'data': [ 'blockdev', 'chardev', 'migration' ] }
> +
> +##
> +# @YankInstanceBlockdev:
> +#
> +# Specifies which blockdev to yank. See "YankInstance" for more information.
> +#
> +# @node-name: the blockdev's node-name

Is this really about block devices?  A node-name identifies a block
graph node, which may or may not be associated with a block device.

If I understand PATCH 2 correctly, it makes *any* block node with driver
'nbd' yankable while it has a network connection.  If that's true, then
this is definitely about block graph nodes, not about block devices.

If it is about devices, let's say "block device", not "blockdev".

If it is about nodes, let's say "block graph node".  Also rename
YankInstanceType member 'blockdev' by 'block-node', YankInstanceBlockdev
to YankInstanceBlockNode.

> +#
> +# Since: 5.2
> +##
> +{ 'struct': 'YankInstanceBlockdev',
> +  'data': { 'node-name': 'str' } }
> +
> +##
> +# @YankInstanceChardev:
> +#
> +# Specifies which chardev to yank. See "YankInstance" for more information.

"character device", not "chardev".

> +#
> +# @id: the chardev's ID
> +#
> +# Since: 5.2
> +##
> +{ 'struct': 'YankInstanceChardev',
> +  'data': { 'id': 'str' } }

This is called 'label' in @ChardevInfo.  It's also called 'id' in
@chardev-add.  Oh well, 'id' it is.

> +
> +##
> +# @YankInstance:
> +#
> +# A yank instance can be yanked with the "yank" qmp command to recover from a
> +# hanging qemu.

QEMU

> +#
> +# Currently implemented yank instances:
> +#  -nbd block device:
> +#   Yanking it will shutdown the connection to the nbd server without
> +#   attempting to reconnect.
> +#  -socket chardev:
> +#   Yanking it will shutdown the connected socket.
> +#  -migration:
> +#   Yanking it will shutdown all migration connections.

To my surprise, this is recognized as bullet list markup.  But please
put a space between the bullet and the text anyway.

Also: "shutdown" is a noun, the verb is spelled "shut down".

In my review of v8, I asked how yanking migration is related to command
migrate_cancel.  Daniel explained:

    migrate_cancel will do a shutdown() on the primary migration socket only.
    In addition it will toggle the migration state.

    Yanking will do a shutdown on all migration sockets (important for
    multifd), but won't touch migration state or any other aspect of QEMU
    code.

    Overall yanking has less potential for things to go wrong than the
    migrate_cancel method, as it doesn't try to do any kind of cleanup
    or migration.

Would it make sense to work this into the documentation?

> +#
> +# Since: 5.2
> +##
> +{ 'union': 'YankInstance',
> +  'base': { 'type': 'YankInstanceType' },
> +  'discriminator': 'type',
> +  'data': {
> +      'blockdev': 'YankInstanceBlockdev',
> +      'chardev': 'YankInstanceChardev' } }
> +
> +##
> +# @yank:
> +#
> +# Recover from hanging qemu by yanking the specified instances. See

QEMU

"Try to recover" would be more precise, I think.

> +# "YankInstance" for more information.
> +#
> +# Takes a list of @YankInstance as argument.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "yank",
> +#      "arguments": {
> +#          "instances": [
> +#               { "type": "block-node",
> +#                 "node-name": "nbd0" }
> +#          ] } }
> +# <- { "return": {} }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'yank',
> +  'data': { 'instances': ['YankInstance'] },
> +  'allow-oob': true }
> +
> +##
> +# @query-yank:
> +#
> +# Query yank instances. See "YankInstance" for more information.
> +#
> +# Returns: list of @YankInstance
> +#
> +# Example:
> +#
> +# -> { "execute": "query-yank" }
> +# <- { "return": [
> +#          { "type": "block-node",
> +#            "node-name": "nbd0" }
> +#      ] }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'query-yank',
> +  'returns': ['YankInstance'],
> +  'allow-oob': true }
> diff --git a/util/meson.build b/util/meson.build
> index c5159ad79d..dbda9d9123 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -50,6 +50,7 @@ endif
>
>  if have_system
>    util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
> +  util_ss.add(files('yank.c'))
>  endif
>
>  if have_block
> diff --git a/util/yank.c b/util/yank.c
> new file mode 100644
> index 0000000000..0b3a816706
> --- /dev/null
> +++ b/util/yank.c
> @@ -0,0 +1,213 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/thread.h"
> +#include "qemu/queue.h"
> +#include "qapi/qapi-commands-misc.h"
> +#include "qapi/qapi-visit-misc.h"
> +#include "qapi/clone-visitor.h"
> +#include "io/channel.h"
> +#include "qemu/yank.h"
> +
> +struct YankFuncAndParam {
> +    YankFn *func;
> +    void *opaque;
> +    QLIST_ENTRY(YankFuncAndParam) next;
> +};
> +
> +struct YankInstanceEntry {
> +    YankInstance *instance;
> +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> +    QLIST_ENTRY(YankInstanceEntry) next;
> +};
> +
> +typedef struct YankFuncAndParam YankFuncAndParam;
> +typedef struct YankInstanceEntry YankInstanceEntry;
> +
> +/*
> + * This lock protects the yank_instance_list below.

Please add something like

    * Because it's taken by OOB-capable commands, it must be "fast",
    * i.e. it may only be held for a bounded, short time.  See
    * docs/devel/qapi-code-gen.txt for additional information.

> + */
> +static QemuMutex yank_lock;
> +
> +static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
> +    = QLIST_HEAD_INITIALIZER(yank_instance_list);
> +
> +static int yank_compare_instances(const YankInstance *a, const YankInstance *b)
> +{
> +    if (a->type != b->type) {
> +        return 0;
> +    }
> +
> +    switch (a->type) {
> +    case YANK_INSTANCE_TYPE_BLOCKDEV:
> +        return !strcmp(a->u.blockdev.node_name, b->u.blockdev.node_name);
> +    break;
> +
> +    case YANK_INSTANCE_TYPE_CHARDEV:
> +        return !strcmp(a->u.chardev.id, b->u.chardev.id);
> +    break;
> +
> +    case YANK_INSTANCE_TYPE_MIGRATION:
> +        return 1;
> +    break;
> +
> +    default:
> +        abort();
> +    }
> +}

Please make this function return bool.  When a comparison function
returns int, I expect it to return negative value when less, zero when
equal, and positive value when greater.

> +
> +static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
> +{
> +    YankInstanceEntry *entry;
> +
> +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> +        if (yank_compare_instances(entry->instance, instance)) {
> +            return entry;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +void yank_register_instance(const YankInstance *instance, Error **errp)
> +{
> +    YankInstanceEntry *entry;
> +
> +    qemu_mutex_lock(&yank_lock);
> +
> +    if (yank_find_entry(instance)) {
> +        error_setg(errp, "duplicate yank instance");

How could this ever be anything but a programming error?

If it is a programming error, use assert(), just like you do in
yank_unregister_instance() below.

> +        qemu_mutex_unlock(&yank_lock);
> +        return;
> +    }
> +
> +    entry = g_slice_new(YankInstanceEntry);

Why not g_new()?  Hmm, GLib documentation says

    For newly written code it is recommended to use the new g_slice API
    instead of g_malloc() and friends, as long as objects are not
    resized during their lifetime and the object size used at allocation
    time is still available when freeing.

I see.

> +    entry->instance = QAPI_CLONE(YankInstance, instance);

You clone so the callers can build the argument on the stack.  Another
way to skin this cat: caller builds on the heap, this function takes
ownership.  But we're at v9, and your code should work just fine :)

> +    QLIST_INIT(&entry->yankfns);
> +    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
> +
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_unregister_instance(const YankInstance *instance)
> +{
> +    YankInstanceEntry *entry;
> +
> +    qemu_mutex_lock(&yank_lock);
> +    entry = yank_find_entry(instance);
> +    assert(entry);
> +
> +    assert(QLIST_EMPTY(&entry->yankfns));
> +    QLIST_REMOVE(entry, next);
> +    qapi_free_YankInstance(entry->instance);
> +    g_slice_free(YankInstanceEntry, entry);
> +
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_register_function(const YankInstance *instance,
> +                            YankFn *func,
> +                            void *opaque)
> +{
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
> +    entry = yank_find_entry(instance);
> +    assert(entry);
> +
> +    func_entry = g_slice_new(YankFuncAndParam);
> +    func_entry->func = func;
> +    func_entry->opaque = opaque;
> +
> +    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_unregister_function(const YankInstance *instance,
> +                              YankFn *func,
> +                              void *opaque)
> +{
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
> +    entry = yank_find_entry(instance);
> +    assert(entry);
> +
> +    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> +        if (func_entry->func == func && func_entry->opaque == opaque) {
> +            QLIST_REMOVE(func_entry, next);
> +            g_slice_free(YankFuncAndParam, func_entry);
> +            qemu_mutex_unlock(&yank_lock);
> +            return;
> +        }
> +    }
> +
> +    abort();
> +}
> +
> +void yank_generic_iochannel(void *opaque)
> +{
> +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> +
> +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +}
> +
> +void qmp_yank(YankInstanceList *instances,
> +              Error **errp)
> +{
> +    YankInstanceList *tail;
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
> +    for (tail = instances; tail; tail = tail->next) {
> +        entry = yank_find_entry(tail->value);
> +        if (!entry) {
> +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");

Quote error.h:

 * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
 * strongly discouraged.

Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
error_setg(), please.

> +            qemu_mutex_unlock(&yank_lock);
> +            return;
> +        }
> +    }
> +    for (tail = instances; tail; tail = tail->next) {
> +        entry = yank_find_entry(tail->value);
> +        assert(entry);
> +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> +            func_entry->func(func_entry->opaque);
> +        }
> +    }
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +YankInstanceList *qmp_query_yank(Error **errp)
> +{
> +    YankInstanceEntry *entry;
> +    YankInstanceList *ret;
> +
> +    ret = NULL;
> +
> +    qemu_mutex_lock(&yank_lock);
> +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> +        YankInstanceList *new_entry;
> +        new_entry = g_new0(YankInstanceList, 1);
> +        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
> +        new_entry->next = ret;
> +        ret = new_entry;

There is a pull request pending that adds QAPI_LIST_PREPEND().  If it
gets pulled before you respin, using it would be nice.

> +    }
> +    qemu_mutex_unlock(&yank_lock);

Your critical sections all look "fast" to me.  Good.

> +
> +    return ret;
> +}
> +
> +static void __attribute__((__constructor__)) yank_init(void)
> +{
> +    qemu_mutex_init(&yank_lock);
> +}
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/8] Introduce yank feature
  2020-10-29 16:36   ` Markus Armbruster
@ 2020-10-30 10:32     ` Lukas Straub
  2020-10-30 14:02       ` Markus Armbruster
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Straub @ 2020-10-30 10:32 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 19696 bytes --]

On Thu, 29 Oct 2020 17:36:14 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Nothing major, looks almost ready to me.
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/qemu/yank.h |  95 ++++++++++++++++++++
> >  qapi/misc.json      | 106 ++++++++++++++++++++++
> >  util/meson.build    |   1 +
> >  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++  
> 
> checkpatch.pl warns:
> 
>     WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> 
> Can we find a maintainer for the two new files?

Yes, I'm maintaining this for now, see patch 7.

> >  4 files changed, 415 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c
> >
> > diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> > new file mode 100644
> > index 0000000000..89755e62af
> > --- /dev/null
> > +++ b/include/qemu/yank.h
> > @@ -0,0 +1,95 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef YANK_H
> > +#define YANK_H
> > +
> > +#include "qapi/qapi-types-misc.h"
> > +
> > +typedef void (YankFn)(void *opaque);
> > +
> > +/**
> > + * yank_register_instance: Register a new instance.
> > + *
> > + * This registers a new instance for yanking. Must be called before any yank
> > + * function is registered for this instance.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @errp: Error object.
> > + */
> > +void yank_register_instance(const YankInstance *instance, Error **errp);
> > +
> > +/**
> > + * yank_unregister_instance: Unregister a instance.
> > + *
> > + * This unregisters a instance. Must be called only after every yank function
> > + * of the instance has been unregistered.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + */
> > +void yank_unregister_instance(const YankInstance *instance);
> > +
> > +/**
> > + * yank_register_function: Register a yank function
> > + *
> > + * This registers a yank function. All limitations of qmp oob commands apply
> > + * to the yank function as well. See docs/devel/qapi-code-gen.txt under
> > + * "An OOB-capable command handler must satisfy the following conditions".
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: The yank function.
> > + * @opaque: Will be passed to the yank function.
> > + */
> > +void yank_register_function(const YankInstance *instance,
> > +                            YankFn *func,
> > +                            void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Unregister a yank function
> > + *
> > + * This unregisters a yank function.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: func that was passed to yank_register_function.
> > + * @opaque: opaque that was passed to yank_register_function.
> > + */
> > +void yank_unregister_function(const YankInstance *instance,
> > +                              YankFn *func,
> > +                              void *opaque);
> > +
> > +/**
> > + * yank_generic_iochannel: Generic yank function for iochannel
> > + *
> > + * This is a generic yank function which will call qio_channel_shutdown on the
> > + * provided QIOChannel.
> > + *
> > + * @opaque: QIOChannel to shutdown
> > + */
> > +void yank_generic_iochannel(void *opaque);
> > +
> > +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_BLOCKDEV, \
> > +        .u.blockdev.node_name = (the_node_name) })
> > +
> > +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_CHARDEV, \
> > +        .u.chardev.id = (the_id) })
> > +
> > +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_MIGRATION })
> > +
> > +#endif
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 40df513856..3b7de02a4d 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -568,3 +568,109 @@
> >   'data': { '*option': 'str' },
> >   'returns': ['CommandLineOptionInfo'],
> >   'allow-preconfig': true }
> > +
> > +##
> > +# @YankInstanceType:
> > +#
> > +# An enumeration of yank instance types. See "YankInstance" for more  
> 
> Please write cross-references as @YankInstance.  This gives us a chance
> to turn them into links (seems not to be implemented, yet).  More of the
> same below.

Changed for the next version.

> > +# information.
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'enum': 'YankInstanceType',
> > +  'data': [ 'blockdev', 'chardev', 'migration' ] }
> > +
> > +##
> > +# @YankInstanceBlockdev:
> > +#
> > +# Specifies which blockdev to yank. See "YankInstance" for more information.
> > +#
> > +# @node-name: the blockdev's node-name  
> 
> Is this really about block devices?  A node-name identifies a block
> graph node, which may or may not be associated with a block device.
> 
> If I understand PATCH 2 correctly, it makes *any* block node with driver
> 'nbd' yankable while it has a network connection.  If that's true, then
> this is definitely about block graph nodes, not about block devices.

Indeed, changed for the next version.

> If it is about devices, let's say "block device", not "blockdev".
> 
> If it is about nodes, let's say "block graph node".  Also rename
> YankInstanceType member 'blockdev' by 'block-node', YankInstanceBlockdev
> to YankInstanceBlockNode.
> 
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'struct': 'YankInstanceBlockdev',
> > +  'data': { 'node-name': 'str' } }
> > +
> > +##
> > +# @YankInstanceChardev:
> > +#
> > +# Specifies which chardev to yank. See "YankInstance" for more information.  
> 
> "character device", not "chardev".

Changed for the next version.

> > +#
> > +# @id: the chardev's ID
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'struct': 'YankInstanceChardev',
> > +  'data': { 'id': 'str' } }  
> 
> This is called 'label' in @ChardevInfo.  It's also called 'id' in
> @chardev-add.  Oh well, 'id' it is.
> 
> > +
> > +##
> > +# @YankInstance:
> > +#
> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
> > +# hanging qemu.  
> 
> QEMU
> 
> > +#
> > +# Currently implemented yank instances:
> > +#  -nbd block device:
> > +#   Yanking it will shutdown the connection to the nbd server without
> > +#   attempting to reconnect.
> > +#  -socket chardev:
> > +#   Yanking it will shutdown the connected socket.
> > +#  -migration:
> > +#   Yanking it will shutdown all migration connections.  
> 
> To my surprise, this is recognized as bullet list markup.  But please
> put a space between the bullet and the text anyway.
> 
> Also: "shutdown" is a noun, the verb is spelled "shut down".

Both changed for the next version.

> In my review of v8, I asked how yanking migration is related to command
> migrate_cancel.  Daniel explained:
> 
>     migrate_cancel will do a shutdown() on the primary migration socket only.
>     In addition it will toggle the migration state.
> 
>     Yanking will do a shutdown on all migration sockets (important for
>     multifd), but won't touch migration state or any other aspect of QEMU
>     code.
> 
>     Overall yanking has less potential for things to go wrong than the
>     migrate_cancel method, as it doesn't try to do any kind of cleanup
>     or migration.
> 
> Would it make sense to work this into the documentation?

How about this?

  - migration:
    Yanking it will shut down all migration connections. Unlike
    @migrate_cancel, it will not notify the migration process,
    so migration will go into @failed state, instead of @cancelled
    state.

> > +#
> > +# Since: 5.2
> > +##
> > +{ 'union': 'YankInstance',
> > +  'base': { 'type': 'YankInstanceType' },
> > +  'discriminator': 'type',
> > +  'data': {
> > +      'blockdev': 'YankInstanceBlockdev',
> > +      'chardev': 'YankInstanceChardev' } }
> > +
> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances. See  
> 
> QEMU
> 
> "Try to recover" would be more precise, I think.

Changed for the next version.

> > +# "YankInstance" for more information.
> > +#
> > +# Takes a list of @YankInstance as argument.
> > +#
> > +# Returns: nothing.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank",
> > +#      "arguments": {
> > +#          "instances": [
> > +#               { "type": "block-node",
> > +#                 "node-name": "nbd0" }
> > +#          ] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'yank',
> > +  'data': { 'instances': ['YankInstance'] },
> > +  'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances. See "YankInstance" for more information.
> > +#
> > +# Returns: list of @YankInstance
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": [
> > +#          { "type": "block-node",
> > +#            "node-name": "nbd0" }
> > +#      ] }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'query-yank',
> > +  'returns': ['YankInstance'],
> > +  'allow-oob': true }
> > diff --git a/util/meson.build b/util/meson.build
> > index c5159ad79d..dbda9d9123 100644
> > --- a/util/meson.build
> > +++ b/util/meson.build
> > @@ -50,6 +50,7 @@ endif
> >
> >  if have_system
> >    util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
> > +  util_ss.add(files('yank.c'))
> >  endif
> >
> >  if have_block
> > diff --git a/util/yank.c b/util/yank.c
> > new file mode 100644
> > index 0000000000..0b3a816706
> > --- /dev/null
> > +++ b/util/yank.c
> > @@ -0,0 +1,213 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu/thread.h"
> > +#include "qemu/queue.h"
> > +#include "qapi/qapi-commands-misc.h"
> > +#include "qapi/qapi-visit-misc.h"
> > +#include "qapi/clone-visitor.h"
> > +#include "io/channel.h"
> > +#include "qemu/yank.h"
> > +
> > +struct YankFuncAndParam {
> > +    YankFn *func;
> > +    void *opaque;
> > +    QLIST_ENTRY(YankFuncAndParam) next;
> > +};
> > +
> > +struct YankInstanceEntry {
> > +    YankInstance *instance;
> > +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> > +    QLIST_ENTRY(YankInstanceEntry) next;
> > +};
> > +
> > +typedef struct YankFuncAndParam YankFuncAndParam;
> > +typedef struct YankInstanceEntry YankInstanceEntry;
> > +
> > +/*
> > + * This lock protects the yank_instance_list below.  
> 
> Please add something like
> 
>     * Because it's taken by OOB-capable commands, it must be "fast",
>     * i.e. it may only be held for a bounded, short time.  See
>     * docs/devel/qapi-code-gen.txt for additional information.

Changed for the next version.

> > + */
> > +static QemuMutex yank_lock;
> > +
> > +static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
> > +    = QLIST_HEAD_INITIALIZER(yank_instance_list);
> > +
> > +static int yank_compare_instances(const YankInstance *a, const YankInstance *b)
> > +{
> > +    if (a->type != b->type) {
> > +        return 0;
> > +    }
> > +
> > +    switch (a->type) {
> > +    case YANK_INSTANCE_TYPE_BLOCKDEV:
> > +        return !strcmp(a->u.blockdev.node_name, b->u.blockdev.node_name);
> > +    break;
> > +
> > +    case YANK_INSTANCE_TYPE_CHARDEV:
> > +        return !strcmp(a->u.chardev.id, b->u.chardev.id);
> > +    break;
> > +
> > +    case YANK_INSTANCE_TYPE_MIGRATION:
> > +        return 1;
> > +    break;
> > +
> > +    default:
> > +        abort();
> > +    }
> > +}  
> 
> Please make this function return bool.  When a comparison function
> returns int, I expect it to return negative value when less, zero when
> equal, and positive value when greater.

Changed for the next version.

> > +
> > +static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> > +        if (yank_compare_instances(entry->instance, instance)) {
> > +            return entry;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +void yank_register_instance(const YankInstance *instance, Error **errp)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +
> > +    if (yank_find_entry(instance)) {
> > +        error_setg(errp, "duplicate yank instance");  
> 
> How could this ever be anything but a programming error?
> 
> If it is a programming error, use assert(), just like you do in
> yank_unregister_instance() below.

The chardev code initializes the chardev-socket which in turn initiates the
connection, before checking for duplicate id. So we'll have to catch it here.

> 
> > +        qemu_mutex_unlock(&yank_lock);
> > +        return;
> > +    }
> > +
> > +    entry = g_slice_new(YankInstanceEntry);  
> 
> Why not g_new()?  Hmm, GLib documentation says
> 
>     For newly written code it is recommended to use the new g_slice API
>     instead of g_malloc() and friends, as long as objects are not
>     resized during their lifetime and the object size used at allocation
>     time is still available when freeing.
> 
> I see.
> 
> > +    entry->instance = QAPI_CLONE(YankInstance, instance);  
> 
> You clone so the callers can build the argument on the stack.  Another
> way to skin this cat: caller builds on the heap, this function takes
> ownership.  But we're at v9, and your code should work just fine :)

This makes the code simpler in the later patches.

> > +    QLIST_INIT(&entry->yankfns);
> > +    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
> > +
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_unregister_instance(const YankInstance *instance)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +    entry = yank_find_entry(instance);
> > +    assert(entry);
> > +
> > +    assert(QLIST_EMPTY(&entry->yankfns));
> > +    QLIST_REMOVE(entry, next);
> > +    qapi_free_YankInstance(entry->instance);
> > +    g_slice_free(YankInstanceEntry, entry);
> > +
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_register_function(const YankInstance *instance,
> > +                            YankFn *func,
> > +                            void *opaque)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +    entry = yank_find_entry(instance);
> > +    assert(entry);
> > +
> > +    func_entry = g_slice_new(YankFuncAndParam);
> > +    func_entry->func = func;
> > +    func_entry->opaque = opaque;
> > +
> > +    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_unregister_function(const YankInstance *instance,
> > +                              YankFn *func,
> > +                              void *opaque)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +    entry = yank_find_entry(instance);
> > +    assert(entry);
> > +
> > +    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> > +        if (func_entry->func == func && func_entry->opaque == opaque) {
> > +            QLIST_REMOVE(func_entry, next);
> > +            g_slice_free(YankFuncAndParam, func_entry);
> > +            qemu_mutex_unlock(&yank_lock);
> > +            return;
> > +        }
> > +    }
> > +
> > +    abort();
> > +}
> > +
> > +void yank_generic_iochannel(void *opaque)
> > +{
> > +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> > +
> > +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > +}
> > +
> > +void qmp_yank(YankInstanceList *instances,
> > +              Error **errp)
> > +{
> > +    YankInstanceList *tail;
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +    for (tail = instances; tail; tail = tail->next) {
> > +        entry = yank_find_entry(tail->value);
> > +        if (!entry) {
> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");  
> 
> Quote error.h:
> 
>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>  * strongly discouraged.
> 
> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
> error_setg(), please.

There may be a time-to-check to time-to-use race condition here. Suppose
the management application (via QMP) calls 'blockev-del nbd0', then QEMU
hangs, so after a timeout it calls 'yank nbd0' while shortly before that
QEMU unhangs for some reason and removes the blockdev. Then yank returns
this error.

QMP errors should not be parsed except for the error class, so the the
management application can only differentiate between this (ignorable)
error and other (possibly fatal) ones by parsing the error class.

> 
> > +            qemu_mutex_unlock(&yank_lock);
> > +            return;
> > +        }
> > +    }
> > +    for (tail = instances; tail; tail = tail->next) {
> > +        entry = yank_find_entry(tail->value);
> > +        assert(entry);
> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> > +            func_entry->func(func_entry->opaque);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +YankInstanceList *qmp_query_yank(Error **errp)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankInstanceList *ret;
> > +
> > +    ret = NULL;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> > +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> > +        YankInstanceList *new_entry;
> > +        new_entry = g_new0(YankInstanceList, 1);
> > +        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
> > +        new_entry->next = ret;
> > +        ret = new_entry;  
> 
> There is a pull request pending that adds QAPI_LIST_PREPEND().  If it
> gets pulled before you respin, using it would be nice.
> 
> > +    }
> > +    qemu_mutex_unlock(&yank_lock);  
> 
> Your critical sections all look "fast" to me.  Good.
> 
> > +
> > +    return ret;
> > +}
> > +
> > +static void __attribute__((__constructor__)) yank_init(void)
> > +{
> > +    qemu_mutex_init(&yank_lock);
> > +}
> > --
> > 2.20.1  
> 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/8] Introduce yank feature
  2020-10-30 10:32     ` Lukas Straub
@ 2020-10-30 14:02       ` Markus Armbruster
  2020-10-30 15:19         ` Lukas Straub
  0 siblings, 1 reply; 13+ messages in thread
From: Markus Armbruster @ 2020-10-30 14:02 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Marc-André Lureau, Max Reitz

Lukas Straub <lukasstraub2@web.de> writes:

> On Thu, 29 Oct 2020 17:36:14 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Nothing major, looks almost ready to me.
>> 
>> Lukas Straub <lukasstraub2@web.de> writes:
>> 
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > ---
>> >  include/qemu/yank.h |  95 ++++++++++++++++++++
>> >  qapi/misc.json      | 106 ++++++++++++++++++++++
>> >  util/meson.build    |   1 +
>> >  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++  
>> 
>> checkpatch.pl warns:
>> 
>>     WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>> 
>> Can we find a maintainer for the two new files?
>
> Yes, I'm maintaining this for now, see patch 7.

Thanks!  Would it make sense to add the yank stuff to a new QAPI module
yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
it?

[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 40df513856..3b7de02a4d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
[...]
>> > +##
>> > +# @YankInstance:
>> > +#
>> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
>> > +# hanging qemu.  
>> 
>> QEMU
>> 
>> > +#
>> > +# Currently implemented yank instances:
>> > +#  -nbd block device:
>> > +#   Yanking it will shutdown the connection to the nbd server without
>> > +#   attempting to reconnect.
>> > +#  -socket chardev:
>> > +#   Yanking it will shutdown the connected socket.
>> > +#  -migration:
>> > +#   Yanking it will shutdown all migration connections.  
>> 
>> To my surprise, this is recognized as bullet list markup.  But please
>> put a space between the bullet and the text anyway.
>> 
>> Also: "shutdown" is a noun, the verb is spelled "shut down".
>
> Both changed for the next version.
>
>> In my review of v8, I asked how yanking migration is related to command
>> migrate_cancel.  Daniel explained:
>> 
>>     migrate_cancel will do a shutdown() on the primary migration socket only.
>>     In addition it will toggle the migration state.
>> 
>>     Yanking will do a shutdown on all migration sockets (important for
>>     multifd), but won't touch migration state or any other aspect of QEMU
>>     code.
>> 
>>     Overall yanking has less potential for things to go wrong than the
>>     migrate_cancel method, as it doesn't try to do any kind of cleanup
>>     or migration.
>> 
>> Would it make sense to work this into the documentation?
>
> How about this?
>
>   - migration:
>     Yanking it will shut down all migration connections. Unlike
>     @migrate_cancel, it will not notify the migration process,
>     so migration will go into @failed state, instead of @cancelled
>     state.

Works for me.  Advice on when to use it rather than migrate_cancel would
be nice, though.

>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'union': 'YankInstance',
>> > +  'base': { 'type': 'YankInstanceType' },
>> > +  'discriminator': 'type',
>> > +  'data': {
>> > +      'blockdev': 'YankInstanceBlockdev',
>> > +      'chardev': 'YankInstanceChardev' } }
>> > +
>> > +##
>> > +# @yank:
>> > +#
>> > +# Recover from hanging qemu by yanking the specified instances. See  
>> 
>> QEMU
>> 
>> "Try to recover" would be more precise, I think.
>
> Changed for the next version.
>
>> > +# "YankInstance" for more information.
>> > +#
>> > +# Takes a list of @YankInstance as argument.
>> > +#
>> > +# Returns: nothing.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "yank",
>> > +#      "arguments": {
>> > +#          "instances": [
>> > +#               { "type": "block-node",
>> > +#                 "node-name": "nbd0" }
>> > +#          ] } }
>> > +# <- { "return": {} }
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'command': 'yank',
>> > +  'data': { 'instances': ['YankInstance'] },
>> > +  'allow-oob': true }
>> > +
[...]
>> > diff --git a/util/yank.c b/util/yank.c
>> > new file mode 100644
>> > index 0000000000..0b3a816706
>> > --- /dev/null
>> > +++ b/util/yank.c
[...]
>> > +void qmp_yank(YankInstanceList *instances,
>> > +              Error **errp)
>> > +{
>> > +    YankInstanceList *tail;
>> > +    YankInstanceEntry *entry;
>> > +    YankFuncAndParam *func_entry;
>> > +
>> > +    qemu_mutex_lock(&yank_lock);
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        if (!entry) {
>> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");  
>> 
>> Quote error.h:
>> 
>>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>>  * strongly discouraged.
>> 
>> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
>> error_setg(), please.
>
> There may be a time-to-check to time-to-use race condition here. Suppose
> the management application (via QMP) calls 'blockev-del nbd0', then QEMU
> hangs, so after a timeout it calls 'yank nbd0' while shortly before that
> QEMU unhangs for some reason and removes the blockdev. Then yank returns
> this error.
>
> QMP errors should not be parsed except for the error class, so the the
> management application can only differentiate between this (ignorable)
> error and other (possibly fatal) ones by parsing the error class.

I see.

DeviceNotFound doesn't really fit, but none of the other error classes
is any better.

I think the line

      # Returns: nothing.

in the QAPI schema (quoted above) should be expanded to something like


      # Returns: - Nothing on success
                 - If the YankInstance doesn't exist, DeviceNotFound

>> > +            qemu_mutex_unlock(&yank_lock);
>> > +            return;
>> > +        }
>> > +    }
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        assert(entry);
>> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
>> > +            func_entry->func(func_entry->opaque);
>> > +        }
>> > +    }
>> > +    qemu_mutex_unlock(&yank_lock);
>> > +}
[...]



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/8] Introduce yank feature
  2020-10-30 14:02       ` Markus Armbruster
@ 2020-10-30 15:19         ` Lukas Straub
  0 siblings, 0 replies; 13+ messages in thread
From: Lukas Straub @ 2020-10-30 15:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Marc-André Lureau, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 7284 bytes --]

On Fri, 30 Oct 2020 15:02:09 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > On Thu, 29 Oct 2020 17:36:14 +0100
> > Markus Armbruster <armbru@redhat.com> wrote:
> >  
> >> Nothing major, looks almost ready to me.
> >> 
> >> Lukas Straub <lukasstraub2@web.de> writes:
> >>   
> >> > The yank feature allows to recover from hanging qemu by "yanking"
> >> > at various parts. Other qemu systems can register themselves and
> >> > multiple yank functions. Then all yank functions for selected
> >> > instances can be called by the 'yank' out-of-band qmp command.
> >> > Available instances can be queried by a 'query-yank' oob command.
> >> >
> >> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> >> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> > ---
> >> >  include/qemu/yank.h |  95 ++++++++++++++++++++
> >> >  qapi/misc.json      | 106 ++++++++++++++++++++++
> >> >  util/meson.build    |   1 +
> >> >  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++    
> >> 
> >> checkpatch.pl warns:
> >> 
> >>     WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> >> 
> >> Can we find a maintainer for the two new files?  
> >
> > Yes, I'm maintaining this for now, see patch 7.  
> 
> Thanks!  Would it make sense to add the yank stuff to a new QAPI module
> yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
> it?

Yes, makes sense. Changed for the next version.

> [...]
> >> > diff --git a/qapi/misc.json b/qapi/misc.json
> >> > index 40df513856..3b7de02a4d 100644
> >> > --- a/qapi/misc.json
> >> > +++ b/qapi/misc.json  
> [...]
> >> > +##
> >> > +# @YankInstance:
> >> > +#
> >> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
> >> > +# hanging qemu.    
> >> 
> >> QEMU
> >>   
> >> > +#
> >> > +# Currently implemented yank instances:
> >> > +#  -nbd block device:
> >> > +#   Yanking it will shutdown the connection to the nbd server without
> >> > +#   attempting to reconnect.
> >> > +#  -socket chardev:
> >> > +#   Yanking it will shutdown the connected socket.
> >> > +#  -migration:
> >> > +#   Yanking it will shutdown all migration connections.    
> >> 
> >> To my surprise, this is recognized as bullet list markup.  But please
> >> put a space between the bullet and the text anyway.
> >> 
> >> Also: "shutdown" is a noun, the verb is spelled "shut down".  
> >
> > Both changed for the next version.
> >  
> >> In my review of v8, I asked how yanking migration is related to command
> >> migrate_cancel.  Daniel explained:
> >> 
> >>     migrate_cancel will do a shutdown() on the primary migration socket only.
> >>     In addition it will toggle the migration state.
> >> 
> >>     Yanking will do a shutdown on all migration sockets (important for
> >>     multifd), but won't touch migration state or any other aspect of QEMU
> >>     code.
> >> 
> >>     Overall yanking has less potential for things to go wrong than the
> >>     migrate_cancel method, as it doesn't try to do any kind of cleanup
> >>     or migration.
> >> 
> >> Would it make sense to work this into the documentation?  
> >
> > How about this?
> >
> >   - migration:
> >     Yanking it will shut down all migration connections. Unlike
> >     @migrate_cancel, it will not notify the migration process,
> >     so migration will go into @failed state, instead of @cancelled
> >     state.  
> 
> Works for me.  Advice on when to use it rather than migrate_cancel would
> be nice, though.

Ok, Changed for the next version.

> >> > +#
> >> > +# Since: 5.2
> >> > +##
> >> > +{ 'union': 'YankInstance',
> >> > +  'base': { 'type': 'YankInstanceType' },
> >> > +  'discriminator': 'type',
> >> > +  'data': {
> >> > +      'blockdev': 'YankInstanceBlockdev',
> >> > +      'chardev': 'YankInstanceChardev' } }
> >> > +
> >> > +##
> >> > +# @yank:
> >> > +#
> >> > +# Recover from hanging qemu by yanking the specified instances. See    
> >> 
> >> QEMU
> >> 
> >> "Try to recover" would be more precise, I think.  
> >
> > Changed for the next version.
> >  
> >> > +# "YankInstance" for more information.
> >> > +#
> >> > +# Takes a list of @YankInstance as argument.
> >> > +#
> >> > +# Returns: nothing.
> >> > +#
> >> > +# Example:
> >> > +#
> >> > +# -> { "execute": "yank",
> >> > +#      "arguments": {
> >> > +#          "instances": [
> >> > +#               { "type": "block-node",
> >> > +#                 "node-name": "nbd0" }
> >> > +#          ] } }
> >> > +# <- { "return": {} }
> >> > +#
> >> > +# Since: 5.2
> >> > +##
> >> > +{ 'command': 'yank',
> >> > +  'data': { 'instances': ['YankInstance'] },
> >> > +  'allow-oob': true }
> >> > +  
> [...]
> >> > diff --git a/util/yank.c b/util/yank.c
> >> > new file mode 100644
> >> > index 0000000000..0b3a816706
> >> > --- /dev/null
> >> > +++ b/util/yank.c  
> [...]
> >> > +void qmp_yank(YankInstanceList *instances,
> >> > +              Error **errp)
> >> > +{
> >> > +    YankInstanceList *tail;
> >> > +    YankInstanceEntry *entry;
> >> > +    YankFuncAndParam *func_entry;
> >> > +
> >> > +    qemu_mutex_lock(&yank_lock);
> >> > +    for (tail = instances; tail; tail = tail->next) {
> >> > +        entry = yank_find_entry(tail->value);
> >> > +        if (!entry) {
> >> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");    
> >> 
> >> Quote error.h:
> >> 
> >>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
> >>  * strongly discouraged.
> >> 
> >> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
> >> error_setg(), please.  
> >
> > There may be a time-to-check to time-to-use race condition here. Suppose
> > the management application (via QMP) calls 'blockev-del nbd0', then QEMU
> > hangs, so after a timeout it calls 'yank nbd0' while shortly before that
> > QEMU unhangs for some reason and removes the blockdev. Then yank returns
> > this error.
> >
> > QMP errors should not be parsed except for the error class, so the the
> > management application can only differentiate between this (ignorable)
> > error and other (possibly fatal) ones by parsing the error class.  
> 
> I see.
> 
> DeviceNotFound doesn't really fit, but none of the other error classes
> is any better.
> 
> I think the line
> 
>       # Returns: nothing.
> 
> in the QAPI schema (quoted above) should be expanded to something like
> 
> 
>       # Returns: - Nothing on success
>                  - If the YankInstance doesn't exist, DeviceNotFound

Changed for the next version.

> >> > +            qemu_mutex_unlock(&yank_lock);
> >> > +            return;
> >> > +        }
> >> > +    }
> >> > +    for (tail = instances; tail; tail = tail->next) {
> >> > +        entry = yank_find_entry(tail->value);
> >> > +        assert(entry);
> >> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> >> > +            func_entry->func(func_entry->opaque);
> >> > +        }
> >> > +    }
> >> > +    qemu_mutex_unlock(&yank_lock);
> >> > +}  
> [...]
> 



-- 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-10-30 15:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
2020-10-29 16:36   ` Markus Armbruster
2020-10-30 10:32     ` Lukas Straub
2020-10-30 14:02       ` Markus Armbruster
2020-10-30 15:19         ` Lukas Straub
2020-10-28 18:45 ` [PATCH v9 2/8] block/nbd.c: Add " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 3/8] chardev/char-socket.c: " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 4/8] migration: " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
2020-10-28 18:45 ` [PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
2020-10-28 18:45 ` [PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
2020-10-28 18:45 ` [PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.