qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
@ 2020-12-13 11:48 Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 1/7] Introduce yank feature Lukas Straub
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 4141 bytes --]


Hello Everyone,
So here is v12.
@Marc-André Lureau, We still need an ACK for the chardev patch.

Changes:

v12:
 -rebase onto master
  -minor change to migration (removal of "defer" branch in qemu_start_incoming_migration)
 -add Reviewed-by tags

v11:
 -squashed MAINTAINERS update into patch 1
 -move qmp doc of yank before misc
 -add title for qmp docs
 -change "Since:" to 6.0
 -add Reviewed-by tags

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference betwen migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
  -minor change to migration
  -convert to meson
 -change "Since:" to 5.2
 -varios code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before  creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_aqcuire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   7 ++
 block/nbd.c                   | 154 ++++++++++++++----------
 chardev/char-socket.c         |  35 ++++++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  95 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 ++
 migration/migration.c         |  24 ++++
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   6 +
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qapi/yank.json                | 119 +++++++++++++++++++
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 216 ++++++++++++++++++++++++++++++++++
 17 files changed, 637 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v12 1/7] Introduce yank feature
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-22  8:00   ` Marc-André Lureau
  2020-12-13 11:48 ` [PATCH v12 2/7] block/nbd.c: Add " Lukas Straub
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 14479 bytes --]

The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 MAINTAINERS           |   7 ++
 include/qemu/yank.h   |  95 +++++++++++++++++++
 qapi/meson.build      |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json        | 119 +++++++++++++++++++++++
 util/meson.build      |   1 +
 util/yank.c           | 216 ++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 440 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d48a4e8a8b..5d7e3c0e4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2705,6 +2705,13 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub <lukasstraub2@web.de>
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+F: qapi/yank.json
+
 COLO Framework
 M: zhanghailiang <zhang.zhanghailiang@huawei.com>
 S: Maintained
diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 0000000000..96f5b2626f
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,95 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-yank.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
+        .u.block_node.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_CHARDEV, \
+        .u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/meson.build b/qapi/meson.build
index 0e98146f1f..ab68e7900e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -47,6 +47,7 @@ qapi_all_modules = [
   'trace',
   'transaction',
   'ui',
+  'yank',
 ]

 qapi_storage_daemon_modules = [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 0b444b76d2..3441c9a9ae 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -86,6 +86,7 @@
 { 'include': 'machine.json' }
 { 'include': 'machine-target.json' }
 { 'include': 'replay.json' }
+{ 'include': 'yank.json' }
 { 'include': 'misc.json' }
 { 'include': 'misc-target.json' }
 { 'include': 'audio.json' }
diff --git a/qapi/yank.json b/qapi/yank.json
new file mode 100644
index 0000000000..167a775594
--- /dev/null
+++ b/qapi/yank.json
@@ -0,0 +1,119 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+
+##
+# = Yank feature
+##
+
+##
+# @YankInstanceType:
+#
+# An enumeration of yank instance types. See @YankInstance for more
+# information.
+#
+# Since: 6.0
+##
+{ 'enum': 'YankInstanceType',
+  'data': [ 'block-node', 'chardev', 'migration' ] }
+
+##
+# @YankInstanceBlockNode:
+#
+# Specifies which block graph node to yank. See @YankInstance for more
+# information.
+#
+# @node-name: the name of the block graph node
+#
+# Since: 6.0
+##
+{ 'struct': 'YankInstanceBlockNode',
+  'data': { 'node-name': 'str' } }
+
+##
+# @YankInstanceChardev:
+#
+# Specifies which character device to yank. See @YankInstance for more
+# information.
+#
+# @id: the chardev's ID
+#
+# Since: 6.0
+##
+{ 'struct': 'YankInstanceChardev',
+  'data': { 'id': 'str' } }
+
+##
+# @YankInstance:
+#
+# A yank instance can be yanked with the @yank qmp command to recover from a
+# hanging QEMU.
+#
+# Currently implemented yank instances:
+#  - nbd block device:
+#    Yanking it will shut down the connection to the nbd server without
+#    attempting to reconnect.
+#  - socket chardev:
+#    Yanking it will shut down the connected socket.
+#  - migration:
+#    Yanking it will shut down all migration connections. Unlike
+#    @migrate_cancel, it will not notify the migration process, so migration
+#    will go into @failed state, instead of @cancelled state. @yank should be
+#    used to recover from hangs.
+#
+# Since: 6.0
+##
+{ 'union': 'YankInstance',
+  'base': { 'type': 'YankInstanceType' },
+  'discriminator': 'type',
+  'data': {
+      'block-node': 'YankInstanceBlockNode',
+      'chardev': 'YankInstanceChardev' } }
+
+##
+# @yank:
+#
+# Try to recover from hanging QEMU by yanking the specified instances. See
+# @YankInstance for more information.
+#
+# Takes a list of @YankInstance as argument.
+#
+# Returns: - Nothing on success
+#          - @DeviceNotFound error, if any of the YankInstances doesn't exist
+#
+# Example:
+#
+# -> { "execute": "yank",
+#      "arguments": {
+#          "instances": [
+#               { "type": "block-node",
+#                 "node-name": "nbd0" }
+#          ] } }
+# <- { "return": {} }
+#
+# Since: 6.0
+##
+{ 'command': 'yank',
+  'data': { 'instances': ['YankInstance'] },
+  'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances. See @YankInstance for more information.
+#
+# Returns: list of @YankInstance
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": [
+#          { "type": "block-node",
+#            "node-name": "nbd0" }
+#      ] }
+#
+# Since: 6.0
+##
+{ 'command': 'query-yank',
+  'returns': ['YankInstance'],
+  'allow-oob': true }
diff --git a/util/meson.build b/util/meson.build
index f359af0d46..f7c67344e1 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -50,6 +50,7 @@ endif

 if have_system
   util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
+  util_ss.add(files('yank.c'))
 endif

 if have_block
diff --git a/util/yank.c b/util/yank.c
new file mode 100644
index 0000000000..4a0f538359
--- /dev/null
+++ b/util/yank.c
@@ -0,0 +1,216 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-yank.h"
+#include "qapi/qapi-visit-yank.h"
+#include "qapi/clone-visitor.h"
+#include "io/channel.h"
+#include "qemu/yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstanceEntry {
+    YankInstance *instance;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstanceEntry) next;
+};
+
+typedef struct YankFuncAndParam YankFuncAndParam;
+typedef struct YankInstanceEntry YankInstanceEntry;
+
+/*
+ * This lock protects the yank_instance_list below. Because it's taken by
+ * OOB-capable commands, it must be "fast", i.e. it may only be held for a
+ * bounded, short time. See docs/devel/qapi-code-gen.txt for additional
+ * information.
+ */
+static QemuMutex yank_lock;
+
+static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
+    = QLIST_HEAD_INITIALIZER(yank_instance_list);
+
+static bool yank_compare_instances(const YankInstance *a, const YankInstance *b)
+{
+    if (a->type != b->type) {
+        return false;
+    }
+
+    switch (a->type) {
+    case YANK_INSTANCE_TYPE_BLOCK_NODE:
+        return !strcmp(a->u.block_node.node_name, b->u.block_node.node_name);
+    break;
+
+    case YANK_INSTANCE_TYPE_CHARDEV:
+        return !strcmp(a->u.chardev.id, b->u.chardev.id);
+    break;
+
+    case YANK_INSTANCE_TYPE_MIGRATION:
+        return true;
+    break;
+
+    default:
+        abort();
+    }
+}
+
+static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
+{
+    YankInstanceEntry *entry;
+
+    QLIST_FOREACH(entry, &yank_instance_list, next) {
+        if (yank_compare_instances(entry->instance, instance)) {
+            return entry;
+        }
+    }
+    return NULL;
+}
+
+void yank_register_instance(const YankInstance *instance, Error **errp)
+{
+    YankInstanceEntry *entry;
+
+    qemu_mutex_lock(&yank_lock);
+
+    if (yank_find_entry(instance)) {
+        error_setg(errp, "duplicate yank instance");
+        qemu_mutex_unlock(&yank_lock);
+        return;
+    }
+
+    entry = g_slice_new(YankInstanceEntry);
+    entry->instance = QAPI_CLONE(YankInstance, instance);
+    QLIST_INIT(&entry->yankfns);
+    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_instance(const YankInstance *instance)
+{
+    YankInstanceEntry *entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    assert(QLIST_EMPTY(&entry->yankfns));
+    QLIST_REMOVE(entry, next);
+    qapi_free_YankInstance(entry->instance);
+    g_slice_free(YankInstanceEntry, entry);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque)
+{
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    func_entry = g_slice_new(YankFuncAndParam);
+    func_entry->func = func;
+    func_entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque)
+{
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    entry = yank_find_entry(instance);
+    assert(entry);
+
+    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
+        if (func_entry->func == func && func_entry->opaque == opaque) {
+            QLIST_REMOVE(func_entry, next);
+            g_slice_free(YankFuncAndParam, func_entry);
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(YankInstanceList *instances,
+              Error **errp)
+{
+    YankInstanceList *tail;
+    YankInstanceEntry *entry;
+    YankFuncAndParam *func_entry;
+
+    qemu_mutex_lock(&yank_lock);
+    for (tail = instances; tail; tail = tail->next) {
+        entry = yank_find_entry(tail->value);
+        if (!entry) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+    for (tail = instances; tail; tail = tail->next) {
+        entry = yank_find_entry(tail->value);
+        assert(entry);
+        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
+            func_entry->func(func_entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&yank_lock);
+}
+
+YankInstanceList *qmp_query_yank(Error **errp)
+{
+    YankInstanceEntry *entry;
+    YankInstanceList *ret;
+
+    ret = NULL;
+
+    qemu_mutex_lock(&yank_lock);
+    QLIST_FOREACH(entry, &yank_instance_list, next) {
+        YankInstanceList *new_entry;
+        new_entry = g_new0(YankInstanceList, 1);
+        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
+        new_entry->next = ret;
+        ret = new_entry;
+    }
+    qemu_mutex_unlock(&yank_lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&yank_lock);
+}
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 2/7] block/nbd.c: Add yank feature
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 1/7] Introduce yank feature Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 3/7] chardev/char-socket.c: " Lukas Straub
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 17485 bytes --]

Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c | 154 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 93 insertions(+), 61 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 42536702b6..994d1e7b33 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -44,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -141,14 +144,13 @@ typedef struct BDRVNBDState {
     NBDConnectThread *connect_thread;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
-                                                     Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
                                                bool detach);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -166,12 +168,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -204,7 +206,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
     BDRVNBDState *s = opaque;

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         while (qemu_co_enter_next(&s->free_sema, NULL)) {
             /* Resume all queued requests */
@@ -216,7 +218,7 @@ static void reconnect_delay_timer_cb(void *opaque)

 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 {
-    if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
         return;
     }

@@ -261,7 +263,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -287,7 +289,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)

     reconnect_delay_timer_del(s);

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
     }
@@ -338,13 +340,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = qatomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static void connect_bh(void *opaque)
@@ -424,12 +427,12 @@ static void *connect_thread_func(void *opaque)
     return NULL;
 }

-static QIOChannelSocket *coroutine_fn
+static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
+    int ret;
     QemuThread thread;
     BDRVNBDState *s = bs->opaque;
-    QIOChannelSocket *res;
     NBDConnectThread *thr = s->connect_thread;

     qemu_mutex_lock(&thr->mutex);
@@ -446,10 +449,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     case CONNECT_THREAD_SUCCESS:
         /* Previous attempt finally succeeded in background */
         thr->state = CONNECT_THREAD_NONE;
-        res = thr->sioc;
+        s->sioc = thr->sioc;
         thr->sioc = NULL;
+        yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                               nbd_yank, bs);
         qemu_mutex_unlock(&thr->mutex);
-        return res;
+        return 0;
     case CONNECT_THREAD_RUNNING:
         /* Already running, will wait */
         break;
@@ -481,8 +486,13 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
         thr->state = CONNECT_THREAD_NONE;
         error_propagate(errp, thr->err);
         thr->err = NULL;
-        res = thr->sioc;
+        s->sioc = thr->sioc;
         thr->sioc = NULL;
+        if (s->sioc) {
+            yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                   nbd_yank, bs);
+        }
+        ret = (s->sioc ? 0 : -1);
         break;
     case CONNECT_THREAD_RUNNING:
     case CONNECT_THREAD_RUNNING_DETACHED:
@@ -491,7 +501,7 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
          * failed. Still connect thread is executing in background, and its
          * result may be used for next connection attempt.
          */
-        res = NULL;
+        ret = -1;
         error_setg(errp, "Connection attempt cancelled by other operation");
         break;

@@ -508,7 +518,7 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)

     qemu_mutex_unlock(&thr->mutex);

-    return res;
+    return ret;
 }

 /*
@@ -561,7 +571,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
     Error *local_err = NULL;
-    QIOChannelSocket *sioc;

     if (!nbd_client_connecting(s)) {
         return;
@@ -594,21 +603,22 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     /* Finalize previous connection if any */
     if (s->ioc) {
         qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+                                 nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
         s->ioc = NULL;
     }

-    sioc = nbd_co_establish_connection(s->bs, &local_err);
-    if (!sioc) {
+    if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
         ret = -ECONNREFUSED;
         goto out;
     }

     bdrv_dec_in_flight(s->bs);

-    ret = nbd_client_handshake(s->bs, sioc, &local_err);
+    ret = nbd_client_handshake(s->bs, &local_err);

     if (s->drained) {
         s->wait_drained_end = true;
@@ -640,7 +650,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
     uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         reconnect_delay_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
                                    s->reconnect_delay * NANOSECONDS_PER_SECOND);
     }
@@ -683,7 +693,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;

-    while (s->state != NBD_CLIENT_QUIT) {
+    while (qatomic_load_acquire(&s->state) != NBD_CLIENT_QUIT) {
         /*
          * The NBD client can only really be considered idle when it has
          * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -698,7 +708,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             nbd_co_reconnect_loop(s);
         }

-        if (s->state != NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
             continue;
         }

@@ -753,6 +763,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     s->connection_co = NULL;
     if (s->ioc) {
         qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+                                 nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -777,7 +789,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }

-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         rc = -EIO;
         goto err;
     }
@@ -804,7 +816,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED &&
+            rc >= 0) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -1129,7 +1142,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -1288,7 +1301,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -1313,7 +1326,8 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     }

     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
+    if (nbd_reply_is_simple(reply) ||
+        qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }

@@ -1745,6 +1759,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
     return 0;
 }

+static void nbd_yank(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    qatomic_store_release(&s->state, NBD_CLIENT_QUIT);
+    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
 static void nbd_client_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -1757,52 +1780,53 @@ static void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
+static int nbd_establish_connection(BlockDriverState *bs,
+                                    SocketAddress *saddr,
+                                    Error **errp)
 {
     ERRP_GUARD();
-    QIOChannelSocket *sioc;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+    s->sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(s->sioc), "nbd-client");

-    qio_channel_socket_connect_sync(sioc, saddr, errp);
+    qio_channel_socket_connect_sync(s->sioc, saddr, errp);
     if (*errp) {
-        object_unref(OBJECT(sioc));
-        return NULL;
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        return -1;
     }

-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+    yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), nbd_yank, bs);
+    qio_channel_set_delay(QIO_CHANNEL(s->sioc), false);

-    return sioc;
+    return 0;
 }

-/* nbd_client_handshake takes ownership on sioc. On failure it is unref'ed. */
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp)
+/* nbd_client_handshake takes ownership on s->sioc. On failure it's unref'ed. */
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     AioContext *aio_context = bdrv_get_aio_context(bs);
     int ret;

     trace_nbd_client_handshake(s->export);
-
-    s->sioc = sioc;
-
-    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
+    qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+    qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);

     s->info.request_sizes = true;
     s->info.structured_reply = true;
     s->info.base_allocation = true;
     s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
     s->info.name = g_strdup(s->export ?: "");
-    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
+    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(s->sioc), s->tlscreds,
                                 s->hostname, &s->ioc, &s->info, errp);
     g_free(s->info.x_dirty_bitmap);
     g_free(s->info.name);
     if (ret < 0) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                 nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         return ret;
     }
@@ -1835,7 +1859,7 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     }

     if (!s->ioc) {
-        s->ioc = QIO_CHANNEL(sioc);
+        s->ioc = QIO_CHANNEL(s->sioc);
         object_ref(OBJECT(s->ioc));
     }

@@ -1851,9 +1875,11 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     {
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
+        nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);

-        object_unref(OBJECT(sioc));
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+                                 nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;

         return ret;
@@ -2245,7 +2271,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 {
     int ret;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-    QIOChannelSocket *sioc;

     ret = nbd_process_options(bs, options, errp);
     if (ret < 0) {
@@ -2256,17 +2281,23 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_queue_init(&s->free_sema);

+    yank_register_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name), errp);
+    if (*errp) {
+        return -EEXIST;
+    }
+
     /*
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    sioc = nbd_establish_connection(s->saddr, errp);
-    if (!sioc) {
+    if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
+        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
         return -ECONNREFUSED;
     }

-    ret = nbd_client_handshake(bs, sioc, errp);
+    ret = nbd_client_handshake(bs, errp);
     if (ret < 0) {
+        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
         nbd_clear_bdrvstate(s);
         return ret;
     }
@@ -2326,6 +2357,7 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;

     nbd_client_close(bs);
+    yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
     nbd_clear_bdrvstate(s);
 }

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 3/7] chardev/char-socket.c: Add yank feature
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 1/7] Introduce yank feature Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 2/7] block/nbd.c: Add " Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 4/7] migration: " Lukas Straub
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 4819 bytes --]

Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 chardev/char-socket.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 213a4c8dd0..7f2ee9a338 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"
 #include "qom/object.h"
@@ -70,6 +71,7 @@ struct SocketChardev {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    bool registered_yank;

     SocketAddress *addr;
     bool is_listen;
@@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -932,6 +940,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -946,6 +957,9 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -961,6 +975,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -976,6 +993,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1086,6 +1106,9 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->registered_yank) {
+        yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label));
+    }

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1101,6 +1124,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1134,6 +1160,9 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1376,6 +1405,12 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp);
+    if (*errp) {
+        return;
+    }
+    s->registered_yank = true;
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 4/7] migration: Add yank feature
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (2 preceding siblings ...)
  2020-12-13 11:48 ` [PATCH v12 3/7] chardev/char-socket.c: " Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 7753 bytes --]

Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/channel.c           | 13 +++++++++++++
 migration/migration.c         | 24 ++++++++++++++++++++++++
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  7 +++++++
 migration/savevm.c            |  6 ++++++
 5 files changed, 60 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 8a783baa0b..35fe234e9c 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function(MIGRATION_YANK_INSTANCE,
+                                   yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index e0dbde4091..dc520a721b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #ifdef CONFIG_VFIO
 #include "hw/vfio/vfio-common.h"
@@ -254,6 +255,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_generate_event(int new_state)
@@ -418,6 +421,11 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;

+    yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+    if (*errp) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
@@ -432,6 +440,7 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1737,6 +1746,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -2011,6 +2021,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2148,6 +2159,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }

+    if (!(has_resume && resume)) {
+        yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+        if (*errp) {
+            return;
+        }
+    }
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
@@ -2161,6 +2179,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2170,6 +2191,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }

     if (local_err) {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         migrate_fd_error(s, local_err);
         error_propagate(errp, local_err);
         return;
diff --git a/migration/multifd.c b/migration/multifd.c
index 45c690aa11..1a1e589064 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -25,6 +25,9 @@
 #include "trace.h"
 #include "multifd.h"

+#include "qemu/yank.h"
+#include "io/channel-socket.h"
+
 /* Multiple fd's */

 #define MULTIFD_MAGIC 0x11223344U
@@ -974,6 +977,13 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];

+        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
+            && OBJECT(p->c)->ref == 1) {
+            yank_unregister_function(MIGRATION_YANK_INSTANCE,
+                                     yank_generic_iochannel,
+                                     QIO_CHANNEL(p->c));
+        }
+
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index d2ce32f4b9..afc3a7f642 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
+#include "qemu/yank.h"


 static ssize_t channel_writev_buffer(void *opaque,
@@ -104,6 +105,12 @@ static int channel_close(void *opaque, Error **errp)
     int ret;
     QIOChannel *ioc = QIO_CHANNEL(opaque);
     ret = qio_channel_close(ioc, errp);
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)
+        && OBJECT(ioc)->ref == 1) {
+        yank_unregister_function(MIGRATION_YANK_INSTANCE,
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(ioc));
+    }
     object_unref(OBJECT(ioc));
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 5f937a2762..1a3808998e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -62,6 +62,7 @@
 #include "migration/colo.h"
 #include "qemu/bitmap.h"
 #include "net/announce.h"
+#include "qemu/yank.h"

 const unsigned int postcopy_ram_discard_version;

@@ -3008,6 +3009,11 @@ int load_snapshot(const char *name, Error **errp)
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
     mis->from_src_file = f;

+    yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+    if (*errp) {
+        ret = -EINVAL;
+        goto err_drain;
+    }
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
     migration_incoming_state_destroy();
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (3 preceding siblings ...)
  2020-12-13 11:48 ` [PATCH v12 4/7] migration: " Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 388f019977..2ae1b92fc0 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                     return QIO_CHANNEL_ERR_BLOCK;
                 }
             } else if (errno == ECONNABORTED &&
-                       (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                       (qatomic_load_acquire(&tioc->shutdown) &
+                        QIO_CHANNEL_SHUTDOWN_READ)) {
                 return 0;
             }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (4 preceding siblings ...)
  2020-12-13 11:48 ` [PATCH v12 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-13 11:48 ` [PATCH v12 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
  2020-12-15 13:05 ` [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Markus Armbruster
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Migration and yank code assume that qio_channel_shutdown is thread
-safe and can be called from qmp oob handler. Document this after
checking the code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 4d6fe45f63..ab9ea77959 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v12 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (5 preceding siblings ...)
  2020-12-13 11:48 ` [PATCH v12 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
@ 2020-12-13 11:48 ` Lukas Straub
  2020-12-15 13:05 ` [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Markus Armbruster
  7 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-13 11:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Marc-André Lureau,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 953e0d1c1f..41a76410d8 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (6 preceding siblings ...)
  2020-12-13 11:48 ` [PATCH v12 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
@ 2020-12-15 13:05 ` Markus Armbruster
  2020-12-19  8:09   ` Markus Armbruster
  7 siblings, 1 reply; 12+ messages in thread
From: Markus Armbruster @ 2020-12-15 13:05 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Marc-André Lureau, Paolo Bonzini,
	Max Reitz

Lukas Straub <lukasstraub2@web.de> writes:

> Hello Everyone,
> So here is v12.
> @Marc-André Lureau, We still need an ACK for the chardev patch.

Marc-André?  Would be good to get this wrapped before Christmas.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-12-15 13:05 ` [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Markus Armbruster
@ 2020-12-19  8:09   ` Markus Armbruster
  0 siblings, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2020-12-19  8:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Lukas Straub, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Marc-André Lureau, Marc-André Lureau, Paolo Bonzini

Markus Armbruster <armbru@redhat.com> writes:

> Lukas Straub <lukasstraub2@web.de> writes:
>
>> Hello Everyone,
>> So here is v12.
>> @Marc-André Lureau, We still need an ACK for the chardev patch.
>
> Marc-André?  Would be good to get this wrapped before Christmas.

No go.

Unless someone objects, I'll merge this when I'm back (mid January) with
or without an ACK for chardev.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v12 1/7] Introduce yank feature
  2020-12-13 11:48 ` [PATCH v12 1/7] Introduce yank feature Lukas Straub
@ 2020-12-22  8:00   ` Marc-André Lureau
  2020-12-28 13:56     ` Lukas Straub
  0 siblings, 1 reply; 12+ messages in thread
From: Marc-André Lureau @ 2020-12-22  8:00 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster, qemu-devel,
	Max Reitz, Paolo Bonzini, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 16677 bytes --]

On Sun, Dec 13, 2020 at 3:48 PM Lukas Straub <lukasstraub2@web.de> wrote:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
>  MAINTAINERS           |   7 ++
>  include/qemu/yank.h   |  95 +++++++++++++++++++
>  qapi/meson.build      |   1 +
>  qapi/qapi-schema.json |   1 +
>  qapi/yank.json        | 119 +++++++++++++++++++++++
>  util/meson.build      |   1 +
>  util/yank.c           | 216 ++++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 440 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 qapi/yank.json
>  create mode 100644 util/yank.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d48a4e8a8b..5d7e3c0e4b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2705,6 +2705,13 @@ F: util/uuid.c
>  F: include/qemu/uuid.h
>  F: tests/test-uuid.c
>
> +Yank feature
> +M: Lukas Straub <lukasstraub2@web.de>
> +S: Odd fixes
> +F: util/yank.c
> +F: include/qemu/yank.h
> +F: qapi/yank.json
> +
>  COLO Framework
>  M: zhanghailiang <zhang.zhanghailiang@huawei.com>
>  S: Maintained
> diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> new file mode 100644
> index 0000000000..96f5b2626f
> --- /dev/null
> +++ b/include/qemu/yank.h
> @@ -0,0 +1,95 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef YANK_H
> +#define YANK_H
> +
> +#include "qapi/qapi-types-yank.h"
> +
> +typedef void (YankFn)(void *opaque);
> +
> +/**
> + * yank_register_instance: Register a new instance.
> + *
> + * This registers a new instance for yanking. Must be called before any
> yank
> + * function is registered for this instance.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @errp: Error object.
> + */
> +void yank_register_instance(const YankInstance *instance, Error **errp);
> +
>

It's a good idea to return a success boolean. (see include/qapi/error.h)

+/**
> + * yank_unregister_instance: Unregister a instance.
> + *
> + * This unregisters a instance. Must be called only after every yank
> function
> + * of the instance has been unregistered.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + */
> +void yank_unregister_instance(const YankInstance *instance);
> +
> +/**
> + * yank_register_function: Register a yank function
> + *
> + * This registers a yank function. All limitations of qmp oob commands
> apply
> + * to the yank function as well. See docs/devel/qapi-code-gen.txt under
> + * "An OOB-capable command handler must satisfy the following conditions".
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @func: The yank function.
> + * @opaque: Will be passed to the yank function.
> + */
> +void yank_register_function(const YankInstance *instance,
> +                            YankFn *func,
> +                            void *opaque);
> +
> +/**
> + * yank_unregister_function: Unregister a yank function
> + *
> + * This unregisters a yank function.
> + *
> + * This function is thread-safe.
> + *
> + * @instance: The instance.
> + * @func: func that was passed to yank_register_function.
> + * @opaque: opaque that was passed to yank_register_function.
> + */
> +void yank_unregister_function(const YankInstance *instance,
> +                              YankFn *func,
> +                              void *opaque);
> +
> +/**
> + * yank_generic_iochannel: Generic yank function for iochannel
> + *
> + * This is a generic yank function which will call qio_channel_shutdown
> on the
> + * provided QIOChannel.
> + *
> + * @opaque: QIOChannel to shutdown
> + */
> +void yank_generic_iochannel(void *opaque);
> +
> +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
> +        .u.block_node.node_name = (the_node_name) })
> +
> +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_CHARDEV, \
> +        .u.chardev.id = (the_id) })
> +
> +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
> +        .type = YANK_INSTANCE_TYPE_MIGRATION })
> +
> +#endif
> diff --git a/qapi/meson.build b/qapi/meson.build
> index 0e98146f1f..ab68e7900e 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -47,6 +47,7 @@ qapi_all_modules = [
>    'trace',
>    'transaction',
>    'ui',
> +  'yank',
>  ]
>
>  qapi_storage_daemon_modules = [
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index 0b444b76d2..3441c9a9ae 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -86,6 +86,7 @@
>  { 'include': 'machine.json' }
>  { 'include': 'machine-target.json' }
>  { 'include': 'replay.json' }
> +{ 'include': 'yank.json' }
>  { 'include': 'misc.json' }
>  { 'include': 'misc-target.json' }
>  { 'include': 'audio.json' }
> diff --git a/qapi/yank.json b/qapi/yank.json
> new file mode 100644
> index 0000000000..167a775594
> --- /dev/null
> +++ b/qapi/yank.json
> @@ -0,0 +1,119 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +#
> +
> +##
> +# = Yank feature
> +##
> +
> +##
> +# @YankInstanceType:
> +#
> +# An enumeration of yank instance types. See @YankInstance for more
> +# information.
> +#
> +# Since: 6.0
> +##
> +{ 'enum': 'YankInstanceType',
> +  'data': [ 'block-node', 'chardev', 'migration' ] }
> +
> +##
> +# @YankInstanceBlockNode:
> +#
> +# Specifies which block graph node to yank. See @YankInstance for more
> +# information.
> +#
> +# @node-name: the name of the block graph node
> +#
> +# Since: 6.0
> +##
> +{ 'struct': 'YankInstanceBlockNode',
> +  'data': { 'node-name': 'str' } }
> +
> +##
> +# @YankInstanceChardev:
> +#
> +# Specifies which character device to yank. See @YankInstance for more
> +# information.
> +#
> +# @id: the chardev's ID
> +#
> +# Since: 6.0
> +##
> +{ 'struct': 'YankInstanceChardev',
> +  'data': { 'id': 'str' } }
> +
> +##
> +# @YankInstance:
> +#
> +# A yank instance can be yanked with the @yank qmp command to recover
> from a
> +# hanging QEMU.
> +#
> +# Currently implemented yank instances:
> +#  - nbd block device:
> +#    Yanking it will shut down the connection to the nbd server without
> +#    attempting to reconnect.
> +#  - socket chardev:
> +#    Yanking it will shut down the connected socket.
> +#  - migration:
> +#    Yanking it will shut down all migration connections. Unlike
> +#    @migrate_cancel, it will not notify the migration process, so
> migration
> +#    will go into @failed state, instead of @cancelled state. @yank
> should be
> +#    used to recover from hangs.
> +#
> +# Since: 6.0
> +##
> +{ 'union': 'YankInstance',
> +  'base': { 'type': 'YankInstanceType' },
> +  'discriminator': 'type',
> +  'data': {
> +      'block-node': 'YankInstanceBlockNode',
> +      'chardev': 'YankInstanceChardev' } }
> +
> +##
> +# @yank:
> +#
> +# Try to recover from hanging QEMU by yanking the specified instances. See
> +# @YankInstance for more information.
> +#
> +# Takes a list of @YankInstance as argument.
> +#
> +# Returns: - Nothing on success
> +#          - @DeviceNotFound error, if any of the YankInstances doesn't
> exist
> +#
> +# Example:
> +#
> +# -> { "execute": "yank",
> +#      "arguments": {
> +#          "instances": [
> +#               { "type": "block-node",
> +#                 "node-name": "nbd0" }
> +#          ] } }
> +# <- { "return": {} }
> +#
> +# Since: 6.0
> +##
> +{ 'command': 'yank',
> +  'data': { 'instances': ['YankInstance'] },
> +  'allow-oob': true }
> +
> +##
> +# @query-yank:
> +#
> +# Query yank instances. See @YankInstance for more information.
> +#
> +# Returns: list of @YankInstance
> +#
> +# Example:
> +#
> +# -> { "execute": "query-yank" }
> +# <- { "return": [
> +#          { "type": "block-node",
> +#            "node-name": "nbd0" }
> +#      ] }
> +#
> +# Since: 6.0
> +##
> +{ 'command': 'query-yank',
> +  'returns': ['YankInstance'],
> +  'allow-oob': true }
> diff --git a/util/meson.build b/util/meson.build
> index f359af0d46..f7c67344e1 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -50,6 +50,7 @@ endif
>
>  if have_system
>    util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
> +  util_ss.add(files('yank.c'))
>  endif
>
>  if have_block
> diff --git a/util/yank.c b/util/yank.c
> new file mode 100644
> index 0000000000..4a0f538359
> --- /dev/null
> +++ b/util/yank.c
> @@ -0,0 +1,216 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/thread.h"
> +#include "qemu/queue.h"
> +#include "qapi/qapi-commands-yank.h"
> +#include "qapi/qapi-visit-yank.h"
> +#include "qapi/clone-visitor.h"
> +#include "io/channel.h"
> +#include "qemu/yank.h"
> +
> +struct YankFuncAndParam {
> +    YankFn *func;
> +    void *opaque;
> +    QLIST_ENTRY(YankFuncAndParam) next;
> +};
> +
> +struct YankInstanceEntry {
> +    YankInstance *instance;
> +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> +    QLIST_ENTRY(YankInstanceEntry) next;
> +};
> +
> +typedef struct YankFuncAndParam YankFuncAndParam;
> +typedef struct YankInstanceEntry YankInstanceEntry;
> +
> +/*
> + * This lock protects the yank_instance_list below. Because it's taken by
> + * OOB-capable commands, it must be "fast", i.e. it may only be held for a
> + * bounded, short time. See docs/devel/qapi-code-gen.txt for additional
> + * information.
> + */
> +static QemuMutex yank_lock;
> +
> +static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
> +    = QLIST_HEAD_INITIALIZER(yank_instance_list);
> +
> +static bool yank_compare_instances(const YankInstance *a, const
> YankInstance *b)
>

I expect a compare function to return a value like strcmp.

I suggest to name it yank_instance_equal() instead.

+{
> +    if (a->type != b->type) {
> +        return false;
> +    }
> +
> +    switch (a->type) {
> +    case YANK_INSTANCE_TYPE_BLOCK_NODE:
> +        return !strcmp(a->u.block_node.node_name,
> b->u.block_node.node_name);
>

Or g_str_equal()

+    break;
>

This is not a common qemu style "break" indentation. And the "break" here
shouldn't be necessary after a return.

+
> +    case YANK_INSTANCE_TYPE_CHARDEV:
> +        return !strcmp(a->u.chardev.id, b->u.chardev.id);
> +    break;
> +
> +    case YANK_INSTANCE_TYPE_MIGRATION:
> +        return true;
> +    break;
> +
> +    default:
> +        abort();
> +    }
> +}
> +
> +static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
> +{
> +    YankInstanceEntry *entry;
> +
> +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> +        if (yank_compare_instances(entry->instance, instance)) {
> +            return entry;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +void yank_register_instance(const YankInstance *instance, Error **errp)
> +{
> +    YankInstanceEntry *entry;
> +
> +    qemu_mutex_lock(&yank_lock);
>

You may use QEMU_LOCK_GUARD instead.


> +
> +    if (yank_find_entry(instance)) {
> +        error_setg(errp, "duplicate yank instance");
> +        qemu_mutex_unlock(&yank_lock);
> +        return;
> +    }
> +
> +    entry = g_slice_new(YankInstanceEntry);
>

GSlice is mostly useless, rarely used in qemu. It's on its way to
deprecation in GLib (https://gitlab.gnome.org/GNOME/glib/-/issues/1079).
Please replace with g_new0()

+    entry->instance = QAPI_CLONE(YankInstance, instance);
> +    QLIST_INIT(&entry->yankfns);
> +    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
> +
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_unregister_instance(const YankInstance *instance)
> +{
> +    YankInstanceEntry *entry;
> +
> +    qemu_mutex_lock(&yank_lock);
>

 You may use QEMU_LOCK_GUARD instead. (same below)

+    entry = yank_find_entry(instance);
> +    assert(entry);
>

This is not very friendly, as shown by the following patches that need to
track separately whether an instance was registered or not.

Could we simply return if entry is not found?


> +
> +    assert(QLIST_EMPTY(&entry->yankfns));
> +    QLIST_REMOVE(entry, next);
> +    qapi_free_YankInstance(entry->instance);
> +    g_slice_free(YankInstanceEntry, entry);
> +
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_register_function(const YankInstance *instance,
> +                            YankFn *func,
> +                            void *opaque)
> +{
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
>


+    entry = yank_find_entry(instance);
> +    assert(entry);
>

The object is expected to be fully initialized, assert() is justified
(compared to the unregistered function which could silently ignore
unregistered instances)

+
> +    func_entry = g_slice_new(YankFuncAndParam);
> +    func_entry->func = func;
> +    func_entry->opaque = opaque;
> +
> +    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +void yank_unregister_function(const YankInstance *instance,
> +                              YankFn *func,
> +                              void *opaque)
> +{
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
>


+    entry = yank_find_entry(instance);
> +    assert(entry);
> +
> +    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> +        if (func_entry->func == func && func_entry->opaque == opaque) {
> +            QLIST_REMOVE(func_entry, next);
> +            g_slice_free(YankFuncAndParam, func_entry);
> +            qemu_mutex_unlock(&yank_lock);
> +            return;
> +        }
> +    }
> +
> +    abort();
> +}
> +
> +void yank_generic_iochannel(void *opaque)
> +{
> +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> +
> +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +}
> +
> +void qmp_yank(YankInstanceList *instances,
> +              Error **errp)
> +{
> +    YankInstanceList *tail;
> +    YankInstanceEntry *entry;
> +    YankFuncAndParam *func_entry;
> +
> +    qemu_mutex_lock(&yank_lock);
>


+    for (tail = instances; tail; tail = tail->next) {
> +        entry = yank_find_entry(tail->value);
> +        if (!entry) {
> +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not
> found");
> +            qemu_mutex_unlock(&yank_lock);
> +            return;
> +        }
> +    }
> +    for (tail = instances; tail; tail = tail->next) {
> +        entry = yank_find_entry(tail->value);
> +        assert(entry);
> +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> +            func_entry->func(func_entry->opaque);
> +        }
> +    }
> +    qemu_mutex_unlock(&yank_lock);
> +}
> +
> +YankInstanceList *qmp_query_yank(Error **errp)
> +{
> +    YankInstanceEntry *entry;
> +    YankInstanceList *ret;
> +
> +    ret = NULL;
> +
> +    qemu_mutex_lock(&yank_lock);
>


+    QLIST_FOREACH(entry, &yank_instance_list, next) {
> +        YankInstanceList *new_entry;
> +        new_entry = g_new0(YankInstanceList, 1);
> +        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
> +        new_entry->next = ret;
> +        ret = new_entry;
> +    }
> +    qemu_mutex_unlock(&yank_lock);
> +
> +    return ret;
> +}
> +
> +static void __attribute__((__constructor__)) yank_init(void)
> +{
> +    qemu_mutex_init(&yank_lock);
> +}
> --
> 2.20.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 22650 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v12 1/7] Introduce yank feature
  2020-12-22  8:00   ` Marc-André Lureau
@ 2020-12-28 13:56     ` Lukas Straub
  0 siblings, 0 replies; 12+ messages in thread
From: Lukas Straub @ 2020-12-28 13:56 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster, qemu-devel,
	Max Reitz, Paolo Bonzini, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 18731 bytes --]

On Tue, 22 Dec 2020 12:00:29 +0400
Marc-André Lureau <marcandre.lureau@gmail.com> wrote:

> On Sun, Dec 13, 2020 at 3:48 PM Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Markus Armbruster <armbru@redhat.com>
> > ---
> >  MAINTAINERS           |   7 ++
> >  include/qemu/yank.h   |  95 +++++++++++++++++++
> >  qapi/meson.build      |   1 +
> >  qapi/qapi-schema.json |   1 +
> >  qapi/yank.json        | 119 +++++++++++++++++++++++
> >  util/meson.build      |   1 +
> >  util/yank.c           | 216 ++++++++++++++++++++++++++++++++++++++++++
> >  7 files changed, 440 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 qapi/yank.json
> >  create mode 100644 util/yank.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d48a4e8a8b..5d7e3c0e4b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2705,6 +2705,13 @@ F: util/uuid.c
> >  F: include/qemu/uuid.h
> >  F: tests/test-uuid.c
> >
> > +Yank feature
> > +M: Lukas Straub <lukasstraub2@web.de>
> > +S: Odd fixes
> > +F: util/yank.c
> > +F: include/qemu/yank.h
> > +F: qapi/yank.json
> > +
> >  COLO Framework
> >  M: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >  S: Maintained
> > diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> > new file mode 100644
> > index 0000000000..96f5b2626f
> > --- /dev/null
> > +++ b/include/qemu/yank.h
> > @@ -0,0 +1,95 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef YANK_H
> > +#define YANK_H
> > +
> > +#include "qapi/qapi-types-yank.h"
> > +
> > +typedef void (YankFn)(void *opaque);
> > +
> > +/**
> > + * yank_register_instance: Register a new instance.
> > + *
> > + * This registers a new instance for yanking. Must be called before any
> > yank
> > + * function is registered for this instance.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @errp: Error object.
> > + */
> > +void yank_register_instance(const YankInstance *instance, Error **errp);
> > +
> >  
> 
> It's a good idea to return a success boolean. (see include/qapi/error.h)

Changed for the next version.

> +/**
> > + * yank_unregister_instance: Unregister a instance.
> > + *
> > + * This unregisters a instance. Must be called only after every yank
> > function
> > + * of the instance has been unregistered.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + */
> > +void yank_unregister_instance(const YankInstance *instance);
> > +
> > +/**
> > + * yank_register_function: Register a yank function
> > + *
> > + * This registers a yank function. All limitations of qmp oob commands
> > apply
> > + * to the yank function as well. See docs/devel/qapi-code-gen.txt under
> > + * "An OOB-capable command handler must satisfy the following conditions".
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: The yank function.
> > + * @opaque: Will be passed to the yank function.
> > + */
> > +void yank_register_function(const YankInstance *instance,
> > +                            YankFn *func,
> > +                            void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Unregister a yank function
> > + *
> > + * This unregisters a yank function.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: func that was passed to yank_register_function.
> > + * @opaque: opaque that was passed to yank_register_function.
> > + */
> > +void yank_unregister_function(const YankInstance *instance,
> > +                              YankFn *func,
> > +                              void *opaque);
> > +
> > +/**
> > + * yank_generic_iochannel: Generic yank function for iochannel
> > + *
> > + * This is a generic yank function which will call qio_channel_shutdown
> > on the
> > + * provided QIOChannel.
> > + *
> > + * @opaque: QIOChannel to shutdown
> > + */
> > +void yank_generic_iochannel(void *opaque);
> > +
> > +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
> > +        .u.block_node.node_name = (the_node_name) })
> > +
> > +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_CHARDEV, \
> > +        .u.chardev.id = (the_id) })
> > +
> > +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
> > +        .type = YANK_INSTANCE_TYPE_MIGRATION })
> > +
> > +#endif
> > diff --git a/qapi/meson.build b/qapi/meson.build
> > index 0e98146f1f..ab68e7900e 100644
> > --- a/qapi/meson.build
> > +++ b/qapi/meson.build
> > @@ -47,6 +47,7 @@ qapi_all_modules = [
> >    'trace',
> >    'transaction',
> >    'ui',
> > +  'yank',
> >  ]
> >
> >  qapi_storage_daemon_modules = [
> > diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> > index 0b444b76d2..3441c9a9ae 100644
> > --- a/qapi/qapi-schema.json
> > +++ b/qapi/qapi-schema.json
> > @@ -86,6 +86,7 @@
> >  { 'include': 'machine.json' }
> >  { 'include': 'machine-target.json' }
> >  { 'include': 'replay.json' }
> > +{ 'include': 'yank.json' }
> >  { 'include': 'misc.json' }
> >  { 'include': 'misc-target.json' }
> >  { 'include': 'audio.json' }
> > diff --git a/qapi/yank.json b/qapi/yank.json
> > new file mode 100644
> > index 0000000000..167a775594
> > --- /dev/null
> > +++ b/qapi/yank.json
> > @@ -0,0 +1,119 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +#
> > +
> > +##
> > +# = Yank feature
> > +##
> > +
> > +##
> > +# @YankInstanceType:
> > +#
> > +# An enumeration of yank instance types. See @YankInstance for more
> > +# information.
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'enum': 'YankInstanceType',
> > +  'data': [ 'block-node', 'chardev', 'migration' ] }
> > +
> > +##
> > +# @YankInstanceBlockNode:
> > +#
> > +# Specifies which block graph node to yank. See @YankInstance for more
> > +# information.
> > +#
> > +# @node-name: the name of the block graph node
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'struct': 'YankInstanceBlockNode',
> > +  'data': { 'node-name': 'str' } }
> > +
> > +##
> > +# @YankInstanceChardev:
> > +#
> > +# Specifies which character device to yank. See @YankInstance for more
> > +# information.
> > +#
> > +# @id: the chardev's ID
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'struct': 'YankInstanceChardev',
> > +  'data': { 'id': 'str' } }
> > +
> > +##
> > +# @YankInstance:
> > +#
> > +# A yank instance can be yanked with the @yank qmp command to recover
> > from a
> > +# hanging QEMU.
> > +#
> > +# Currently implemented yank instances:
> > +#  - nbd block device:
> > +#    Yanking it will shut down the connection to the nbd server without
> > +#    attempting to reconnect.
> > +#  - socket chardev:
> > +#    Yanking it will shut down the connected socket.
> > +#  - migration:
> > +#    Yanking it will shut down all migration connections. Unlike
> > +#    @migrate_cancel, it will not notify the migration process, so
> > migration
> > +#    will go into @failed state, instead of @cancelled state. @yank
> > should be
> > +#    used to recover from hangs.
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'union': 'YankInstance',
> > +  'base': { 'type': 'YankInstanceType' },
> > +  'discriminator': 'type',
> > +  'data': {
> > +      'block-node': 'YankInstanceBlockNode',
> > +      'chardev': 'YankInstanceChardev' } }
> > +
> > +##
> > +# @yank:
> > +#
> > +# Try to recover from hanging QEMU by yanking the specified instances. See
> > +# @YankInstance for more information.
> > +#
> > +# Takes a list of @YankInstance as argument.
> > +#
> > +# Returns: - Nothing on success
> > +#          - @DeviceNotFound error, if any of the YankInstances doesn't
> > exist
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank",
> > +#      "arguments": {
> > +#          "instances": [
> > +#               { "type": "block-node",
> > +#                 "node-name": "nbd0" }
> > +#          ] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'command': 'yank',
> > +  'data': { 'instances': ['YankInstance'] },
> > +  'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances. See @YankInstance for more information.
> > +#
> > +# Returns: list of @YankInstance
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": [
> > +#          { "type": "block-node",
> > +#            "node-name": "nbd0" }
> > +#      ] }
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'command': 'query-yank',
> > +  'returns': ['YankInstance'],
> > +  'allow-oob': true }
> > diff --git a/util/meson.build b/util/meson.build
> > index f359af0d46..f7c67344e1 100644
> > --- a/util/meson.build
> > +++ b/util/meson.build
> > @@ -50,6 +50,7 @@ endif
> >
> >  if have_system
> >    util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
> > +  util_ss.add(files('yank.c'))
> >  endif
> >
> >  if have_block
> > diff --git a/util/yank.c b/util/yank.c
> > new file mode 100644
> > index 0000000000..4a0f538359
> > --- /dev/null
> > +++ b/util/yank.c
> > @@ -0,0 +1,216 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu/thread.h"
> > +#include "qemu/queue.h"
> > +#include "qapi/qapi-commands-yank.h"
> > +#include "qapi/qapi-visit-yank.h"
> > +#include "qapi/clone-visitor.h"
> > +#include "io/channel.h"
> > +#include "qemu/yank.h"
> > +
> > +struct YankFuncAndParam {
> > +    YankFn *func;
> > +    void *opaque;
> > +    QLIST_ENTRY(YankFuncAndParam) next;
> > +};
> > +
> > +struct YankInstanceEntry {
> > +    YankInstance *instance;
> > +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> > +    QLIST_ENTRY(YankInstanceEntry) next;
> > +};
> > +
> > +typedef struct YankFuncAndParam YankFuncAndParam;
> > +typedef struct YankInstanceEntry YankInstanceEntry;
> > +
> > +/*
> > + * This lock protects the yank_instance_list below. Because it's taken by
> > + * OOB-capable commands, it must be "fast", i.e. it may only be held for a
> > + * bounded, short time. See docs/devel/qapi-code-gen.txt for additional
> > + * information.
> > + */
> > +static QemuMutex yank_lock;
> > +
> > +static QLIST_HEAD(, YankInstanceEntry) yank_instance_list
> > +    = QLIST_HEAD_INITIALIZER(yank_instance_list);
> > +
> > +static bool yank_compare_instances(const YankInstance *a, const
> > YankInstance *b)
> >  
> 
> I expect a compare function to return a value like strcmp.
> 
> I suggest to name it yank_instance_equal() instead.

Changed for the next version.

> +{
> > +    if (a->type != b->type) {
> > +        return false;
> > +    }
> > +
> > +    switch (a->type) {
> > +    case YANK_INSTANCE_TYPE_BLOCK_NODE:
> > +        return !strcmp(a->u.block_node.node_name,
> > b->u.block_node.node_name);
> >  
> 
> Or g_str_equal()

Changed for the next version.

> +    break;
> >  
> 
> This is not a common qemu style "break" indentation. And the "break" here
> shouldn't be necessary after a return.

Changed for the next version.

> +
> > +    case YANK_INSTANCE_TYPE_CHARDEV:
> > +        return !strcmp(a->u.chardev.id, b->u.chardev.id);
> > +    break;
> > +
> > +    case YANK_INSTANCE_TYPE_MIGRATION:
> > +        return true;
> > +    break;
> > +
> > +    default:
> > +        abort();
> > +    }
> > +}
> > +
> > +static YankInstanceEntry *yank_find_entry(const YankInstance *instance)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> > +        if (yank_compare_instances(entry->instance, instance)) {
> > +            return entry;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +void yank_register_instance(const YankInstance *instance, Error **errp)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
> You may use QEMU_LOCK_GUARD instead.

Changed for the next version.

> 
> > +
> > +    if (yank_find_entry(instance)) {
> > +        error_setg(errp, "duplicate yank instance");
> > +        qemu_mutex_unlock(&yank_lock);
> > +        return;
> > +    }
> > +
> > +    entry = g_slice_new(YankInstanceEntry);
> >  
> 
> GSlice is mostly useless, rarely used in qemu. It's on its way to
> deprecation in GLib (https://gitlab.gnome.org/GNOME/glib/-/issues/1079).
> Please replace with g_new0()

Changed for the next version.

> 
> +    entry->instance = QAPI_CLONE(YankInstance, instance);
> > +    QLIST_INIT(&entry->yankfns);
> > +    QLIST_INSERT_HEAD(&yank_instance_list, entry, next);
> > +
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_unregister_instance(const YankInstance *instance)
> > +{
> > +    YankInstanceEntry *entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
>  You may use QEMU_LOCK_GUARD instead. (same below)
> 

Changed for the next version.

> +    entry = yank_find_entry(instance);
> > +    assert(entry);
> >  
> 
> This is not very friendly, as shown by the following patches that need to
> track separately whether an instance was registered or not.
> 
> Could we simply return if entry is not found?

I'd rather keep it that way, as it detects programming errors.

And in the chardev code for example, the chardev object is created even
if a chardev with the same id already exists. yank_register_instance
detects this and it errors out. However, the new chardev object then
will be free()'d and if we call yank_unregister_instance unconditionally
in the destructor, we would actually unregister the instance of the
old/already existing chardev. So at least there we have to track it
separately anyway.

> 
> > +
> > +    assert(QLIST_EMPTY(&entry->yankfns));
> > +    QLIST_REMOVE(entry, next);
> > +    qapi_free_YankInstance(entry->instance);
> > +    g_slice_free(YankInstanceEntry, entry);
> > +
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_register_function(const YankInstance *instance,
> > +                            YankFn *func,
> > +                            void *opaque)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
> 
> +    entry = yank_find_entry(instance);
> > +    assert(entry);
> >  
> 
> The object is expected to be fully initialized, assert() is justified
> (compared to the unregistered function which could silently ignore
> unregistered instances)
> 
> +
> > +    func_entry = g_slice_new(YankFuncAndParam);
> > +    func_entry->func = func;
> > +    func_entry->opaque = opaque;
> > +
> > +    QLIST_INSERT_HEAD(&entry->yankfns, func_entry, next);
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +void yank_unregister_function(const YankInstance *instance,
> > +                              YankFn *func,
> > +                              void *opaque)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
> 
> +    entry = yank_find_entry(instance);
> > +    assert(entry);
> > +
> > +    QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> > +        if (func_entry->func == func && func_entry->opaque == opaque) {
> > +            QLIST_REMOVE(func_entry, next);
> > +            g_slice_free(YankFuncAndParam, func_entry);
> > +            qemu_mutex_unlock(&yank_lock);
> > +            return;
> > +        }
> > +    }
> > +
> > +    abort();
> > +}
> > +
> > +void yank_generic_iochannel(void *opaque)
> > +{
> > +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> > +
> > +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > +}
> > +
> > +void qmp_yank(YankInstanceList *instances,
> > +              Error **errp)
> > +{
> > +    YankInstanceList *tail;
> > +    YankInstanceEntry *entry;
> > +    YankFuncAndParam *func_entry;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
> 
> +    for (tail = instances; tail; tail = tail->next) {
> > +        entry = yank_find_entry(tail->value);
> > +        if (!entry) {
> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not
> > found");
> > +            qemu_mutex_unlock(&yank_lock);
> > +            return;
> > +        }
> > +    }
> > +    for (tail = instances; tail; tail = tail->next) {
> > +        entry = yank_find_entry(tail->value);
> > +        assert(entry);
> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
> > +            func_entry->func(func_entry->opaque);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&yank_lock);
> > +}
> > +
> > +YankInstanceList *qmp_query_yank(Error **errp)
> > +{
> > +    YankInstanceEntry *entry;
> > +    YankInstanceList *ret;
> > +
> > +    ret = NULL;
> > +
> > +    qemu_mutex_lock(&yank_lock);
> >  
> 
> 
> +    QLIST_FOREACH(entry, &yank_instance_list, next) {
> > +        YankInstanceList *new_entry;
> > +        new_entry = g_new0(YankInstanceList, 1);
> > +        new_entry->value = QAPI_CLONE(YankInstance, entry->instance);
> > +        new_entry->next = ret;
> > +        ret = new_entry;
> > +    }
> > +    qemu_mutex_unlock(&yank_lock);
> > +
> > +    return ret;
> > +}
> > +
> > +static void __attribute__((__constructor__)) yank_init(void)
> > +{
> > +    qemu_mutex_init(&yank_lock);
> > +}
> > --
> > 2.20.1
> >
> >  
> 



-- 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-12-28 13:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-13 11:48 [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-12-13 11:48 ` [PATCH v12 1/7] Introduce yank feature Lukas Straub
2020-12-22  8:00   ` Marc-André Lureau
2020-12-28 13:56     ` Lukas Straub
2020-12-13 11:48 ` [PATCH v12 2/7] block/nbd.c: Add " Lukas Straub
2020-12-13 11:48 ` [PATCH v12 3/7] chardev/char-socket.c: " Lukas Straub
2020-12-13 11:48 ` [PATCH v12 4/7] migration: " Lukas Straub
2020-12-13 11:48 ` [PATCH v12 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
2020-12-13 11:48 ` [PATCH v12 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
2020-12-13 11:48 ` [PATCH v12 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
2020-12-15 13:05 ` [PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu Markus Armbruster
2020-12-19  8:09   ` Markus Armbruster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).