qemu-devel.nongnu.org archive mirror
* [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
@ 2020-08-04  8:11 Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks, which are
executed when the yank command is issued. For example, a callback can shutdown()
a socket. This is intended for the COLO use case, but it can be used for other
things too, of course.
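
As a rough illustration of the callback idea described above (this is not code
from the series): the sketch below assumes a POSIX system and uses hypothetical
names (yank_cb, shutdown_socket_cb, yank_demo). It shows that a read on a
socket returns end-of-file right away once a callback has called shutdown() on
it, which is how a yank is meant to get a stuck reader moving again.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type, modelled on the YankFn typedef in patch 1/8. */
typedef void (yank_cb)(void *opaque);

/*
 * Illustrative callback: shut down both directions of the socket whose
 * file descriptor is stored behind @opaque.
 */
static void shutdown_socket_cb(void *opaque)
{
    int fd = *(int *)opaque;

    shutdown(fd, SHUT_RDWR);
}

/*
 * Returns 0 if a read on the yanked socket reports end-of-file instead of
 * blocking, -1 otherwise.
 */
static int yank_demo(void)
{
    int fds[2];
    char buf[1];
    ssize_t n;
    yank_cb *cb = shutdown_socket_cb;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }

    cb(&fds[0]);                         /* the "yank" */
    n = read(fds[0], buf, sizeof(buf));  /* no data was sent; EOF expected */

    close(fds[0]);
    close(fds[1]);
    return n == 0 ? 0 : -1;
}
```

In the series itself the equivalent call is qio_channel_shutdown() on a
QIOChannel, issued from the out-of-band command handler.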

Regards,
Lukas Straub

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document thread-safety of qio_channel_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 ++
 block/nbd.c                   | 129 +++++++++++++++---------
 chardev/char-socket.c         |  31 ++++++
 include/io/channel.h          |   2 +
 include/qemu/yank.h           |  80 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  12 +++
 migration/migration.c         |  25 ++++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   6 ++
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 tests/test-char.c             |   1 +
 util/Makefile.objs            |   1 +
 util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
 16 files changed, 493 insertions(+), 53 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1


* [PATCH v7 1/8] Introduce yank feature
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-27 10:31   ` Daniel P. Berrangé
  2020-08-27 12:37   ` Markus Armbruster
  2020-08-04  8:11 ` [PATCH v7 2/8] block/nbd.c: Add " Lukas Straub
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

The yank feature allows recovering from a hanging qemu by "yanking"
at various parts. Other qemu subsystems can register themselves as
instances, each with multiple yank functions. All yank functions for
the selected instances can then be called with the 'yank' out-of-band
qmp command. Available instances can be queried with the 'query-yank'
oob command.
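
The instance/function relationship described above can be sketched as a toy
model. This is a deliberately simplified, single-threaded illustration with
hypothetical names (ToyInstance, toy_register_function, toy_yank); it uses
fixed-size tables and no locking, whereas the code in this patch uses QLISTs
protected by a mutex.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_FNS 4

typedef void (ToyYankFn)(void *opaque);

/* One named instance with a small table of (function, opaque) pairs. */
struct ToyInstance {
    const char *name;
    ToyYankFn *fns[TOY_MAX_FNS];
    void *opaques[TOY_MAX_FNS];
    int nfns;
};

/* Attach one yank function to an instance; -1 if the table is full. */
static int toy_register_function(struct ToyInstance *inst,
                                 ToyYankFn *fn, void *opaque)
{
    if (inst->nfns == TOY_MAX_FNS) {
        return -1;
    }
    inst->fns[inst->nfns] = fn;
    inst->opaques[inst->nfns] = opaque;
    inst->nfns++;
    return 0;
}

/* Call every function of the instance called @name; -1 if not found. */
static int toy_yank(struct ToyInstance *insts, int ninsts, const char *name)
{
    int i, j;

    for (i = 0; i < ninsts; i++) {
        if (strcmp(insts[i].name, name) == 0) {
            for (j = 0; j < insts[i].nfns; j++) {
                insts[i].fns[j](insts[i].opaques[j]);
            }
            return 0;
        }
    }
    return -1;
}

/* Test callback: count how often it was invoked. */
static void toy_count_cb(void *opaque)
{
    ++*(int *)opaque;
}

/* Returns the number of callbacks run by yanking "blockdev:nbd0". */
static int toy_demo(void)
{
    int hits = 0;
    struct ToyInstance insts[2] = {
        { .name = "blockdev:nbd0" },
        { .name = "migration" },
    };

    toy_register_function(&insts[0], toy_count_cb, &hits);
    toy_register_function(&insts[0], toy_count_cb, &hits);
    toy_register_function(&insts[1], toy_count_cb, &hits);

    if (toy_yank(insts, 2, "blockdev:nbd0") != 0) {
        return -1;
    }
    if (toy_yank(insts, 2, "chardev:missing") != -1) {
        return -1;
    }
    return hits;
}
```

Only the two functions registered for "blockdev:nbd0" run; the "migration"
instance is left alone, and an unknown name is reported as an error.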

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/yank.h |  80 +++++++++++++++++++
 qapi/misc.json      |  45 +++++++++++
 util/Makefile.objs  |   1 +
 util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 310 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 0000000000..cd184fcd05
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,80 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+typedef void (YankFn) (void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The globally unique name of the instance.
+ * @errp: ...
+ */
+void yank_register_instance(const char *instance_name, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister an instance.
+ *
+ * This unregisters an instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance.
+ */
+void yank_unregister_instance(const char *instance_name);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: The yank function
+ * @opaque: Will be passed to the yank function
+ */
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: func that was passed to yank_register_function
+ * @opaque: opaque that was passed to yank_register_function
+ */
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 9d32820dc1..0d6a8f20b7 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1615,3 +1615,48 @@
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# Yank instances are named after the following schema:
+# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
+#
+# Since: 5.1
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.1
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": { "instances": ["blockdev:nbd0"] } }
+#
+# Since: 5.1
+##
+{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
diff --git a/util/Makefile.objs b/util/Makefile.objs
index cc5e37177a..13faa98425 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -45,6 +45,7 @@ util-obj-$(CONFIG_GIO) += dbus.o
 dbus.o-cflags = $(GIO_CFLAGS)
 dbus.o-libs = $(GIO_LIBS)
 util-obj-$(CONFIG_USER_ONLY) += selfmap.o
+util-obj-y += yank.o

 #######################################################################
 # code used by both qemu system emulation and qemu-img
diff --git a/util/yank.c b/util/yank.c
new file mode 100644
index 0000000000..b0cd27728b
--- /dev/null
+++ b/util/yank.c
@@ -0,0 +1,184 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "io/channel.h"
+#include "qemu/yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstance {
+    char *name;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstance) next;
+};
+
+static QemuMutex lock;
+static QLIST_HEAD(yankinst_list, YankInstance) head
+    = QLIST_HEAD_INITIALIZER(head);
+
+static struct YankInstance *yank_find_instance(const char *name)
+{
+    struct YankInstance *tmp, *instance;
+    instance = NULL;
+    QLIST_FOREACH(tmp, &head, next) {
+        if (!strcmp(tmp->name, name)) {
+            instance = tmp;
+        }
+    }
+    return instance;
+}
+
+void yank_register_instance(const char *instance_name, Error **errp)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+
+    if (yank_find_instance(instance_name)) {
+        error_setg(errp, "duplicate yank instance name: '%s'", instance_name);
+        qemu_mutex_unlock(&lock);
+        return;
+    }
+
+    instance = g_slice_new(struct YankInstance);
+    instance->name = g_strdup(instance_name);
+    QLIST_INIT(&instance->yankfns);
+    QLIST_INSERT_HEAD(&head, instance, next);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_instance(const char *instance_name)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    assert(QLIST_EMPTY(&instance->yankfns));
+    QLIST_REMOVE(instance, next);
+    g_free(instance->name);
+    g_slice_free(struct YankInstance, instance);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    entry = g_slice_new(struct YankFuncAndParam);
+    entry->func = func;
+    entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    QLIST_FOREACH(entry, &instance->yankfns, next) {
+        if (entry->func == func && entry->opaque == opaque) {
+            QLIST_REMOVE(entry, next);
+            g_slice_free(struct YankFuncAndParam, entry);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(strList *instances,
+              Error **errp)
+{
+    strList *tmp;
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        if (!instance) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                      "Instance '%s' not found", tmp->value);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        assert(instance);
+        QLIST_FOREACH(entry, &instance->yankfns, next) {
+            entry->func(entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&lock);
+}
+
+YankInstances *qmp_query_yank(Error **errp)
+{
+    struct YankInstance *instance;
+    YankInstances *ret;
+
+    ret = g_new0(YankInstances, 1);
+    ret->instances = NULL;
+
+    qemu_mutex_lock(&lock);
+    QLIST_FOREACH(instance, &head, next) {
+        strList *entry;
+        entry = g_new0(strList, 1);
+        entry->value = g_strdup(instance->name);
+        entry->next = ret->instances;
+        ret->instances = entry;
+    }
+    qemu_mutex_unlock(&lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&lock);
+}
--
2.20.1



* [PATCH v7 2/8] block/nbd.c: Add yank feature
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-27 10:31   ` Daniel P. Berrangé
  2020-08-04  8:11 ` [PATCH v7 3/8] chardev/char-socket.c: " Lukas Straub
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occurred.
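
The state handling described above can be sketched in isolation. The sketch is
a hypothetical stand-in that uses C11 <stdatomic.h> rather than QEMU's
atomic_store_release/atomic_load_acquire helpers, and a plain flag in place of
the real qio_channel_shutdown() call; the names toy_client, toy_nbd_yank and
toy_is_connected are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

enum toy_state { TOY_CONNECTED, TOY_QUIT };

/* Hypothetical stand-in for the relevant parts of BDRVNBDState. */
struct toy_client {
    _Atomic int state;
    int shutdown_called;    /* stands in for shutting down the socket */
};

/*
 * Yank function: publish the QUIT state with release semantics, then shut
 * the connection down. It may run in a different thread than the readers.
 */
static void toy_nbd_yank(void *opaque)
{
    struct toy_client *c = opaque;

    atomic_store_explicit(&c->state, TOY_QUIT, memory_order_release);
    c->shutdown_called = 1;
}

/* Reader side: every state check uses an acquire load. */
static int toy_is_connected(struct toy_client *c)
{
    return atomic_load_explicit(&c->state, memory_order_acquire)
           == TOY_CONNECTED;
}
```

Pairing the release store with acquire loads is what lets the coroutine side
observe the QUIT state set by the yank function without taking a lock.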

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nbd.c | 129 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 80 insertions(+), 49 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 7bb881fef4..8632cf5340 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -43,6 +44,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -84,6 +87,8 @@ typedef struct BDRVNBDState {
     NBDReply reply;
     BlockDriverState *bs;

+    char *yank_name;
+
     /* Connection parameters */
     uint32_t reconnect_delay;
     SocketAddress *saddr;
@@ -93,10 +98,10 @@ typedef struct BDRVNBDState {
     char *x_dirty_bitmap;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -109,17 +114,19 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
     s->tlscredsid = NULL;
     g_free(s->x_dirty_bitmap);
     s->x_dirty_bitmap = NULL;
+    g_free(s->yank_name);
+    s->yank_name = NULL;
 }

 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -170,7 +177,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -237,20 +244,20 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = atomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
     Error *local_err = NULL;
-    QIOChannelSocket *sioc;

     if (!nbd_client_connecting(s)) {
         return;
@@ -283,21 +290,21 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     /* Finalize previous connection if any */
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
         s->ioc = NULL;
     }

-    sioc = nbd_establish_connection(s->saddr, &local_err);
-    if (!sioc) {
+    if (nbd_establish_connection(s->bs, s->saddr, &local_err) < 0) {
         ret = -ECONNREFUSED;
         goto out;
     }

     bdrv_dec_in_flight(s->bs);

-    ret = nbd_client_handshake(s->bs, sioc, &local_err);
+    ret = nbd_client_handshake(s->bs, &local_err);

     if (s->drained) {
         s->wait_drained_end = true;
@@ -334,7 +341,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     nbd_reconnect_attempt(s);

     while (nbd_client_connecting(s)) {
-        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT &&
             qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
         {
             s->state = NBD_CLIENT_CONNECTING_NOWAIT;
@@ -371,7 +378,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;

-    while (s->state != NBD_CLIENT_QUIT) {
+    while (atomic_load_acquire(&s->state) != NBD_CLIENT_QUIT) {
         /*
          * The NBD client can only really be considered idle when it has
          * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -386,7 +393,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             nbd_co_reconnect_loop(s);
         }

-        if (s->state != NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
             continue;
         }

@@ -441,6 +448,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     s->connection_co = NULL;
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -465,7 +473,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }

-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         rc = -EIO;
         goto err;
     }
@@ -492,7 +500,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
+        if (rc >= 0 && atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -807,7 +815,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -966,7 +974,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -991,7 +999,8 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     }

     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
+    if (nbd_reply_is_simple(reply) ||
+        atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }

@@ -1423,6 +1432,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
     return 0;
 }

+static void nbd_yank(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    atomic_store_release(&s->state, NBD_CLIENT_QUIT);
+    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
 static void nbd_client_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -1435,52 +1453,53 @@ static void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
+static int nbd_establish_connection(BlockDriverState *bs,
+                                    SocketAddress *saddr,
+                                    Error **errp)
 {
     ERRP_GUARD();
-    QIOChannelSocket *sioc;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+    s->sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(s->sioc), "nbd-client");
+    yank_register_function(s->yank_name, nbd_yank, bs);

-    qio_channel_socket_connect_sync(sioc, saddr, errp);
+    qio_channel_socket_connect_sync(s->sioc, saddr, errp);
     if (*errp) {
-        object_unref(OBJECT(sioc));
-        return NULL;
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        return -1;
     }

-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+    qio_channel_set_delay(QIO_CHANNEL(s->sioc), false);

-    return sioc;
+    return 0;
 }

-/* nbd_client_handshake takes ownership on sioc. On failure it is unref'ed. */
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp)
+/* nbd_client_handshake takes ownership on s->sioc. On failure it's unref'ed. */
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     AioContext *aio_context = bdrv_get_aio_context(bs);
     int ret;

     trace_nbd_client_handshake(s->export);
-
-    s->sioc = sioc;
-
-    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
+    qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+    qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);

     s->info.request_sizes = true;
     s->info.structured_reply = true;
     s->info.base_allocation = true;
     s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
     s->info.name = g_strdup(s->export ?: "");
-    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
+    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(s->sioc), s->tlscreds,
                                 s->hostname, &s->ioc, &s->info, errp);
     g_free(s->info.x_dirty_bitmap);
     g_free(s->info.name);
     if (ret < 0) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         return ret;
     }
@@ -1508,7 +1527,7 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     }

     if (!s->ioc) {
-        s->ioc = QIO_CHANNEL(sioc);
+        s->ioc = QIO_CHANNEL(s->sioc);
         object_ref(OBJECT(s->ioc));
     }

@@ -1524,9 +1543,10 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     {
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
+        nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);

-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;

         return ret;
@@ -1918,7 +1938,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 {
     int ret;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-    QIOChannelSocket *sioc;

     ret = nbd_process_options(bs, options, errp);
     if (ret < 0) {
@@ -1929,17 +1948,28 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_queue_init(&s->free_sema);

+    s->yank_name = g_strconcat("blockdev:", bs->node_name, NULL);
+    yank_register_instance(s->yank_name, errp);
+    if (*errp) {
+        g_free(s->yank_name);
+        s->yank_name = NULL;
+        return -EEXIST;
+    }
+
     /*
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    sioc = nbd_establish_connection(s->saddr, errp);
-    if (!sioc) {
+    if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
+        yank_unregister_instance(s->yank_name);
+        g_free(s->yank_name);
+        s->yank_name = NULL;
         return -ECONNREFUSED;
     }

-    ret = nbd_client_handshake(bs, sioc, errp);
+    ret = nbd_client_handshake(bs, errp);
     if (ret < 0) {
+        yank_unregister_instance(s->yank_name);
         nbd_clear_bdrvstate(s);
         return ret;
     }
@@ -1997,6 +2027,7 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;

     nbd_client_close(bs);
+    yank_unregister_instance(s->yank_name);
     nbd_clear_bdrvstate(s);
 }

--
2.20.1



* [PATCH v7 3/8] chardev/char-socket.c: Add yank feature
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 2/8] block/nbd.c: Add " Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-27 10:32   ` Daniel P. Berrangé
  2020-08-04  8:11 ` [PATCH v7 4/8] migration: " Lukas Straub
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

Register a yank function to shut down the socket on yank.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index ef62dbf3d7..8e2865ca83 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"

@@ -69,6 +70,7 @@ typedef struct {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    char *yank_name;

     SocketAddress *addr;
     bool is_listen;
@@ -413,6 +415,11 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -916,6 +923,8 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -930,6 +939,8 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -945,6 +956,8 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -960,6 +973,8 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1070,6 +1085,10 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->yank_name) {
+        yank_unregister_instance(s->yank_name);
+        g_free(s->yank_name);
+    }

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1085,6 +1104,8 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1118,6 +1139,8 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1360,6 +1383,14 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    s->yank_name = g_strconcat("chardev:", chr->label, NULL);
+    yank_register_instance(s->yank_name, errp);
+    if (*errp) {
+        g_free(s->yank_name);
+        s->yank_name = NULL;
+        return;
+    }
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

--
2.20.1



* [PATCH v7 4/8] migration: Add yank feature
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (2 preceding siblings ...)
  2020-08-04  8:11 ` [PATCH v7 3/8] chardev/char-socket.c: " Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-27 10:39   ` Daniel P. Berrangé
  2020-08-04  8:11 ` [PATCH v7 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 8421 bytes --]

Register yank functions on sockets to shut them down.
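
As an illustration of the idea only (not the code in util/yank.c), the
sketch below keeps a small table of callbacks and runs them all on
"yank"; the callback shuts down a socket so that a blocked peer sees
end-of-file. Every demo_* name is hypothetical, and plain POSIX
shutdown() stands in for the QIOChannel-based yank_generic_iochannel.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical callback type; a stand-in for a yank function. */
typedef void (DemoYankFn)(void *opaque);

typedef struct DemoYankEntry {
    DemoYankFn *func;
    void *opaque;
} DemoYankEntry;

#define DEMO_MAX_FUNCS 4

static DemoYankEntry demo_funcs[DEMO_MAX_FUNCS];
static size_t demo_nfuncs;

/* Callback: shut down both directions of the socket behind opaque. */
static void demo_yank_socket(void *opaque)
{
    int *fd = opaque;

    shutdown(*fd, SHUT_RDWR);
}

/* Remember a callback so that a later yank can run it. */
static void demo_register_function(DemoYankFn *func, void *opaque)
{
    assert(demo_nfuncs < DEMO_MAX_FUNCS);
    demo_funcs[demo_nfuncs].func = func;
    demo_funcs[demo_nfuncs].opaque = opaque;
    demo_nfuncs++;
}

/* Run every registered callback, in registration order. */
static void demo_yank(void)
{
    for (size_t i = 0; i < demo_nfuncs; i++) {
        demo_funcs[i].func(demo_funcs[i].opaque);
    }
}
```

After demo_yank(), a recv() on the other end of the connection returns
0 instead of blocking, which is the effect the migration code relies on
to get out of a hang.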

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 migration/channel.c           | 12 ++++++++++++
 migration/migration.c         | 25 ++++++++++++++++++++++++-
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  6 ++++++
 migration/savevm.c            |  6 ++++++
 tests/Makefile.include        |  2 +-
 6 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 20e4c8e2dc..21fc8046b9 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function("migration", yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,11 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function("migration", yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 8fe36339db..e4818edb2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -55,6 +55,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */

@@ -243,6 +244,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance("migration");
 }

 static void migrate_generate_event(int new_state)
@@ -379,8 +382,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;

+    yank_register_instance("migration", errp);
+    if (*errp) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (!strcmp(uri, "defer")) {
+        yank_unregister_instance("migration");
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p)) {
         tcp_start_incoming_migration(p, errp);
@@ -395,6 +404,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance("migration");
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1662,6 +1672,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance("migration");
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -1935,6 +1946,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance("migration");
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2071,7 +2083,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         /* Error detected, put into errp */
         return;
     }
-
+    if (!(has_resume && resume)) {
+        yank_register_instance("migration", errp);
+        if (*errp) {
+            return;
+        }
+    }
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -2085,6 +2102,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance("migration");
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2094,6 +2114,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }

     if (local_err) {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance("migration");
+        }
         migrate_fd_error(s, local_err);
         error_propagate(errp, local_err);
         return;
diff --git a/migration/multifd.c b/migration/multifd.c
index d0441202aa..2c9863e770 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -24,6 +24,9 @@
 #include "trace.h"
 #include "multifd.h"

+#include "qemu/yank.h"
+#include "io/channel-socket.h"
+
 /* Multiple fd's */

 #define MULTIFD_MAGIC 0x11223344U
@@ -866,6 +869,13 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];

+        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
+            && OBJECT(p->c)->ref == 1) {
+            yank_unregister_function("migration",
+                                     yank_generic_iochannel,
+                                     QIO_CHANNEL(p->c));
+        }
+
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index d2ce32f4b9..d8f8384fea 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
+#include "qemu/yank.h"


 static ssize_t channel_writev_buffer(void *opaque,
@@ -104,6 +105,11 @@ static int channel_close(void *opaque, Error **errp)
     int ret;
     QIOChannel *ioc = QIO_CHANNEL(opaque);
     ret = qio_channel_close(ioc, errp);
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)
+        && OBJECT(ioc)->ref == 1) {
+        yank_unregister_function("migration", yank_generic_iochannel,
+                                 QIO_CHANNEL(ioc));
+    }
     object_unref(OBJECT(ioc));
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index a843d202b5..395b8fa704 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -63,6 +63,7 @@
 #include "migration/colo.h"
 #include "qemu/bitmap.h"
 #include "net/announce.h"
+#include "qemu/yank.h"

 const unsigned int postcopy_ram_discard_version = 0;

@@ -2935,6 +2936,11 @@ int load_snapshot(const char *name, Error **errp)
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
     mis->from_src_file = f;

+    yank_register_instance("migration", errp);
+    if (*errp) {
+        ret = -EINVAL;
+        goto err_drain;
+    }
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
     migration_incoming_state_destroy();
diff --git a/tests/Makefile.include b/tests/Makefile.include
index c7e4646ded..e733918269 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -452,7 +452,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	migration/vmstate.o migration/vmstate-types.o migration/qemu-file.o \
-        migration/qemu-file-channel.o migration/qjson.o \
+        migration/qemu-file-channel.o migration/qjson.o util/yank.o \
 	$(test-io-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y)
 tests/test-base64$(EXESUF): tests/test-base64.o $(test-util-obj-y)
--
2.20.1



* [PATCH v7 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (3 preceding siblings ...)
  2020-08-04  8:11 ` [PATCH v7 4/8] migration: " Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 6/8] io: Document thread-safety of qio_channel_shutdown Lukas Straub
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]

Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.
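
The pattern can be sketched outside of QEMU with C11 <stdatomic.h>.
This is an assumption-laden stand-in: QEMU uses its own atomic_or and
atomic_load_acquire macros from qemu/atomic.h, and the DemoChannel type
and DEMO_* flags below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QIO_CHANNEL_SHUTDOWN_* flags. */
enum {
    DEMO_SHUTDOWN_READ = 1,
    DEMO_SHUTDOWN_WRITE = 2,
};

/* Hypothetical stand-in for the channel's shutdown field. */
typedef struct DemoChannel {
    atomic_int shutdown;
} DemoChannel;

static void demo_channel_init(DemoChannel *c)
{
    atomic_init(&c->shutdown, 0);
}

/*
 * May be called from any thread. The OR is a single atomic
 * read-modify-write, so two concurrent callers cannot lose
 * each other's bits the way a plain "|=" could.
 */
static void demo_channel_shutdown(DemoChannel *c, int how)
{
    assert(how != 0);
    atomic_fetch_or(&c->shutdown, how);
}

/* Reader side: acquire load of the flags, then test the READ bit. */
static bool demo_channel_read_is_shut(DemoChannel *c)
{
    int flags = atomic_load_explicit(&c->shutdown, memory_order_acquire);

    return (flags & DEMO_SHUTDOWN_READ) != 0;
}
```

The reader only ever tests bits and the writer only ever sets them, so
no lock is needed around the field.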

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..b350c84640 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                     return QIO_CHANNEL_ERR_BLOCK;
                 }
             } else if (errno == ECONNABORTED &&
-                       (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                       (atomic_load_acquire(&tioc->shutdown) &
+                        QIO_CHANNEL_SHUTDOWN_READ)) {
                 return 0;
             }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    atomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1



* [PATCH v7 6/8] io: Document thread-safety of qio_channel_shutdown
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (4 preceding siblings ...)
  2020-08-04  8:11 ` [PATCH v7 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-04  8:11 ` [PATCH v7 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

Migration and yank code assume that qio_channel_shutdown is thread-safe.
Document this after checking the code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/io/channel.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index d4557f0930..6b8a2b87b8 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -516,6 +516,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1



* [PATCH v7 7/8] MAINTAINERS: Add myself as maintainer for yank feature
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (5 preceding siblings ...)
  2020-08-04  8:11 ` [PATCH v7 6/8] io: Document thread-safety of qio_channel_shutdown Lukas Straub
@ 2020-08-04  8:11 ` Lukas Straub
  2020-08-04  8:12 ` [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
  2020-08-18 12:26 ` [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  8 siblings, 0 replies; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

I'll maintain this for now as the COLO use case is the first user
of this functionality.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0886eb3d2b..bf5075a637 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2614,6 +2614,12 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub <lukasstraub2@web.de>
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+
 COLO Framework
 M: zhanghailiang <zhang.zhanghailiang@huawei.com>
 S: Maintained
--
2.20.1



* [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (6 preceding siblings ...)
  2020-08-04  8:11 ` [PATCH v7 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
@ 2020-08-04  8:12 ` Lukas Straub
  2020-08-27 10:30   ` Daniel P. Berrangé
  2020-08-18 12:26 ` [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  8 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-04  8:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.
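
The failure mode can be sketched with a toy registry keyed by instance
name. This is not the util/yank.c implementation; all demo_* names and
the fixed-size table are hypothetical. Whether the real function aborts
or reports an error, the point is the same: a name that was never
unregistered blocks the next registration of that name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_INSTANCES 8
#define DEMO_NAME_LEN 64

/* An empty string marks a free slot. */
static char demo_instances[DEMO_MAX_INSTANCES][DEMO_NAME_LEN];

/* Return the slot index holding name, or -1 if it is not registered. */
static int demo_find(const char *name)
{
    for (int i = 0; i < DEMO_MAX_INSTANCES; i++) {
        if (demo_instances[i][0] != '\0' &&
            strcmp(demo_instances[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Register name; fails if it is too long, already present or no room. */
static bool demo_register_instance(const char *name)
{
    assert(name != NULL && name[0] != '\0');
    if (strlen(name) >= DEMO_NAME_LEN || demo_find(name) >= 0) {
        return false;
    }
    for (int i = 0; i < DEMO_MAX_INSTANCES; i++) {
        if (demo_instances[i][0] == '\0') {
            strcpy(demo_instances[i], name);
            return true;
        }
    }
    return false;
}

/* Free the slot held by name; fails if name was never registered. */
static bool demo_unregister_instance(const char *name)
{
    int i = demo_find(name);

    if (i < 0) {
        return false;
    }
    demo_instances[i][0] = '\0';
    return true;
}
```

In the test, waiting for the first chardev to connect lets it be freed
(and its instance unregistered) before the next test case registers the
same name again.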

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index d35cc839bc..2ced07de69 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1


* Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (7 preceding siblings ...)
  2020-08-04  8:12 ` [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
@ 2020-08-18 12:26 ` Lukas Straub
  2020-08-27  8:42   ` Lukas Straub
  8 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-18 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]

On Tue, 4 Aug 2020 10:11:22 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.
> 
> Regards,
> Lukas Straub
> 
> v7:
>  -yank_register_instance now returns error via Error **errp instead of aborting
>  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> 
> v6:
>  -add Reviewed-by and Acked-by tags
>  -rebase on master
>  -lots of changes in nbd due to rebase
>  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
>  -fix a crash discovered by the newly added chardev test
>  -fix the test itself
> 
> v5:
>  -move yank.c to util/
>  -move yank.h to include/qemu/
>  -add license to yank.h
>  -use const char*
>  -nbd: use atomic_store_release and atomic_load_acquire
>  -io-channel: ensure thread-safety and document it
>  -add myself as maintainer for yank
> 
> v4:
>  -fix build errors...
> 
> v3:
>  -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
>  -fix build errors
>  -rewrite migration patch so it actually passes all tests
> 
> v2:
>  -don't touch io/ code anymore
>  -always register yank functions
>  -'yank' now takes a list of instances to yank
>  -'query-yank' returns a list of yankable instances
> 
> Lukas Straub (8):
>   Introduce yank feature
>   block/nbd.c: Add yank feature
>   chardev/char-socket.c: Add yank feature
>   migration: Add yank feature
>   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
>   io: Document thread-safety of qio_channel_shutdown
>   MAINTAINERS: Add myself as maintainer for yank feature
>   tests/test-char.c: Wait for the chardev to connect in
>     char_socket_client_dupid_test
> 
>  MAINTAINERS                   |   6 ++
>  block/nbd.c                   | 129 +++++++++++++++---------
>  chardev/char-socket.c         |  31 ++++++
>  include/io/channel.h          |   2 +
>  include/qemu/yank.h           |  80 +++++++++++++++
>  io/channel-tls.c              |   6 +-
>  migration/channel.c           |  12 +++
>  migration/migration.c         |  25 ++++-
>  migration/multifd.c           |  10 ++
>  migration/qemu-file-channel.c |   6 ++
>  migration/savevm.c            |   6 ++
>  qapi/misc.json                |  45 +++++++++
>  tests/Makefile.include        |   2 +-
>  tests/test-char.c             |   1 +
>  util/Makefile.objs            |   1 +
>  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
>  16 files changed, 493 insertions(+), 53 deletions(-)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
> 
> --
> 2.20.1

Ping...


* Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-08-18 12:26 ` [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-08-27  8:42   ` Lukas Straub
  2020-08-27 10:41     ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-27  8:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 3511 bytes --]

On Tue, 18 Aug 2020 14:26:31 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> On Tue, 4 Aug 2020 10:11:22 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.
> > 
> > Regards,
> > Lukas Straub
> > 
> > v7:
> >  -yank_register_instance now returns error via Error **errp instead of aborting
> >  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> > 
> > v6:
> >  -add Reviewed-by and Acked-by tags
> >  -rebase on master
> >  -lots of changes in nbd due to rebase
> >  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
> >  -fix a crash discovered by the newly added chardev test
> >  -fix the test itself
> > 
> > v5:
> >  -move yank.c to util/
> >  -move yank.h to include/qemu/
> >  -add license to yank.h
> >  -use const char*
> >  -nbd: use atomic_store_release and atomic_load_acquire
> >  -io-channel: ensure thread-safety and document it
> >  -add myself as maintainer for yank
> > 
> > v4:
> >  -fix build errors...
> > 
> > v3:
> >  -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
> >  -fix build errors
> >  -rewrite migration patch so it actually passes all tests
> > 
> > v2:
> >  -don't touch io/ code anymore
> >  -always register yank functions
> >  -'yank' now takes a list of instances to yank
> >  -'query-yank' returns a list of yankable instances
> > 
> > Lukas Straub (8):
> >   Introduce yank feature
> >   block/nbd.c: Add yank feature
> >   chardev/char-socket.c: Add yank feature
> >   migration: Add yank feature
> >   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
> >   io: Document thread-safety of qio_channel_shutdown
> >   MAINTAINERS: Add myself as maintainer for yank feature
> >   tests/test-char.c: Wait for the chardev to connect in
> >     char_socket_client_dupid_test
> > 
> >  MAINTAINERS                   |   6 ++
> >  block/nbd.c                   | 129 +++++++++++++++---------
> >  chardev/char-socket.c         |  31 ++++++
> >  include/io/channel.h          |   2 +
> >  include/qemu/yank.h           |  80 +++++++++++++++
> >  io/channel-tls.c              |   6 +-
> >  migration/channel.c           |  12 +++
> >  migration/migration.c         |  25 ++++-
> >  migration/multifd.c           |  10 ++
> >  migration/qemu-file-channel.c |   6 ++
> >  migration/savevm.c            |   6 ++
> >  qapi/misc.json                |  45 +++++++++
> >  tests/Makefile.include        |   2 +-
> >  tests/test-char.c             |   1 +
> >  util/Makefile.objs            |   1 +
> >  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
> >  16 files changed, 493 insertions(+), 53 deletions(-)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c
> > 
> > --
> > 2.20.1  
> 
> Ping...

Ping 2...

Also, can the different subsystems have a look at this and give their ok?

Regards,
Lukas Straub


* Re: [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  2020-08-04  8:12 ` [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
@ 2020-08-27 10:30   ` Daniel P. Berrangé
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:30 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Tue, Aug 04, 2020 at 10:12:01AM +0200, Lukas Straub wrote:
> A connecting chardev object has an additional reference by the connecting
> thread, so if the chardev is still connecting by the end of the test,
> then the chardev object won't be freed. This in turn means that the yank
> instance won't be unregistered and when running the next test-case
> yank_register_instance will abort, because the yank instance is
> already/still registered.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  tests/test-char.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
@ 2020-08-27 10:31   ` Daniel P. Berrangé
  2020-08-27 12:37   ` Markus Armbruster
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:31 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Tue, Aug 04, 2020 at 10:11:34AM +0200, Lukas Straub wrote:
> The yank feature allows recovering from a hanging qemu by "yanking"
> at various parts. Other qemu subsystems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/yank.h |  80 +++++++++++++++++++
>  qapi/misc.json      |  45 +++++++++++
>  util/Makefile.objs  |   1 +
>  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 310 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 2/8] block/nbd.c: Add yank feature
  2020-08-04  8:11 ` [PATCH v7 2/8] block/nbd.c: Add " Lukas Straub
@ 2020-08-27 10:31   ` Daniel P. Berrangé
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:31 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Tue, Aug 04, 2020 at 10:11:37AM +0200, Lukas Straub wrote:
> Register a yank function which shuts down the socket and sets
> s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> error occurred.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nbd.c | 129 ++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 80 insertions(+), 49 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 3/8] chardev/char-socket.c: Add yank feature
  2020-08-04  8:11 ` [PATCH v7 3/8] chardev/char-socket.c: " Lukas Straub
@ 2020-08-27 10:32   ` Daniel P. Berrangé
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:32 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Tue, Aug 04, 2020 at 10:11:41AM +0200, Lukas Straub wrote:
> Register a yank function to shut down the socket on yank.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 4/8] migration: Add yank feature
  2020-08-04  8:11 ` [PATCH v7 4/8] migration: " Lukas Straub
@ 2020-08-27 10:39   ` Daniel P. Berrangé
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:39 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Tue, Aug 04, 2020 at 10:11:45AM +0200, Lukas Straub wrote:
> Register yank functions on sockets to shut them down.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  migration/channel.c           | 12 ++++++++++++
>  migration/migration.c         | 25 ++++++++++++++++++++++++-
>  migration/multifd.c           | 10 ++++++++++
>  migration/qemu-file-channel.c |  6 ++++++
>  migration/savevm.c            |  6 ++++++
>  tests/Makefile.include        |  2 +-
>  6 files changed, 59 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-08-27  8:42   ` Lukas Straub
@ 2020-08-27 10:41     ` Daniel P. Berrangé
  2020-08-27 14:18       ` Markus Armbruster
  2020-08-27 17:58       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 10:41 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> On Tue, 18 Aug 2020 14:26:31 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > On Tue, 4 Aug 2020 10:11:22 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> > 
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.
> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.
> > > 
> > > Regards,
> > > Lukas Straub
> > > 
> > > v7:
> > >  -yank_register_instance now returns error via Error **errp instead of aborting
> > >  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> > > 
> > > v6:
> > >  -add Reviewed-by and Acked-by tags
> > >  -rebase on master
> > >  -lots of changes in nbd due to rebase
> > >  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
> > >  -fix a crash discovered by the newly added chardev test
> > >  -fix the test itself
> > > 
> > > v5:
> > >  -move yank.c to util/
> > >  -move yank.h to include/qemu/
> > >  -add license to yank.h
> > >  -use const char*
> > >  -nbd: use atomic_store_release and atomic_load_aqcuire
> > >  -io-channel: ensure thread-safety and document it
> > >  -add myself as maintainer for yank
> > > 
> > > v4:
> > >  -fix build errors...
> > > 
> > > v3:
> > >  -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
> > >  -fix build errors
> > >  -rewrite migration patch so it actually passes all tests
> > > 
> > > v2:
> > >  -don't touch io/ code anymore
> > >  -always register yank functions
> > >  -'yank' now takes a list of instances to yank
> > >  -'query-yank' returns a list of yankable instances
> > > 
> > > Lukas Straub (8):
> > >   Introduce yank feature
> > >   block/nbd.c: Add yank feature
> > >   chardev/char-socket.c: Add yank feature
> > >   migration: Add yank feature
> > >   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
> > >   io: Document thread-safety of qio_channel_shutdown
> > >   MAINTAINERS: Add myself as maintainer for yank feature
> > >   tests/test-char.c: Wait for the chardev to connect in
> > >     char_socket_client_dupid_test
> > > 
> > >  MAINTAINERS                   |   6 ++
> > >  block/nbd.c                   | 129 +++++++++++++++---------
> > >  chardev/char-socket.c         |  31 ++++++
> > >  include/io/channel.h          |   2 +
> > >  include/qemu/yank.h           |  80 +++++++++++++++
> > >  io/channel-tls.c              |   6 +-
> > >  migration/channel.c           |  12 +++
> > >  migration/migration.c         |  25 ++++-
> > >  migration/multifd.c           |  10 ++
> > >  migration/qemu-file-channel.c |   6 ++
> > >  migration/savevm.c            |   6 ++
> > >  qapi/misc.json                |  45 +++++++++
> > >  tests/Makefile.include        |   2 +-
> > >  tests/test-char.c             |   1 +
> > >  util/Makefile.objs            |   1 +
> > >  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
> > >  16 files changed, 493 insertions(+), 53 deletions(-)
> > >  create mode 100644 include/qemu/yank.h
> > >  create mode 100644 util/yank.c
> > > 
> > > --
> > > 2.20.1  
> > 
> > Ping...
> 
> Ping 2...
> 
> Also, can the different subsystems have a look at this and give their ok?

We need ACKs from the NBD, migration and chardev maintainers, for the
respective patches, then I think this series is ready for a pull request.

Once acks arrive, I'm happy to send a PULL unless someone else has a
desire to do it.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
  2020-08-27 10:31   ` Daniel P. Berrangé
@ 2020-08-27 12:37   ` Markus Armbruster
  2020-08-27 14:28     ` Daniel P. Berrangé
                       ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Markus Armbruster @ 2020-08-27 12:37 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

I apologize for not reviewing this much earlier.

Lukas Straub <lukasstraub2@web.de> writes:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/yank.h |  80 +++++++++++++++++++
>  qapi/misc.json      |  45 +++++++++++
>  util/Makefile.objs  |   1 +
>  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 310 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
>
> diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> new file mode 100644
> index 0000000000..cd184fcd05
> --- /dev/null
> +++ b/include/qemu/yank.h
> @@ -0,0 +1,80 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef YANK_H
> +#define YANK_H
> +
> +typedef void (YankFn) (void *opaque);

No space before parameter lists, please.

> +
> +/**
> + * yank_register_instance: Register a new instance.
> + *
> + * This registers a new instance for yanking. Must be called before any yank
> + * function is registered for this instance.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The globally unique name of the instance.
> + * @errp: ...
> + */
> +void yank_register_instance(const char *instance_name, Error **errp);
> +
> +/**
> + * yank_unregister_instance: Unregister a instance.
> + *
> + * This unregisters a instance. Must be called only after every yank function
> + * of the instance has been unregistered.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance.
> + */
> +void yank_unregister_instance(const char *instance_name);
> +
> +/**
> + * yank_register_function: Register a yank function
> + *
> + * This registers a yank function. All limitations of qmp oob commands apply
> + * to the yank function as well.

The restrictions are documented in docs/devel/qapi-code-gen.txt under
"An OOB-capable command handler must satisfy the following conditions".
Let's point there.

> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: The yank function
> + * @opaque: Will be passed to the yank function
> + */
> +void yank_register_function(const char *instance_name,
> +                            YankFn *func,
> +                            void *opaque);

Pardon my ignorance... can you explain to me why a yank instance may
have multiple functions?

> +
> +/**
> + * yank_unregister_function: Unregister a yank function
> + *
> + * This unregisters a yank function.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: func that was passed to yank_register_function
> + * @opaque: opaque that was passed to yank_register_function
> + */
> +void yank_unregister_function(const char *instance_name,
> +                              YankFn *func,
> +                              void *opaque);
> +
> +/**
> + * yank_unregister_function: Generic yank function for iochannel

Pasto, should be

    * yank_generic_iochannel: ...

> + *
> + * This is a generic yank function which will call qio_channel_shutdown on the
> + * provided QIOChannel.
> + *
> + * @opaque: QIOChannel to shutdown
> + */
> +void yank_generic_iochannel(void *opaque);
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 9d32820dc1..0d6a8f20b7 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -1615,3 +1615,48 @@
>  ##
>  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
>
> +##
> +# @YankInstances:
> +#
> +# @instances: List of yank instances.
> +#
> +# Yank instances are named after the following schema:
> +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> +#
> +# Since: 5.1
> +##
> +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }

I'm afraid this is a problematic QMP interface.

By making YankInstances a struct, you keep the door open to adding more
members, which is good.

But by making its 'instances' member a ['str'], you close the door to
using anything but a single string for the individual instances.  Not so
good.

The single string encodes information which a QMP client will need to
parse from the string.  We frown on that in QMP.  Use QAPI's complex
type capabilities for structured data.

Could you use something like this instead?

{ 'enum': 'YankInstanceType',
  'data': [ 'block-node', 'chardev', 'migration' ] }

{ 'struct': 'YankInstanceBlockNode',
  'data': { 'node-name': 'str' } }

{ 'struct': 'YankInstanceChardev',
  'data': { 'label': 'str' } }

{ 'union': 'YankInstance',
  'base': { 'type': 'YankInstanceType' },
  'discriminator': 'type',
  'data': {
      'block-node': 'YankInstanceBlockNode',
      'chardev': 'YankInstanceChardev' } }

{ 'command': 'yank',
  'data': { 'instances': ['YankInstance'] },
  'allow-oob': true }

If you're confident nothing will ever be added to YankInstanceBlockNode
and YankInstanceChardev, you could use str instead.
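To illustrate the concern about string parsing, here is a minimal sketch of what a consumer of the string form has to do before it can tell instance kinds apart. It is not part of the patch and not QAPI-generated code; DemoYankType, DemoYankInstance, demo_has_prefix() and demo_parse_name() are hypothetical names, and the prefixes are taken from the naming schema documented in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C mirror of a structured instance; not generated by QAPI. */
typedef enum {
    DEMO_YANK_BLOCK_NODE,
    DEMO_YANK_CHARDEV,
    DEMO_YANK_MIGRATION,
} DemoYankType;

typedef struct {
    DemoYankType type;
    /* node-name or chardev name; NULL for migration */
    const char *id;
} DemoYankInstance;

/* If @s starts with @prefix, store the remainder in @rest. */
static bool demo_has_prefix(const char *s, const char *prefix,
                            const char **rest)
{
    size_t len = strlen(prefix);

    if (strncmp(s, prefix, len) != 0) {
        return false;
    }
    *rest = s + len;
    return true;
}

/*
 * Recover the structure from the string form.  Every client of the
 * string interface needs some equivalent of this function.
 */
static bool demo_parse_name(const char *name, DemoYankInstance *out)
{
    out->id = NULL;
    if (strcmp(name, "migration") == 0) {
        out->type = DEMO_YANK_MIGRATION;
        return true;
    }
    if (demo_has_prefix(name, "blockdev:", &out->id)) {
        out->type = DEMO_YANK_BLOCK_NODE;
        return true;
    }
    if (demo_has_prefix(name, "chardev:", &out->id)) {
        out->type = DEMO_YANK_CHARDEV;
        return true;
    }
    return false;
}
```

With a structured type, the type and id fields arrive already separated, so the parsing step disappears on the client side.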

> +
> +##
> +# @yank:
> +#
> +# Recover from hanging qemu by yanking the specified instances.

What's an "instance", and what does it mean to "yank" it?

The documentation of YankInstances above gives a clue on what an
"instance" is: presumably a block node, a character device or the
migration job.

I guess a YankInstance is whatever the code chooses to make one, and the
current code makes these three kinds.

Does it make every block node a YankInstance?  If not, which ones?

Does it make every character device a YankInstance?  If not, which ones?

Does it make migration always a YankInstance?  If not, when?

> +#
> +# Takes @YankInstances as argument.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> +# <- { "return": {} }
> +#
> +# Since: 5.1
> +##
> +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> +
> +##
> +# @query-yank:
> +#
> +# Query yank instances.
> +#
> +# Returns: @YankInstances
> +#
> +# Example:
> +#
> +# -> { "execute": "query-yank" }
> +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> +#
> +# Since: 5.1
> +##
> +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index cc5e37177a..13faa98425 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -45,6 +45,7 @@ util-obj-$(CONFIG_GIO) += dbus.o
>  dbus.o-cflags = $(GIO_CFLAGS)
>  dbus.o-libs = $(GIO_LIBS)
>  util-obj-$(CONFIG_USER_ONLY) += selfmap.o
> +util-obj-y += yank.o
>
>  #######################################################################
>  # code used by both qemu system emulation and qemu-img
> diff --git a/util/yank.c b/util/yank.c
> new file mode 100644
> index 0000000000..b0cd27728b
> --- /dev/null
> +++ b/util/yank.c
> @@ -0,0 +1,184 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/thread.h"
> +#include "qemu/queue.h"
> +#include "qapi/qapi-commands-misc.h"
> +#include "io/channel.h"
> +#include "qemu/yank.h"
> +
> +struct YankFuncAndParam {
> +    YankFn *func;
> +    void *opaque;
> +    QLIST_ENTRY(YankFuncAndParam) next;
> +};
> +
> +struct YankInstance {
> +    char *name;
> +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> +    QLIST_ENTRY(YankInstance) next;
> +};
> +
> +static QemuMutex lock;

Please give the variable a more telling name, such as @yank_lock, and
document what exactly the lock protects.  I can guess it's just the list
that immediately follows, but I prefer to be explicit when it comes to
locking.

> +static QLIST_HEAD(yankinst_list, YankInstance) head
> +    = QLIST_HEAD_INITIALIZER(head);

Please give the variable a more telling name, such as
@yank_instance_list.

I doubt there is a need to name the struct.  This should do:

   static QLIST_HEAD(, YankInstance) yank_instance_list

> +
> +static struct YankInstance *yank_find_instance(const char *name)
> +{
> +    struct YankInstance *tmp, *instance;
> +    instance = NULL;
> +    QLIST_FOREACH(tmp, &head, next) {
> +        if (!strcmp(tmp->name, name)) {
> +            instance = tmp;
> +        }
> +    }
> +    return instance;
> +}

Suggest

   static struct YankInstance *yank_find_instance(const char *name)
   {
       struct YankInstance *instance;

       QLIST_FOREACH(instance, &head, next) {
           if (!strcmp(instance->name, name)) {
               return instance;
           }
       }
       return NULL;
   }

> +
> +void yank_register_instance(const char *instance_name, Error **errp)
> +{
> +    struct YankInstance *instance;
> +
> +    qemu_mutex_lock(&lock);
> +
> +    if (yank_find_instance(instance_name)) {
> +        error_setg(errp, "duplicate yank instance name: '%s'", instance_name);

Rather long line, suggest to wrap before the last argument.

> +        qemu_mutex_unlock(&lock);
> +        return;
> +    }
> +
> +    instance = g_slice_new(struct YankInstance);
> +    instance->name = g_strdup(instance_name);
> +    QLIST_INIT(&instance->yankfns);
> +    QLIST_INSERT_HEAD(&head, instance, next);
> +
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_unregister_instance(const char *instance_name)
> +{
> +    struct YankInstance *instance;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    assert(QLIST_EMPTY(&instance->yankfns));
> +    QLIST_REMOVE(instance, next);
> +    g_free(instance->name);
> +    g_slice_free(struct YankInstance, instance);
> +
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_register_function(const char *instance_name,
> +                            YankFn *func,
> +                            void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    entry = g_slice_new(struct YankFuncAndParam);
> +    entry->func = func;
> +    entry->opaque = opaque;
> +
> +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_unregister_function(const char *instance_name,
> +                              YankFn *func,
> +                              void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> +        if (entry->func == func && entry->opaque == opaque) {
> +            QLIST_REMOVE(entry, next);
> +            g_slice_free(struct YankFuncAndParam, entry);
> +            qemu_mutex_unlock(&lock);
> +            return;
> +        }
> +    }
> +
> +    abort();
> +}
> +
> +void yank_generic_iochannel(void *opaque)
> +{
> +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> +
> +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +}
> +
> +void qmp_yank(strList *instances,
> +              Error **errp)
> +{
> +    strList *tmp;
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    tmp = instances;
> +    for (; tmp; tmp = tmp->next) {

Make that

       for (tail = instances; tail; tail = tail->next) {

> +        instance = yank_find_instance(tmp->value);
> +        if (!instance) {
> +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> +                      "Instance '%s' not found", tmp->value);
> +            qemu_mutex_unlock(&lock);
> +            return;
> +        }
> +    }
> +    tmp = instances;
> +    for (; tmp; tmp = tmp->next) {

Likewise.

> +        instance = yank_find_instance(tmp->value);
> +        assert(instance);
> +        QLIST_FOREACH(entry, &instance->yankfns, next) {
> +            entry->func(entry->opaque);
> +        }
> +    }
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +YankInstances *qmp_query_yank(Error **errp)
> +{
> +    struct YankInstance *instance;
> +    YankInstances *ret;
> +
> +    ret = g_new0(YankInstances, 1);
> +    ret->instances = NULL;
> +
> +    qemu_mutex_lock(&lock);
> +    QLIST_FOREACH(instance, &head, next) {
> +        strList *entry;
> +        entry = g_new0(strList, 1);
> +        entry->value = g_strdup(instance->name);
> +        entry->next = ret->instances;
> +        ret->instances = entry;
> +    }
> +    qemu_mutex_unlock(&lock);
> +
> +    return ret;
> +}
> +
> +static void __attribute__((__constructor__)) yank_init(void)
> +{
> +    qemu_mutex_init(&lock);
> +}
> --
> 2.20.1

The two QMP commands permit out-of-band execution ('allow-oob': true).
OOB is easy to get wrong, but I figure you have a legitimate use case.
Let's review the restrictions documented in
docs/devel/qapi-code-gen.txt:

    An OOB-capable command handler must satisfy the following conditions:

    - It terminates quickly.
    - It does not invoke system calls that may block.
    - It does not access guest RAM that may block when userfaultfd is
      enabled for postcopy live migration.
    - It takes only "fast" locks, i.e. all critical sections protected by
      any lock it takes also satisfy the conditions for OOB command
      handler code.

Since the command handlers take &lock, the restrictions apply to the
other critical sections protected by &lock as well.  I believe these are
all okay: they do nothing but allocate, initialize and free memory.

The restrictions also apply to the YankFn callbacks, but you documented
that.  Okay.

The one such callback included in this patch is
yank_generic_iochannel(), which is a thin wrapper around
qio_channel_shutdown(), which in turn runs the io_shutdown method.
Thus, the restrictions also apply to all the io_shutdown methods.
That's not documented.

Daniel, should it be documented?
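For readers less familiar with why shutdown() fits the OOB rules, here is a minimal standalone sketch. It uses a plain POSIX socket pair instead of a QIOChannel, and demo_yank_socket() and demo_read_after_yank() are hypothetical names, not functions from the series. The point is that shutdown() returns without waiting for the peer, and a read() on the yanked socket then returns immediately instead of blocking.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical yank callback: @opaque points at a socket descriptor. */
static void demo_yank_socket(void *opaque)
{
    int fd = *(int *)opaque;

    /*
     * shutdown() does not wait for the peer.  Errors are ignored in
     * this sketch; a real callback may want to report them somewhere.
     */
    (void)shutdown(fd, SHUT_RDWR);
}

/* Yank one end of a socket pair, then read from it. */
static ssize_t demo_read_after_yank(void)
{
    int fds[2];
    char buf;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }
    demo_yank_socket(&fds[0]);
    /* No data was ever sent, yet this read does not block. */
    n = read(fds[0], &buf, 1);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On Linux the read reports end-of-file (returns 0) after the shutdown, which is what lets a thread stuck on the socket make progress again.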



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-08-27 10:41     ` Daniel P. Berrangé
@ 2020-08-27 14:18       ` Markus Armbruster
  2020-08-27 17:58       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 25+ messages in thread
From: Markus Armbruster @ 2020-08-27 14:18 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Lukas Straub, qemu-block, Juan Quintela, qemu-devel,
	Max Reitz, Marc-André Lureau, Paolo Bonzini,
	Dr. David Alan Gilbert

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
[...]
>> Also, can the different subsystems have a look at this and give their ok?
>
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

The QMP interface and its documentation need a bit of work, see my
review of PATCH 1.  I'm hopeful v8 will nail it.

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Not yet, please.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-27 12:37   ` Markus Armbruster
@ 2020-08-27 14:28     ` Daniel P. Berrangé
  2020-08-28 14:21     ` Lukas Straub
  2020-09-04 12:33     ` Lukas Straub
  2 siblings, 0 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 14:28 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Lukas Straub, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, qemu-devel, Marc-André Lureau,
	Paolo Bonzini, Max Reitz

On Thu, Aug 27, 2020 at 02:37:00PM +0200, Markus Armbruster wrote:
> I apologize for not reviewing this much earlier.
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/qemu/yank.h |  80 +++++++++++++++++++
> >  qapi/misc.json      |  45 +++++++++++
> >  util/Makefile.objs  |   1 +
> >  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 310 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c


> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: The yank function
> > + * @opaque: Will be passed to the yank function
> > + */
> > +void yank_register_function(const char *instance_name,
> > +                            YankFn *func,
> > +                            void *opaque);
> 
> Pardon my ignorance... can you explain to me why a yank instance may
> have multiple functions?

multifd migration - there's a single migration "instance", which
has multiple FDs open, each of which has a func registered.
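A rough sketch of that shape, for illustration only: one instance owns several callbacks, and yanking the instance runs all of them. The names below (DemoInstance, demo_register_function(), demo_yank(), demo_close_channel()) are hypothetical, the fixed-size table stands in for the QLIST used in the patch, and there is no locking here.

```c
#include <assert.h>
#include <stddef.h>

/* Same callback shape as the patch's YankFn. */
typedef void (YankFn)(void *opaque);

#define DEMO_MAX_FNS 8

/* Hypothetical stand-in for one yank instance, e.g. "migration". */
struct DemoInstance {
    const char *name;
    YankFn *fns[DEMO_MAX_FNS];
    void *opaques[DEMO_MAX_FNS];
    size_t nfns;
};

/* Register one callback; returns 0 on success, -1 if the table is full. */
static int demo_register_function(struct DemoInstance *inst,
                                  YankFn *fn, void *opaque)
{
    if (inst->nfns == DEMO_MAX_FNS) {
        return -1;
    }
    inst->fns[inst->nfns] = fn;
    inst->opaques[inst->nfns] = opaque;
    inst->nfns++;
    return 0;
}

/* Yank the instance: call every registered callback once. */
static void demo_yank(struct DemoInstance *inst)
{
    for (size_t i = 0; i < inst->nfns; i++) {
        inst->fns[i](inst->opaques[i]);
    }
}

/* Stand-in for shutting down one channel: just mark it as closed. */
static void demo_close_channel(void *opaque)
{
    *(int *)opaque = 1;
}
```

With three channels registered under the one "migration" instance, a single yank marks all three as closed.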


> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
> 
> I'm afraid this is a problematic QMP interface.
> 
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
> 
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances.  Not so
> good.
> 
> The single string encodes information which QMP client will need to
> parse from the string.  We frown on that in QMP.  Use QAPI complex types
> capabilities for structured data.
> 
> Could you use something like this instead?
> 
> > { 'enum': 'YankInstanceType',
> >   'data': [ 'block-node', 'chardev', 'migration' ] }
> > 
> > { 'struct': 'YankInstanceBlockNode',
> >   'data': { 'node-name': 'str' } }
> > 
> > { 'struct': 'YankInstanceChardev',
> >   'data': { 'label': 'str' } }
> 
> { 'union': 'YankInstance',
>   'base': { 'type': 'YankInstanceType' },
>   'discriminator': 'type',
>   'data': {
>       'block-node': 'YankInstanceBlockNode',
>       'chardev': 'YankInstanceChardev' } }
> 
> { 'command': 'yank',
>   'data': { 'instances': ['YankInstance'] },
>   'allow-oob': true }
> 
> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.

I raised this idea back in the v1 posting. There's a thread starting
at this:

  https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02881.html

which eventually concluded that plain string is best.

I think that's right because the yank feature is providing generic
infrastructure with zero knowledge of backends that are using it.
The only interaction between the yank functionality and its callers
is via an opaque function callback. So there's no conceptual place
at which the yank infra would want to know about the backends nor
pass any backend specific config params to it.

> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.
> 
> What's an "instance", and what does it mean to "yank" it?
> 
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
> 
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
> 
> Does it make every block node a YankInstance?  If not, which ones?
> 
> Does it make every character device a YankInstance?  If not, which ones?
> 
> Does it make migration always a YankInstance?  If not, when?

From the POV of the yank code, the "instance" is intentionally opaque.
Whatever the instance wants to do with its callback is acceptable, as
long as it isn't violating the rules for the callback by doing stuff
which can block. In practice right now, an instance is anything which
has a network connection associated with it, but it doesn't have to be
restricted to just networking. Anything which is talking to a service
which can get "stuck" is in scope for supporting yanking.

E.g. I could imagine an instance having some external helper process, and
the yank callback would kill the process.
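A minimal sketch of such a callback, assuming the instance keeps the helper's pid around. demo_yank_helper() and demo_spawn_stuck_helper() are hypothetical names, not part of the series. The callback only sends the signal; reaping the child with waitpid() can block, so it would have to happen outside the yank path.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical yank callback: @opaque points at the helper's pid. */
static void demo_yank_helper(void *opaque)
{
    pid_t pid = *(pid_t *)opaque;

    /* kill() only queues the signal; it does not wait for the child. */
    (void)kill(pid, SIGKILL);
}

/* Spawn a helper that never exits on its own; returns its pid or -1. */
static pid_t demo_spawn_stuck_helper(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: simulate a stuck helper. */
        for (;;) {
            pause();
        }
    }
    return pid;
}
```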



> > +static void __attribute__((__constructor__)) yank_init(void)
> > +{
> > +    qemu_mutex_init(&lock);
> > +}
> > --
> > 2.20.1
> 
> The two QMP commands permit out-of-band execution ('allow-oob': true).
> OOB is easy to get wrong, but I figure you have a legitimate use case.
> Let's review the restrictions documented in
> docs/devel/qapi-code-gen.txt:
> 
>     An OOB-capable command handler must satisfy the following conditions:
> 
>     - It terminates quickly.
>     - It does not invoke system calls that may block.
>     - It does not access guest RAM that may block when userfaultfd is
>       enabled for postcopy live migration.
>     - It takes only "fast" locks, i.e. all critical sections protected by
>       any lock it takes also satisfy the conditions for OOB command
>       handler code.
> 
> Since the command handlers take &lock, the restrictions apply to the
> other critical sections protected by &lock as well.  I believe these are
> all okay: they do nothing but allocate, initialize and free memory.
> 
> The restrictions also apply to the YankFn callbacks, but you documented
> that.  Okay.
> 
> The one such callback included in this patch is
> yank_generic_iochannel(), which is a thin wrapper around
> qio_channel_shutdown(), which in turn runs the io_shutdown method.
> > Thus, the restrictions also apply to all the io_shutdown methods.
> That's not documented.
> 
> Daniel, should it be documented?

Patch 6 documents that the qio_channel_shutdown method must be
thread-safe, but perhaps the doc could be more explicit.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-08-27 10:41     ` Daniel P. Berrangé
  2020-08-27 14:18       ` Markus Armbruster
@ 2020-08-27 17:58       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-27 17:58 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Lukas Straub, qemu-block, Juan Quintela, qemu-devel,
	Max Reitz, Paolo Bonzini, Marc-André Lureau

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> > On Tue, 18 Aug 2020 14:26:31 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> > 
> > > On Tue, 4 Aug 2020 10:11:22 +0200
> > > Lukas Straub <lukasstraub2@web.de> wrote:
> > > 
> > > > Hello Everyone,
> > > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > > to some other server and that server dies or hangs, qemu hangs too.
> > > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > > these kinds of hangs. The different subsystems register callbacks which get
> > > > executed with the yank command. For example the callback can shutdown() a
> > > > socket. This is intended for the colo use-case, but it can be used for other
> > > > things too of course.
> > > > 
> > > > Regards,
> > > > Lukas Straub
> > > > 
> > > > v7:
> > > >  -yank_register_instance now returns error via Error **errp instead of aborting
> > > >  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> > > > 
> > > > v6:
> > > >  -add Reviewed-by and Acked-by tags
> > > >  -rebase on master
> > > >  -lots of changes in nbd due to rebase
> > > >  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
> > > >  -fix a crash discovered by the newly added chardev test
> > > >  -fix the test itself
> > > > 
> > > > v5:
> > > >  -move yank.c to util/
> > > >  -move yank.h to include/qemu/
> > > >  -add license to yank.h
> > > >  -use const char*
> > > >  -nbd: use atomic_store_release and atomic_load_acquire
> > > >  -io-channel: ensure thread-safety and document it
> > > >  -add myself as maintainer for yank
> > > > 
> > > > v4:
> > > >  -fix build errors...
> > > > 
> > > > v3:
> > > >  -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
> > > >  -fix build errors
> > > >  -rewrite migration patch so it actually passes all tests
> > > > 
> > > > v2:
> > > >  -don't touch io/ code anymore
> > > >  -always register yank functions
> > > >  -'yank' now takes a list of instances to yank
> > > >  -'query-yank' returns a list of yankable instances
> > > > 
> > > > Lukas Straub (8):
> > > >   Introduce yank feature
> > > >   block/nbd.c: Add yank feature
> > > >   chardev/char-socket.c: Add yank feature
> > > >   migration: Add yank feature
> > > >   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
> > > >   io: Document thread-safety of qio_channel_shutdown
> > > >   MAINTAINERS: Add myself as maintainer for yank feature
> > > >   tests/test-char.c: Wait for the chardev to connect in
> > > >     char_socket_client_dupid_test
> > > > 
> > > >  MAINTAINERS                   |   6 ++
> > > >  block/nbd.c                   | 129 +++++++++++++++---------
> > > >  chardev/char-socket.c         |  31 ++++++
> > > >  include/io/channel.h          |   2 +
> > > >  include/qemu/yank.h           |  80 +++++++++++++++
> > > >  io/channel-tls.c              |   6 +-
> > > >  migration/channel.c           |  12 +++
> > > >  migration/migration.c         |  25 ++++-
> > > >  migration/multifd.c           |  10 ++
> > > >  migration/qemu-file-channel.c |   6 ++
> > > >  migration/savevm.c            |   6 ++
> > > >  qapi/misc.json                |  45 +++++++++
> > > >  tests/Makefile.include        |   2 +-
> > > >  tests/test-char.c             |   1 +
> > > >  util/Makefile.objs            |   1 +
> > > >  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
> > > >  16 files changed, 493 insertions(+), 53 deletions(-)
> > > >  create mode 100644 include/qemu/yank.h
> > > >  create mode 100644 util/yank.c
> > > > 
> > > > --
> > > > 2.20.1  
> > > 
> > > Ping...
> > 
> > Ping 2...
> > 
> > Also, can the different subsystems have a look at this and give their ok?
> 
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

I'm happy from Migration:

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Looks like Markus would like a QMP tweak; but other than that I'd also
be happy to take it via migration; whichever is easiest.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-27 12:37   ` Markus Armbruster
  2020-08-27 14:28     ` Daniel P. Berrangé
@ 2020-08-28 14:21     ` Lukas Straub
  2020-08-31  7:47       ` Markus Armbruster
  2020-09-04 12:33     ` Lukas Straub
  2 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-08-28 14:21 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz


On Thu, 27 Aug 2020 14:37:00 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> I apologize for not reviewing this much earlier.
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/qemu/yank.h |  80 +++++++++++++++++++
> >  qapi/misc.json      |  45 +++++++++++
> >  util/Makefile.objs  |   1 +
> >  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 310 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c
> >
> > diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> > new file mode 100644
> > index 0000000000..cd184fcd05
> > --- /dev/null
> > +++ b/include/qemu/yank.h
> > @@ -0,0 +1,80 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef YANK_H
> > +#define YANK_H
> > +
> > +typedef void (YankFn) (void *opaque);  
> 
> No space before parameter lists, please.

Ok, I will fix this in the next version.

> > +
> > +/**
> > + * yank_register_instance: Register a new instance.
> > + *
> > + * This registers a new instance for yanking. Must be called before any yank
> > + * function is registered for this instance.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The globally unique name of the instance.
> > + * @errp: ...
> > + */
> > +void yank_register_instance(const char *instance_name, Error **errp);
> > +
> > +/**
> > + * yank_unregister_instance: Unregister an instance.
> > + *
> > + * This unregisters an instance. Must be called only after every yank function
> > + * of the instance has been unregistered.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance.
> > + */
> > +void yank_unregister_instance(const char *instance_name);
> > +
> > +/**
> > + * yank_register_function: Register a yank function
> > + *
> > + * This registers a yank function. All limitations of qmp oob commands apply
> > + * to the yank function as well.  
> 
> The restrictions are documented in docs/devel/qapi-code-gen.txt under
> "An OOB-capable command handler must satisfy the following conditions".
> Let's point there.

I will fix this in the next version.

> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: The yank function
> > + * @opaque: Will be passed to the yank function
> > + */
> > +void yank_register_function(const char *instance_name,
> > +                            YankFn *func,
> > +                            void *opaque);  
> 
> Pardon my ignorance... can you explain to me why a yank instance may
> have multiple functions?
>
> > +
> > +/**
> > + * yank_unregister_function: Unregister a yank function
> > + *
> > + * This unregisters a yank function.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: func that was passed to yank_register_function
> > + * @opaque: opaque that was passed to yank_register_function
> > + */
> > +void yank_unregister_function(const char *instance_name,
> > +                              YankFn *func,
> > +                              void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Generic yank function for iochannel  
> 
> Pasto, should be
> 
>     * yank_generic_iochannel: ...

I will fix this in the next version.

> > + *
> > + * This is a generic yank function which will call qio_channel_shutdown on the
> > + * provided QIOChannel.
> > + *
> > + * @opaque: QIOChannel to shutdown
> > + */
> > +void yank_generic_iochannel(void *opaque);
> > +#endif
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }  
> 
> I'm afraid this is a problematic QMP interface.
> 
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
> 
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances.  Not so
> good.
> 
> The single string encodes information which QMP client will need to
> parse from the string.  We frown on that in QMP.  Use QAPI complex types
> capabilities for structured data.
> 
> Could you use something like this instead?
> 
> { 'enum': 'YankInstanceType',
>   'data': [ 'block-node', 'chardev', 'migration' ] }
> 
> { 'struct': 'YankInstanceBlockNode',
>   'data': { 'node-name': 'str' } }
> 
> { 'struct': 'YankInstanceChardev',
>   'data': { 'label': 'str' } }
> 
> { 'union': 'YankInstance',
>   'base': { 'type': 'YankInstanceType' },
>   'discriminator': 'type',
>   'data': {
>       'block-node': 'YankInstanceBlockNode',
>       'chardev': 'YankInstanceChardev' } }
> 
> { 'command': 'yank',
>   'data': { 'instances': ['YankInstance'] },
>   'allow-oob': true }
> 
> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.

As Daniel said, this has already been discussed.

> > +
> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.  
> 
> What's an "instance", and what does it mean to "yank" it?
> 
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
> 
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
> 
> Does it make every block node a YankInstance?  If not, which ones?
> 
> Does it make every character device a YankInstance?  If not, which ones?
> 
> Does it make migration always a YankInstance?  If not, when?

The yank function is generic so anything more specific than "instance"
doesn't fit. But you are right, I'll improve the documentation here and
list what is currently implemented for yanking and what yanking does in
each case.

> > +#
> > +# Takes @YankInstances as argument.
> > +#
> > +# Returns: nothing.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances.
> > +#
> > +# Returns: @YankInstances
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> > diff --git a/util/Makefile.objs b/util/Makefile.objs
> > index cc5e37177a..13faa98425 100644
> > --- a/util/Makefile.objs
> > +++ b/util/Makefile.objs
> > @@ -45,6 +45,7 @@ util-obj-$(CONFIG_GIO) += dbus.o
> >  dbus.o-cflags = $(GIO_CFLAGS)
> >  dbus.o-libs = $(GIO_LIBS)
> >  util-obj-$(CONFIG_USER_ONLY) += selfmap.o
> > +util-obj-y += yank.o
> >
> >  #######################################################################
> >  # code used by both qemu system emulation and qemu-img
> > diff --git a/util/yank.c b/util/yank.c
> > new file mode 100644
> > index 0000000000..b0cd27728b
> > --- /dev/null
> > +++ b/util/yank.c
> > @@ -0,0 +1,184 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu/thread.h"
> > +#include "qemu/queue.h"
> > +#include "qapi/qapi-commands-misc.h"
> > +#include "io/channel.h"
> > +#include "qemu/yank.h"
> > +
> > +struct YankFuncAndParam {
> > +    YankFn *func;
> > +    void *opaque;
> > +    QLIST_ENTRY(YankFuncAndParam) next;
> > +};
> > +
> > +struct YankInstance {
> > +    char *name;
> > +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> > +    QLIST_ENTRY(YankInstance) next;
> > +};
> > +
> > +static QemuMutex lock;  
> 
> Please give the variable a more telling name, such as @yank_lock, and
> document what exactly the lock protects.  I can guess it's just the list
> that immediately follows, but I prefer to be explicit when it comes to
> locking.

I will fix this in the next version.

> > +static QLIST_HEAD(yankinst_list, YankInstance) head
> > +    = QLIST_HEAD_INITIALIZER(head);  
> 
> Please give the variable a more telling name, such as
> @yank_instance_list.
> 
> I doubt there is a need to name the struct.  This should do:
> 
>    static QLIST_HEAD(, YankInstance) yank_instance_list

I will fix this in the next version.

> > +
> > +static struct YankInstance *yank_find_instance(const char *name)
> > +{
> > +    struct YankInstance *tmp, *instance;
> > +    instance = NULL;
> > +    QLIST_FOREACH(tmp, &head, next) {
> > +        if (!strcmp(tmp->name, name)) {
> > +            instance = tmp;
> > +        }
> > +    }
> > +    return instance;
> > +}  
> 
> Suggest
> 
>    static struct YankInstance *yank_find_instance(const char *name)
>    {
>        struct YankInstance *instance;
> 
>        QLIST_FOREACH(instance, &head, next) {
>            if (!strcmp(instance->name, name)) {
>                return instance;
>            }
>        }
>        return NULL;
>    }

I will fix this in the next version.
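
For reference, the early-return form suggested above can be tried outside QEMU with a minimal sketch like the one below. It is only an illustration: struct Instance and find_instance() are made-up stand-ins, and a hand-rolled singly linked list replaces QLIST so that it builds without QEMU headers. Note that the early return yields the first match while the loop in the patch yields the last one; the two should agree as long as yank_register_instance() keeps names unique.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct YankInstance.  A plain singly linked
 * list is used instead of QLIST so the sketch needs no QEMU headers.
 */
struct Instance {
    const char *name;
    struct Instance *next;
};

/* Return the first node whose name matches, or NULL if there is none. */
static struct Instance *find_instance(struct Instance *head, const char *name)
{
    struct Instance *instance;

    for (instance = head; instance; instance = instance->next) {
        if (!strcmp(instance->name, name)) {
            return instance;    /* stop at the first match */
        }
    }
    return NULL;
}
```

With two nodes named "blockdev:nbd0" and "chardev:serial0" linked in that order, looking up either name returns the matching node and looking up "migration" returns NULL.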

> > +
> > +void yank_register_instance(const char *instance_name, Error **errp)
> > +{
> > +    struct YankInstance *instance;
> > +
> > +    qemu_mutex_lock(&lock);
> > +
> > +    if (yank_find_instance(instance_name)) {
> > +        error_setg(errp, "duplicate yank instance name: '%s'", instance_name);  
> 
> Rather long line, suggest to wrap before the last argument.

I will fix this in the next version.

> > +        qemu_mutex_unlock(&lock);
> > +        return;
> > +    }
> > +
> > +    instance = g_slice_new(struct YankInstance);
> > +    instance->name = g_strdup(instance_name);
> > +    QLIST_INIT(&instance->yankfns);
> > +    QLIST_INSERT_HEAD(&head, instance, next);
> > +
> > +    qemu_mutex_unlock(&lock);
> > +}
> > +
> > +void yank_unregister_instance(const char *instance_name)
> > +{
> > +    struct YankInstance *instance;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    instance = yank_find_instance(instance_name);
> > +    assert(instance);
> > +
> > +    assert(QLIST_EMPTY(&instance->yankfns));
> > +    QLIST_REMOVE(instance, next);
> > +    g_free(instance->name);
> > +    g_slice_free(struct YankInstance, instance);
> > +
> > +    qemu_mutex_unlock(&lock);
> > +}
> > +
> > +void yank_register_function(const char *instance_name,
> > +                            YankFn *func,
> > +                            void *opaque)
> > +{
> > +    struct YankInstance *instance;
> > +    struct YankFuncAndParam *entry;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    instance = yank_find_instance(instance_name);
> > +    assert(instance);
> > +
> > +    entry = g_slice_new(struct YankFuncAndParam);
> > +    entry->func = func;
> > +    entry->opaque = opaque;
> > +
> > +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> > +    qemu_mutex_unlock(&lock);
> > +}
> > +
> > +void yank_unregister_function(const char *instance_name,
> > +                              YankFn *func,
> > +                              void *opaque)
> > +{
> > +    struct YankInstance *instance;
> > +    struct YankFuncAndParam *entry;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    instance = yank_find_instance(instance_name);
> > +    assert(instance);
> > +
> > +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> > +        if (entry->func == func && entry->opaque == opaque) {
> > +            QLIST_REMOVE(entry, next);
> > +            g_slice_free(struct YankFuncAndParam, entry);
> > +            qemu_mutex_unlock(&lock);
> > +            return;
> > +        }
> > +    }
> > +
> > +    abort();
> > +}
> > +
> > +void yank_generic_iochannel(void *opaque)
> > +{
> > +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> > +
> > +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > +}
> > +
> > +void qmp_yank(strList *instances,
> > +              Error **errp)
> > +{
> > +    strList *tmp;
> > +    struct YankInstance *instance;
> > +    struct YankFuncAndParam *entry;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    tmp = instances;
> > +    for (; tmp; tmp = tmp->next) {  
> 
> Make that
> 
>        for (tail = instances; tail; tail = tail->next) {

I will fix this in the next version.

> > +        instance = yank_find_instance(tmp->value);
> > +        if (!instance) {
> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> > +                      "Instance '%s' not found", tmp->value);
> > +            qemu_mutex_unlock(&lock);
> > +            return;
> > +        }
> > +    }
> > +    tmp = instances;
> > +    for (; tmp; tmp = tmp->next) {  
> 
> Likewise.
> 
> > +        instance = yank_find_instance(tmp->value);
> > +        assert(instance);
> > +        QLIST_FOREACH(entry, &instance->yankfns, next) {
> > +            entry->func(entry->opaque);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&lock);
> > +}
> > +
> > +YankInstances *qmp_query_yank(Error **errp)
> > +{
> > +    struct YankInstance *instance;
> > +    YankInstances *ret;
> > +
> > +    ret = g_new0(YankInstances, 1);
> > +    ret->instances = NULL;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    QLIST_FOREACH(instance, &head, next) {
> > +        strList *entry;
> > +        entry = g_new0(strList, 1);
> > +        entry->value = g_strdup(instance->name);
> > +        entry->next = ret->instances;
> > +        ret->instances = entry;
> > +    }
> > +    qemu_mutex_unlock(&lock);
> > +
> > +    return ret;
> > +}
> > +
> > +static void __attribute__((__constructor__)) yank_init(void)
> > +{
> > +    qemu_mutex_init(&lock);
> > +}
> > --
> > 2.20.1  
> 
> The two QMP commands permit out-of-band execution ('allow-oob': true).
> OOB is easy to get wrong, but I figure you have a legitimate use case.
> Let's review the restrictions documented in
> docs/devel/qapi-code-gen.txt:
> 
>     An OOB-capable command handler must satisfy the following conditions:
> 
>     - It terminates quickly.
>     - It does not invoke system calls that may block.
>     - It does not access guest RAM that may block when userfaultfd is
>       enabled for postcopy live migration.
>     - It takes only "fast" locks, i.e. all critical sections protected by
>       any lock it takes also satisfy the conditions for OOB command
>       handler code.
> 
> Since the command handlers take &lock, the restrictions apply to the
> other critical sections protected by &lock as well.  I believe these are
> all okay: they do nothing but allocate, initialize and free memory.
> 
> The restrictions also apply to the YankFn callbacks, but you documented
> that.  Okay.
> 
> The one such callback included in this patch is
> yank_generic_iochannel(), which is a thin wrapper around
> qio_channel_shutdown(), which in turn runs the io_shutdown method.
> Thus, the restrictions also apply to all the io_shutdown methods.
> That's not documented.
> 
> Daniel, should it be documented?
> 
This is already done in patch 6.

Thank you for your review.

Regards,
Lukas Straub



* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-28 14:21     ` Lukas Straub
@ 2020-08-31  7:47       ` Markus Armbruster
  0 siblings, 0 replies; 25+ messages in thread
From: Markus Armbruster @ 2020-08-31  7:47 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Markus Armbruster, Paolo Bonzini, Marc-André Lureau,
	Max Reitz

Lukas Straub <lukasstraub2@web.de> writes:

> On Thu, 27 Aug 2020 14:37:00 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I apologize for not reviewing this much earlier.
>> 
>> Lukas Straub <lukasstraub2@web.de> writes:
>> 
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 9d32820dc1..0d6a8f20b7 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
>> > @@ -1615,3 +1615,48 @@
>> >  ##
>> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
>> >
>> > +##
>> > +# @YankInstances:
>> > +#
>> > +# @instances: List of yank instances.
>> > +#
>> > +# Yank instances are named after the following schema:
>> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
>> > +#
>> > +# Since: 5.1
>> > +##
>> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }  
>> 
>> I'm afraid this is a problematic QMP interface.
>> 
>> By making YankInstances a struct, you keep the door open to adding more
>> members, which is good.
>> 
>> But by making its 'instances' member a ['str'], you close the door to
>> using anything but a single string for the individual instances.  Not so
>> good.
>> 
>> The single string encodes information which QMP client will need to
>> parse from the string.  We frown on that in QMP.  Use QAPI complex types
>> capabilities for structured data.
>> 
>> Could you use something like this instead?
>> 
>> { 'enum': 'YankInstanceType',
>>   'data': [ 'block-node', 'chardev', 'migration' ] }
>> 
>> { 'struct': 'YankInstanceBlockNode',
>>   'data': { 'node-name': 'str' } }
>> 
>> { 'struct': 'YankInstanceChardev',
>>   'data': { 'label': 'str' } }
>> 
>> { 'union': 'YankInstance',
>>   'base': { 'type': 'YankInstanceType' },
>>   'discriminator': 'type',
>>   'data': {
>>       'block-node': 'YankInstanceBlockNode',
>>       'chardev': 'YankInstanceChardev' } }
>> 
>> { 'command': 'yank',
>>   'data': { 'instances': ['YankInstance'] },
>>   'allow-oob': true }
>> 
>> If you're confident nothing will ever be added to YankInstanceBlockNode
>> and YankInstanceChardev, you could use str instead.
>
> As Daniel said, this has already been discussed.

I'll look up that discussion.

[...]
>> The two QMP commands permit out-of-band execution ('allow-oob': true).
>> OOB is easy to get wrong, but I figure you have a legitimate use case.
>> Let's review the restrictions documented in
>> docs/devel/qapi-code-gen.txt:
>> 
>>     An OOB-capable command handler must satisfy the following conditions:
>> 
>>     - It terminates quickly.
>>     - It does not invoke system calls that may block.
>>     - It does not access guest RAM that may block when userfaultfd is
>>       enabled for postcopy live migration.
>>     - It takes only "fast" locks, i.e. all critical sections protected by
>>       any lock it takes also satisfy the conditions for OOB command
>>       handler code.
>> 
>> Since the command handlers take &lock, the restrictions apply to the
>> other critical sections protected by &lock as well.  I believe these are
>> all okay: they do nothing but allocate, initialize and free memory.
>> 
>> The restrictions also apply to the YankFn callbacks, but you documented
>> that.  Okay.
>> 
>> The one such callback included in this patch is
>> yank_generic_iochannel(), which is a thin wrapper around
>> qio_channel_shutdown(), which in turn runs the io_shutdown method.
>> Thus, the restrictions also apply to all the io_shutdown methods.
>> That's not documented.
>> 
>> Daniel, should it be documented?
>> 
> This is already done in patch 6.

Patch 6 adds "This function is thread-safe" to its contract.  The
restrictions on OOB-capable handler code are much more severe than
ordinary thread safety.  For instance, blocking system calls outside
critical sections are thread safe, but not permitted in OOB-capable
handler code.  The contract needs to be more specific.
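
To make the distinction concrete, here is a minimal standalone sketch in plain POSIX C (not QEMU code) of the kind of operation a yank callback is meant to perform. The names yank_fd() and demo_yank() are invented for this illustration. The assumption is that shutdown() on a socket without SO_LINGER does not wait for the peer, and that on Linux a later recv() on either end reports end of file instead of blocking.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical yank-style callback: shut the socket down in both
 * directions.  It takes no lock and does not wait for the peer.
 */
static void yank_fd(void *opaque)
{
    int fd = *(int *)opaque;

    shutdown(fd, SHUT_RDWR);
}

/*
 * Create a connected socket pair, yank one end, then record what a
 * read on each end returns.  Returns 0 on success, -1 if the socket
 * pair could not be created.
 */
static int demo_yank(ssize_t *local, ssize_t *peer)
{
    int sv[2];
    char buf[1];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    yank_fd(&sv[0]);
    *local = recv(sv[0], buf, sizeof(buf), 0);  /* yanked end */
    *peer = recv(sv[1], buf, sizeof(buf), 0);   /* other end */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

A callback that instead called something like recv() or connect() on a blocking socket would still be thread-safe, but it would violate the "does not invoke system calls that may block" condition, which is why the io_shutdown contract has to say more than "thread-safe".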

> Thank you for your review.

Better late than never...  you're welcome!




* Re: [PATCH v7 1/8] Introduce yank feature
  2020-08-27 12:37   ` Markus Armbruster
  2020-08-27 14:28     ` Daniel P. Berrangé
  2020-08-28 14:21     ` Lukas Straub
@ 2020-09-04 12:33     ` Lukas Straub
  2020-09-04 12:47       ` Eric Blake
  2 siblings, 1 reply; 25+ messages in thread
From: Lukas Straub @ 2020-09-04 12:33 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz


On Thu, 27 Aug 2020 14:37:00 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> I apologize for not reviewing this much earlier.
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> ...
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }  
> 
> I'm afraid this is a problematic QMP interface.
> 
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
> 
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances.  Not so
> good.
> 
> The single string encodes information which QMP client will need to
> parse from the string.  We frown on that in QMP.  Use QAPI complex types
> capabilities for structured data.
> 
> Could you use something like this instead?
> 
> { 'enum': 'YankInstanceType',
>   'data': [ 'block-node', 'chardev', 'migration' ] }
> 
> { 'struct': 'YankInstanceBlockNode',
>   'data': { 'node-name': 'str' } }
> 
> { 'struct': 'YankInstanceChardev',
>   'data': { 'label': 'str' } }
> 
> { 'union': 'YankInstance',
>   'base': { 'type': 'YankInstanceType' },
>   'discriminator': 'type',
>   'data': {
>       'block-node': 'YankInstanceBlockNode',
>       'chardev': 'YankInstanceChardev' } }
> 
> { 'command': 'yank',
>   'data': { 'instances': ['YankInstance'] },
>   'allow-oob': true }

This proposal looks good to me. Does everyone agree?

Regards,
Lukas Straub

> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.
> 
> > +
> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.  
> 
> What's an "instance", and what does it mean to "yank" it?
> 
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
> 
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
> 
> Does it make every block node a YankInstance?  If not, which ones?
> 
> Does it make every character device a YankInstance?  If not, which ones?
> 
> Does it make migration always a YankInstance?  If not, when?
> 
> > +#
> > +# Takes @YankInstances as argument.
> > +#
> > +# Returns: nothing.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances.
> > +#
> > +# Returns: @YankInstances
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> ...



* Re: [PATCH v7 1/8] Introduce yank feature
  2020-09-04 12:33     ` Lukas Straub
@ 2020-09-04 12:47       ` Eric Blake
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Blake @ 2020-09-04 12:47 UTC (permalink / raw)
  To: Lukas Straub, Markus Armbruster
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Marc-André Lureau, Max Reitz

On 9/4/20 7:33 AM, Lukas Straub wrote:

>>> +##
>>> +# @YankInstances:
>>> +#
>>> +# @instances: List of yank instances.
>>> +#
>>> +# Yank instances are named after the following schema:
>>> +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
>>> +#
>>> +# Since: 5.1
>>> +##
>>> +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
>>
>> I'm afraid this is a problematic QMP interface.
>>
>> By making YankInstances a struct, you keep the door open to adding more
>> members, which is good.
>>
>> But by making its 'instances' member a ['str'], you close the door to
>> using anything but a single string for the individual instances.  Not so
>> good.
>>
>> The single string encodes information which QMP client will need to
>> parse from the string.  We frown on that in QMP.  Use QAPI complex types
>> capabilities for structured data.
>>
>> Could you use something like this instead?
>>
>> { 'enum': 'YankInstanceType',
>>    'data': [ 'block-node', 'chardev', 'migration' ] }
>>
>> { 'struct': 'YankInstanceBlockNode',
>>    'data': { 'node-name': 'str' } }
>>
>> { 'struct': 'YankInstanceChardev',
>>    'data': { 'label': 'str' } }
>>
>> { 'union': 'YankInstance',
>>    'base': { 'type': 'YankInstanceType' },
>>    'discriminator': 'type',
>>    'data': {
>>        'block-node': 'YankInstanceBlockNode',
>>        'chardev': 'YankInstanceChardev' } }
>>
>> { 'command': 'yank',
>>    'data': { 'instances': ['YankInstance'] },
>>    'allow-oob': true }
> 
> This proposal looks good to me. Does everyone agree?

Yes; this is also more introspectible, so that if we add more yank 
instances down the road, or even more optional features to existing yank 
instances, it becomes easier to detect whether a particular qemu has 
those additions.
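
To illustrate the difference from the client side, here is a rough Python
sketch. It is not taken from the patches; the helper names
(parse_legacy_instance, build_yank_command) and the "yank-1" id are
hypothetical. It assumes the v7 string schema quoted above
("blockdev:<node-name>", "chardev:<chardev-name>", "migration") and the
union proposed in this thread, and it builds the command with QMP's
"exec-oob" key, which is how out-of-band execution is requested.

```python
import json


def parse_legacy_instance(name):
    """Map a v7-style instance string to the proposed structured form.

    This is the parsing a client would have to do itself if the
    interface stayed a plain list of strings.
    """
    kind, sep, ident = name.partition(":")
    if kind == "migration" and not sep:
        return {"type": "migration"}
    if kind == "blockdev" and ident:
        return {"type": "block-node", "node-name": ident}
    if kind == "chardev" and ident:
        return {"type": "chardev", "label": ident}
    raise ValueError("unrecognized yank instance: %r" % name)


def build_yank_command(instances, cmd_id="yank-1"):
    """Build an out-of-band 'yank' request from structured instances."""
    return {
        "exec-oob": "yank",
        "arguments": {"instances": instances},
        "id": cmd_id,
    }


if __name__ == "__main__":
    legacy = ["blockdev:nbd0", "chardev:serial0", "migration"]
    structured = [parse_legacy_instance(n) for n in legacy]
    print(json.dumps(build_yank_command(structured), indent=2))
```

With the structured form, the client never needs parse_legacy_instance at
all: it fills in the union members directly, and query-qmp-schema can tell
it which instance types and members a given qemu supports.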

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



end of thread, other threads:[~2020-09-04 12:48 UTC | newest]

Thread overview: 25+ messages
2020-08-04  8:11 [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-08-04  8:11 ` [PATCH v7 1/8] Introduce yank feature Lukas Straub
2020-08-27 10:31   ` Daniel P. Berrangé
2020-08-27 12:37   ` Markus Armbruster
2020-08-27 14:28     ` Daniel P. Berrangé
2020-08-28 14:21     ` Lukas Straub
2020-08-31  7:47       ` Markus Armbruster
2020-09-04 12:33     ` Lukas Straub
2020-09-04 12:47       ` Eric Blake
2020-08-04  8:11 ` [PATCH v7 2/8] block/nbd.c: Add " Lukas Straub
2020-08-27 10:31   ` Daniel P. Berrangé
2020-08-04  8:11 ` [PATCH v7 3/8] chardev/char-socket.c: " Lukas Straub
2020-08-27 10:32   ` Daniel P. Berrangé
2020-08-04  8:11 ` [PATCH v7 4/8] migration: " Lukas Straub
2020-08-27 10:39   ` Daniel P. Berrangé
2020-08-04  8:11 ` [PATCH v7 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
2020-08-04  8:11 ` [PATCH v7 6/8] io: Document thread-safety of qio_channel_shutdown Lukas Straub
2020-08-04  8:11 ` [PATCH v7 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
2020-08-04  8:12 ` [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub
2020-08-27 10:30   ` Daniel P. Berrangé
2020-08-18 12:26 ` [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-08-27  8:42   ` Lukas Straub
2020-08-27 10:41     ` Daniel P. Berrangé
2020-08-27 14:18       ` Markus Armbruster
2020-08-27 17:58       ` Dr. David Alan Gilbert
