qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu
@ 2020-05-25 15:44 Lukas Straub
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Lukas Straub @ 2020-05-25 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances


Lukas Straub (4):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature

 Makefile.objs                 |   3 +
 block/nbd.c                   | 101 ++++++++++++--------
 chardev/char-socket.c         |  24 +++++
 migration/channel.c           |  12 +++
 migration/migration.c         |  18 +++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   2 +
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 yank.c                        | 174 ++++++++++++++++++++++++++++++++++
 yank.h                        |  67 +++++++++++++
 12 files changed, 425 insertions(+), 39 deletions(-)
 create mode 100644 yank.c
 create mode 100644 yank.h

--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 1/4] Introduce yank feature
  2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-05-25 15:44 ` Lukas Straub
  2020-06-16 14:39   ` Daniel P. Berrangé
                     ` (2 more replies)
  2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: Lukas Straub @ 2020-05-25 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 8617 bytes --]

The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 qapi/misc.json |  45 +++++++++++++
 yank.c         | 174 +++++++++++++++++++++++++++++++++++++++++++++++++
 yank.h         |  67 +++++++++++++++++++
 3 files changed, 286 insertions(+)
 create mode 100644 yank.c
 create mode 100644 yank.h

diff --git a/qapi/misc.json b/qapi/misc.json
index 99b90ac80b..f5228b2502 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1550,3 +1550,48 @@
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# Yank instances are named after the following schema:
+# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
+#
+# Since: 5.1
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.1
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": { "instances": ["blockdev:nbd0"] } }
+#
+# Since: 5.1
+##
+{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
diff --git a/yank.c b/yank.c
new file mode 100644
index 0000000000..36d8139d4d
--- /dev/null
+++ b/yank.c
@@ -0,0 +1,174 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "io/channel.h"
+#include "yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstance {
+    char *name;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstance) next;
+};
+
+static QemuMutex lock;
+static QLIST_HEAD(yankinst_list, YankInstance) head
+    = QLIST_HEAD_INITIALIZER(head);
+
+static struct YankInstance *yank_find_instance(char *name)
+{
+    struct YankInstance *tmp, *instance;
+    instance = NULL;
+    QLIST_FOREACH(tmp, &head, next) {
+        if (!strcmp(tmp->name, name)) {
+            instance = tmp;
+        }
+    }
+    return instance;
+}
+
+void yank_register_instance(char *instance_name)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+    assert(!yank_find_instance(instance_name));
+
+    instance = g_slice_new(struct YankInstance);
+    instance->name = g_strdup(instance_name);
+    QLIST_INIT(&instance->yankfns);
+    QLIST_INSERT_HEAD(&head, instance, next);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_instance(char *instance_name)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    assert(QLIST_EMPTY(&instance->yankfns));
+    QLIST_REMOVE(instance, next);
+    g_free(instance->name);
+    g_slice_free(struct YankInstance, instance);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_register_function(char *instance_name, YankFn *func, void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    entry = g_slice_new(struct YankFuncAndParam);
+    entry->func = func;
+    entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_function(char *instance_name, YankFn *func, void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    QLIST_FOREACH(entry, &instance->yankfns, next) {
+        if (entry->func == func && entry->opaque == opaque) {
+            QLIST_REMOVE(entry, next);
+            g_slice_free(struct YankFuncAndParam, entry);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(strList *instances, Error **errp)
+{
+    strList *tmp;
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        if (!instance) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                      "Instance '%s' not found", tmp->value);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        assert(instance);
+        QLIST_FOREACH(entry, &instance->yankfns, next) {
+            entry->func(entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&lock);
+}
+
+YankInstances *qmp_query_yank(Error **errp)
+{
+    struct YankInstance *instance;
+    YankInstances *ret;
+
+    ret = g_new0(YankInstances, 1);
+    ret->instances = NULL;
+
+    qemu_mutex_lock(&lock);
+    QLIST_FOREACH(instance, &head, next) {
+        strList *entry;
+        entry = g_new0(strList, 1);
+        entry->value = g_strdup(instance->name);
+        entry->next = ret->instances;
+        ret->instances = entry;
+    }
+    qemu_mutex_unlock(&lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&lock);
+}
diff --git a/yank.h b/yank.h
new file mode 100644
index 0000000000..f1c8743b72
--- /dev/null
+++ b/yank.h
@@ -0,0 +1,67 @@
+
+#ifndef YANK_H
+#define YANK_H
+
+typedef void (YankFn) (void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The globally unique name of the instance.
+ */
+void yank_register_instance(char *instance_name);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance.
+ */
+void yank_unregister_instance(char *instance_name);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: The yank function
+ * @opaque: Will be passed to the yank function
+ */
+void yank_register_function(char *instance_name, YankFn *func, void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: func that was passed to yank_register_function
+ * @opaque: opaque that was passed to yank_register_function
+ */
+void yank_unregister_function(char *instance_name, YankFn *func, void *opaque);
+
+/**
+ * yank_unregister_function: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+#endif
--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
@ 2020-05-25 15:44 ` Lukas Straub
  2020-06-16 14:40   ` Daniel P. Berrangé
                     ` (2 more replies)
  2020-05-25 15:44 ` [PATCH v4 3/4] chardev/char-socket.c: " Lukas Straub
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: Lukas Straub @ 2020-05-25 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 12595 bytes --]

Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 Makefile.objs |   1 +
 block/nbd.c   | 101 ++++++++++++++++++++++++++++++++------------------
 2 files changed, 65 insertions(+), 37 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index a7c967633a..8e403b81f3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block.o blockjob.o job.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
+block-obj-y += yank.o

 block-obj-m = block/

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f64..3a41749f1b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -43,6 +44,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -84,6 +87,8 @@ typedef struct BDRVNBDState {
     NBDReply reply;
     BlockDriverState *bs;

+    char *yank_name;
+
     /* Connection parameters */
     uint32_t reconnect_delay;
     SocketAddress *saddr;
@@ -94,6 +99,7 @@ typedef struct BDRVNBDState {
 } BDRVNBDState;

 static int nbd_client_connect(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -106,17 +112,19 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
     s->tlscredsid = NULL;
     g_free(s->x_dirty_bitmap);
     s->x_dirty_bitmap = NULL;
+    g_free(s->yank_name);
+    s->yank_name = NULL;
 }

 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_read(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_read(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -167,7 +175,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (atomic_read(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -206,7 +214,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (atomic_read(&s->state) == NBD_CLIENT_CONNECTED) {
         /* finish any pending coroutines */
         assert(s->ioc);
         qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
@@ -230,13 +238,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = atomic_read(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return atomic_read(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
@@ -274,6 +283,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     /* Finalize previous connection if any */
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -305,7 +315,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     nbd_reconnect_attempt(s);

     while (nbd_client_connecting(s)) {
-        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+        if (atomic_read(&s->state) == NBD_CLIENT_CONNECTING_WAIT &&
             qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
         {
             s->state = NBD_CLIENT_CONNECTING_NOWAIT;
@@ -341,7 +351,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;

-    while (s->state != NBD_CLIENT_QUIT) {
+    while (atomic_read(&s->state) != NBD_CLIENT_QUIT) {
         /*
          * The NBD client can only really be considered idle when it has
          * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -356,7 +366,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             nbd_co_reconnect_loop(s);
         }

-        if (s->state != NBD_CLIENT_CONNECTED) {
+        if (atomic_read(&s->state) != NBD_CLIENT_CONNECTED) {
             continue;
         }

@@ -411,6 +421,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     s->connection_co = NULL;
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -435,7 +446,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }

-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_read(&s->state) != NBD_CLIENT_CONNECTED) {
         rc = -EIO;
         goto err;
     }
@@ -462,7 +473,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
+        if (rc >= 0 && atomic_read(&s->state) == NBD_CLIENT_CONNECTED) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -777,7 +788,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_read(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -936,7 +947,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_read(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -961,7 +972,8 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     }

     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
+    if (nbd_reply_is_simple(reply) ||
+        atomic_read(&s->state) != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }

@@ -1395,6 +1407,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
     return 0;
 }

+static void nbd_yank(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+    atomic_set(&s->state, NBD_CLIENT_QUIT);
+}
+
 static void nbd_client_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -1407,25 +1428,29 @@ static void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
+static int nbd_establish_connection(BlockDriverState *bs,
+                                    SocketAddress *saddr,
+                                    Error **errp)
 {
-    QIOChannelSocket *sioc;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     Error *local_err = NULL;

-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+    s->sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(s->sioc), "nbd-client");
+    yank_register_function(s->yank_name, nbd_yank, bs);

-    qio_channel_socket_connect_sync(sioc, saddr, &local_err);
+    qio_channel_socket_connect_sync(s->sioc, saddr, &local_err);
     if (local_err) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
         error_propagate(errp, local_err);
-        return NULL;
+        return -1;
     }

-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+    qio_channel_set_delay(QIO_CHANNEL(s->sioc), false);

-    return sioc;
+    return 0;
 }

 static int nbd_client_connect(BlockDriverState *bs, Error **errp)
@@ -1438,28 +1463,27 @@ static int nbd_client_connect(BlockDriverState *bs, Error **errp)
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    QIOChannelSocket *sioc = nbd_establish_connection(s->saddr, errp);
-
-    if (!sioc) {
+    if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
         return -ECONNREFUSED;
     }

     /* NBD handshake */
     trace_nbd_client_connect(s->export);
-    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
+    qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+    qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);

     s->info.request_sizes = true;
     s->info.structured_reply = true;
     s->info.base_allocation = true;
     s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
     s->info.name = g_strdup(s->export ?: "");
-    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
+    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(s->sioc), s->tlscreds,
                                 s->hostname, &s->ioc, &s->info, errp);
     g_free(s->info.x_dirty_bitmap);
     g_free(s->info.name);
     if (ret < 0) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         return ret;
     }
     if (s->x_dirty_bitmap && !s->info.base_allocation) {
@@ -1485,10 +1509,8 @@ static int nbd_client_connect(BlockDriverState *bs, Error **errp)
         }
     }

-    s->sioc = sioc;
-
     if (!s->ioc) {
-        s->ioc = QIO_CHANNEL(sioc);
+        s->ioc = QIO_CHANNEL(s->sioc);
         object_ref(OBJECT(s->ioc));
     }

@@ -1504,9 +1526,10 @@ static int nbd_client_connect(BlockDriverState *bs, Error **errp)
     {
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
+        nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);

-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));

         return ret;
     }
@@ -1913,6 +1936,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_queue_init(&s->free_sema);

+    s->yank_name = g_strconcat("blockdev:", bs->node_name, NULL);
+    yank_register_instance(s->yank_name);
+
     ret = nbd_client_connect(bs, errp);
     if (ret < 0) {
         nbd_clear_bdrvstate(s);
@@ -1972,6 +1998,7 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;

     nbd_client_close(bs);
+    yank_unregister_instance(s->yank_name);
     nbd_clear_bdrvstate(s);
 }

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 3/4] chardev/char-socket.c: Add yank feature
  2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
  2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
@ 2020-05-25 15:44 ` Lukas Straub
  2020-06-16 14:41   ` Daniel P. Berrangé
  2020-05-25 15:44 ` [PATCH v4 4/4] migration: " Lukas Straub
  2020-06-06 19:30 ` [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  4 siblings, 1 reply; 22+ messages in thread
From: Lukas Straub @ 2020-05-25 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 4627 bytes --]

Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 Makefile.objs         |  1 +
 chardev/char-socket.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/Makefile.objs b/Makefile.objs
index 8e403b81f3..5582f4eda9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -10,6 +10,7 @@ qom-obj-y = qom/
 ifeq ($(call lor,$(CONFIG_SOFTMMU),$(CONFIG_TOOLS)),y)

 chardev-obj-y = chardev/
+chardev-obj-y += yank.o

 authz-obj-y = authz/

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 185fe38dda..d5c6cd2153 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "yank.h"

 #include "chardev/char-io.h"

@@ -69,6 +70,7 @@ typedef struct {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    char *yank_name;

     SocketAddress *addr;
     bool is_listen;
@@ -409,6 +411,11 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -912,6 +919,8 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -926,6 +935,8 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -941,6 +952,8 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -956,6 +969,8 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1066,6 +1081,8 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    yank_unregister_instance(s->yank_name);
+    g_free(s->yank_name);

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1081,6 +1098,8 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         error_free(err);
         goto cleanup;
@@ -1115,6 +1134,8 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1356,6 +1377,9 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    s->yank_name = g_strconcat("chardev:", chr->label, NULL);
+    yank_register_instance(s->yank_name);
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

--
2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 4/4] migration: Add yank feature
  2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (2 preceding siblings ...)
  2020-05-25 15:44 ` [PATCH v4 3/4] chardev/char-socket.c: " Lukas Straub
@ 2020-05-25 15:44 ` Lukas Straub
  2020-06-16 14:42   ` Daniel P. Berrangé
  2020-06-06 19:30 ` [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
  4 siblings, 1 reply; 22+ messages in thread
From: Lukas Straub @ 2020-05-25 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 8598 bytes --]

Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 Makefile.objs                 |  1 +
 migration/channel.c           | 12 ++++++++++++
 migration/migration.c         | 18 +++++++++++++++++-
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  6 ++++++
 migration/savevm.c            |  2 ++
 tests/Makefile.include        |  2 +-
 7 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 5582f4eda9..d2a49d3834 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -58,6 +58,7 @@ common-obj-$(CONFIG_LINUX) += fsdev/

 common-obj-y += accel/
 common-obj-y += migration/
+common-obj-y += yank.o

 common-obj-y += audio/
 common-obj-m += audio/
diff --git a/migration/channel.c b/migration/channel.c
index 20e4c8e2dc..5cb48d403a 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function((char *) "migration", yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,11 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function((char *) "migration", yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c..c6d7119c08 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -54,6 +54,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "yank.h"

 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */

@@ -231,6 +232,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance((char *) "migration");
 }

 static void migrate_generate_event(int new_state)
@@ -362,7 +365,9 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     const char *p;

     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
+    yank_register_instance((char *) "migration");
     if (!strcmp(uri, "defer")) {
+        yank_unregister_instance((char *) "migration");
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p)) {
         tcp_start_incoming_migration(p, errp);
@@ -377,6 +382,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance((char *) "migration");
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1632,6 +1638,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance((char *) "migration");
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -1899,6 +1906,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance((char *) "migration");
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2035,7 +2043,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         /* Error detected, put into errp */
         return;
     }
-
+    if (!(has_resume && resume)) {
+        yank_register_instance((char *) "migration");
+    }
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -2049,6 +2059,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance((char *) "migration");
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2058,6 +2071,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }

     if (local_err) {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance((char *) "migration");
+        }
         migrate_fd_error(s, local_err);
         error_propagate(errp, local_err);
         return;
diff --git a/migration/multifd.c b/migration/multifd.c
index cb6a4a3ab8..97566262c1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -24,6 +24,9 @@
 #include "trace.h"
 #include "multifd.h"

+#include "yank.h"
+#include "io/channel-socket.h"
+
 /* Multiple fd's */

 #define MULTIFD_MAGIC 0x11223344U
@@ -856,6 +859,13 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];

+        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
+            && OBJECT(p->c)->ref == 1) {
+            yank_unregister_function((char *) "migration",
+                                     yank_generic_iochannel,
+                                     QIO_CHANNEL(p->c));
+        }
+
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index d2ce32f4b9..b725ac8098 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
+#include "yank.h"


 static ssize_t channel_writev_buffer(void *opaque,
@@ -104,6 +105,11 @@ static int channel_close(void *opaque, Error **errp)
     int ret;
     QIOChannel *ioc = QIO_CHANNEL(opaque);
     ret = qio_channel_close(ioc, errp);
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)
+        && OBJECT(ioc)->ref == 1) {
+        yank_unregister_function((char *) "migration", yank_generic_iochannel,
+                                 QIO_CHANNEL(ioc));
+    }
     object_unref(OBJECT(ioc));
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index c00a6807d9..c5f957d4e4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -63,6 +63,7 @@
 #include "migration/colo.h"
 #include "qemu/bitmap.h"
 #include "net/announce.h"
+#include "yank.h"

 const unsigned int postcopy_ram_discard_version = 0;

@@ -2897,6 +2898,7 @@ int load_snapshot(const char *name, Error **errp)
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
     mis->from_src_file = f;

+    yank_register_instance((char *) "migration");
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
     migration_incoming_state_destroy();
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 51de676298..e76a5458c3 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -442,7 +442,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	migration/vmstate.o migration/vmstate-types.o migration/qemu-file.o \
-        migration/qemu-file-channel.o migration/qjson.o \
+        migration/qemu-file-channel.o migration/qjson.o yank.o \
 	$(test-io-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y)
 tests/test-base64$(EXESUF): tests/test-base64.o $(test-util-obj-y)
--
2.20.1

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
                   ` (3 preceding siblings ...)
  2020-05-25 15:44 ` [PATCH v4 4/4] migration: " Lukas Straub
@ 2020-06-06 19:30 ` Lukas Straub
  2020-06-17 14:41   ` Stefan Hajnoczi
  4 siblings, 1 reply; 22+ messages in thread
From: Lukas Straub @ 2020-06-06 19:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

[-- Attachment #1: Type: text/plain, Size: 740 bytes --]

On Mon, 25 May 2020 17:44:12 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.
> 
> Regards,
> Lukas Straub

Hello Everyone,
Can this be reviewed, it would be cool to have this in qemu 5.1.

Regards,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
@ 2020-06-16 14:39   ` Daniel P. Berrangé
  2020-06-19 14:23     ` Lukas Straub
  2020-06-16 14:49   ` Daniel P. Berrangé
  2020-06-17 15:12   ` Stefan Hajnoczi
  2 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:39 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  qapi/misc.json |  45 +++++++++++++
>  yank.c         | 174 +++++++++++++++++++++++++++++++++++++++++++++++++
>  yank.h         |  67 +++++++++++++++++++
>  3 files changed, 286 insertions(+)
>  create mode 100644 yank.c
>  create mode 100644 yank.h

> +void yank_register_function(char *instance_name, YankFn *func, void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    entry = g_slice_new(struct YankFuncAndParam);
> +    entry->func = func;
> +    entry->opaque = opaque;
> +
> +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_unregister_function(char *instance_name, YankFn *func, void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> +        if (entry->func == func && entry->opaque == opaque) {
> +            QLIST_REMOVE(entry, next);
> +            g_slice_free(struct YankFuncAndParam, entry);
> +            qemu_mutex_unlock(&lock);
> +            return;
> +        }
> +    }
> +
> +    abort();
> +}

Since the NBD impl no longer needs to register multiple different functions
on the same insance_nane, these methods could be be simplified, to only
accept a single function, instead of keeping a whole list. This would avoid
need to pass a function into the unregister() method at all.

I don't consider this a blocker though, so you pick whether to do this
or not.


> diff --git a/yank.h b/yank.h
> new file mode 100644
> index 0000000000..f1c8743b72
> --- /dev/null
> +++ b/yank.h
> @@ -0,0 +1,67 @@
> +

Missing license header

> +#ifndef YANK_H
> +#define YANK_H
> +
> +typedef void (YankFn) (void *opaque);
> +
> +/**
> + * yank_register_instance: Register a new instance.
> + *
> + * This registers a new instance for yanking. Must be called before any yank
> + * function is registered for this instance.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The globally unique name of the instance.
> + */
> +void yank_register_instance(char *instance_name);
> +
> +/**
> + * yank_unregister_instance: Unregister a instance.
> + *
> + * This unregisters a instance. Must be called only after every yank function
> + * of the instance has been unregistered.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance.
> + */
> +void yank_unregister_instance(char *instance_name);
> +
> +/**
> + * yank_register_function: Register a yank function
> + *
> + * This registers a yank function. All limitations of qmp oob commands apply
> + * to the yank function as well.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: The yank function
> + * @opaque: Will be passed to the yank function
> + */
> +void yank_register_function(char *instance_name, YankFn *func, void *opaque);
> +
> +/**
> + * yank_unregister_function: Unregister a yank function
> + *
> + * This unregisters a yank function.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: func that was passed to yank_register_function
> + * @opaque: opaque that was passed to yank_register_function
> + */
> +void yank_unregister_function(char *instance_name, YankFn *func, void *opaque);
> +
> +/**
> + * yank_unregister_function: Generic yank function for iochannel
> + *
> + * This is a generic yank function which will call qio_channel_shutdown on the
> + * provided QIOChannel.
> + *
> + * @opaque: QIOChannel to shutdown
> + */
> +void yank_generic_iochannel(void *opaque);
> +#endif



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
@ 2020-06-16 14:40   ` Daniel P. Berrangé
  2020-06-16 14:44   ` Daniel P. Berrangé
  2020-06-17 15:09   ` Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:40 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:26PM +0200, Lukas Straub wrote:
> Register a yank function which shuts down the socket and sets
> s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> error occured.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  Makefile.objs |   1 +
>  block/nbd.c   | 101 ++++++++++++++++++++++++++++++++------------------
>  2 files changed, 65 insertions(+), 37 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 3/4] chardev/char-socket.c: Add yank feature
  2020-05-25 15:44 ` [PATCH v4 3/4] chardev/char-socket.c: " Lukas Straub
@ 2020-06-16 14:41   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:41 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:29PM +0200, Lukas Straub wrote:
> Register a yank function to shutdown the socket on yank.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  Makefile.objs         |  1 +
>  chardev/char-socket.c | 24 ++++++++++++++++++++++++
>  2 files changed, 25 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] migration: Add yank feature
  2020-05-25 15:44 ` [PATCH v4 4/4] migration: " Lukas Straub
@ 2020-06-16 14:42   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:42 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:33PM +0200, Lukas Straub wrote:
> Register yank functions on sockets to shut them down.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  Makefile.objs                 |  1 +
>  migration/channel.c           | 12 ++++++++++++
>  migration/migration.c         | 18 +++++++++++++++++-
>  migration/multifd.c           | 10 ++++++++++
>  migration/qemu-file-channel.c |  6 ++++++
>  migration/savevm.c            |  2 ++
>  tests/Makefile.include        |  2 +-
>  7 files changed, 49 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
  2020-06-16 14:40   ` Daniel P. Berrangé
@ 2020-06-16 14:44   ` Daniel P. Berrangé
  2020-06-19 16:23     ` Lukas Straub
  2020-06-17 15:09   ` Stefan Hajnoczi
  2 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:44 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:26PM +0200, Lukas Straub wrote:
> Register a yank function which shuts down the socket and sets
> s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> error occured.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  Makefile.objs |   1 +
>  block/nbd.c   | 101 ++++++++++++++++++++++++++++++++------------------
>  2 files changed, 65 insertions(+), 37 deletions(-)
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index a7c967633a..8e403b81f3 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-y += block.o blockjob.o job.o
>  block-obj-y += block/ scsi/
>  block-obj-y += qemu-io-cmds.o
>  block-obj-$(CONFIG_REPLICATION) += replication.o
> +block-obj-y += yank.o

Oh, I see this is repeated for migration and chardev code too.

Instead of doing this and relying on linker to merge duplicates,
I think we should put yank.c into util/ and built it into util-obj-y,
so it gets added to everything.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
  2020-06-16 14:39   ` Daniel P. Berrangé
@ 2020-06-16 14:49   ` Daniel P. Berrangé
  2020-06-17 15:12   ` Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-16 14:49 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  qapi/misc.json |  45 +++++++++++++
>  yank.c         | 174 +++++++++++++++++++++++++++++++++++++++++++++++++
>  yank.h         |  67 +++++++++++++++++++

As mentioned n the second patch, I think these should live in util/
and util/Makefile.objs should add them to util-obj-y

Ideally MAINTAINERS would be updated to list someone as the official
maintainer too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu
  2020-06-06 19:30 ` [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
@ 2020-06-17 14:41   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-06-17 14:41 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1028 bytes --]

On Sat, Jun 06, 2020 at 09:30:38PM +0200, Lukas Straub wrote:
> On Mon, 25 May 2020 17:44:12 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.
> > 
> > Regards,
> > Lukas Straub
> 
> Hello Everyone,
> Can this be reviewed, it would be cool to have this in qemu 5.1.

Please see my reply to a previous version. Code that executes in the oob
environment needs to take special precautions, this needs to be
documented so that yank API users know what the limitations are.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
  2020-06-16 14:40   ` Daniel P. Berrangé
  2020-06-16 14:44   ` Daniel P. Berrangé
@ 2020-06-17 15:09   ` Stefan Hajnoczi
  2020-06-19 18:07     ` Lukas Straub
  2 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-06-17 15:09 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1348 bytes --]

On Mon, May 25, 2020 at 05:44:26PM +0200, Lukas Straub wrote:
> @@ -1395,6 +1407,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
>      return 0;
>  }
> 
> +static void nbd_yank(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> +
> +    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);

qio_channel_shutdown() is not guaranteed to be thread-safe. Please
document new assumptions that are being introduced.

Today we can more or less get away with it (although TLS sockets are a
little iffy) because it boils down the a shutdown(2) system call. I
think it would be okay to update the qio_channel_shutdown() and
.io_shutdown() documentation to clarify that this is thread-safe.

> +    atomic_set(&s->state, NBD_CLIENT_QUIT);

docs/devel/atomics.rst says:

  No barriers are implied by ``atomic_read`` and ``atomic_set`` in either Linux
  or QEMU.

Other threads might not see the latest value of s->state because this is
a weakly ordered memory access.

I haven't audited the NBD code in detail, but if you want the other
threads to always see NBD_CLIENT_QUIT then s->state should be set before
calling qio_channel_shutdown() using a stronger atomics API like
atomic_load_acquire()/atomic_store_release().

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
  2020-06-16 14:39   ` Daniel P. Berrangé
  2020-06-16 14:49   ` Daniel P. Berrangé
@ 2020-06-17 15:12   ` Stefan Hajnoczi
  2020-06-19 16:29     ` Lukas Straub
  2 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-06-17 15:12 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 265 bytes --]

On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> +static struct YankInstance *yank_find_instance(char *name)

There are const char * -> char * casts in later patches. Please use
const char * where possible. Callers shouldn't need to cast away const.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-06-16 14:39   ` Daniel P. Berrangé
@ 2020-06-19 14:23     ` Lukas Straub
  2020-06-19 16:53       ` Daniel P. Berrangé
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Straub @ 2020-06-19 14:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 5020 bytes --]

On Tue, 16 Jun 2020 15:39:57 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> > 
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  qapi/misc.json |  45 +++++++++++++
> >  yank.c         | 174 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  yank.h         |  67 +++++++++++++++++++
> >  3 files changed, 286 insertions(+)
> >  create mode 100644 yank.c
> >  create mode 100644 yank.h  
> 
> > +void yank_register_function(char *instance_name, YankFn *func, void *opaque)
> > +{
> > +    struct YankInstance *instance;
> > +    struct YankFuncAndParam *entry;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    instance = yank_find_instance(instance_name);
> > +    assert(instance);
> > +
> > +    entry = g_slice_new(struct YankFuncAndParam);
> > +    entry->func = func;
> > +    entry->opaque = opaque;
> > +
> > +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> > +    qemu_mutex_unlock(&lock);
> > +}
> > +
> > +void yank_unregister_function(char *instance_name, YankFn *func, void *opaque)
> > +{
> > +    struct YankInstance *instance;
> > +    struct YankFuncAndParam *entry;
> > +
> > +    qemu_mutex_lock(&lock);
> > +    instance = yank_find_instance(instance_name);
> > +    assert(instance);
> > +
> > +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> > +        if (entry->func == func && entry->opaque == opaque) {
> > +            QLIST_REMOVE(entry, next);
> > +            g_slice_free(struct YankFuncAndParam, entry);
> > +            qemu_mutex_unlock(&lock);
> > +            return;
> > +        }
> > +    }
> > +
> > +    abort();
> > +}  
> 
> Since the NBD impl no longer needs to register multiple different functions
> on the same insance_nane, these methods could be be simplified, to only
> accept a single function, instead of keeping a whole list. This would avoid
> need to pass a function into the unregister() method at all.

Multiple yank functions are still needed for multifd migration.

> I don't consider this a blocker though, so you pick whether to do this
> or not.
> 
> 
> > diff --git a/yank.h b/yank.h
> > new file mode 100644
> > index 0000000000..f1c8743b72
> > --- /dev/null
> > +++ b/yank.h
> > @@ -0,0 +1,67 @@
> > +  
> 
> Missing license header

I will fix that in the next version.

Thanks,
Lukas Straub

> 
> > +#ifndef YANK_H
> > +#define YANK_H
> > +
> > +typedef void (YankFn) (void *opaque);
> > +
> > +/**
> > + * yank_register_instance: Register a new instance.
> > + *
> > + * This registers a new instance for yanking. Must be called before any yank
> > + * function is registered for this instance.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The globally unique name of the instance.
> > + */
> > +void yank_register_instance(char *instance_name);
> > +
> > +/**
> > + * yank_unregister_instance: Unregister a instance.
> > + *
> > + * This unregisters a instance. Must be called only after every yank function
> > + * of the instance has been unregistered.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance.
> > + */
> > +void yank_unregister_instance(char *instance_name);
> > +
> > +/**
> > + * yank_register_function: Register a yank function
> > + *
> > + * This registers a yank function. All limitations of qmp oob commands apply
> > + * to the yank function as well.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: The yank function
> > + * @opaque: Will be passed to the yank function
> > + */
> > +void yank_register_function(char *instance_name, YankFn *func, void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Unregister a yank function
> > + *
> > + * This unregisters a yank function.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: func that was passed to yank_register_function
> > + * @opaque: opaque that was passed to yank_register_function
> > + */
> > +void yank_unregister_function(char *instance_name, YankFn *func, void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Generic yank function for iochannel
> > + *
> > + * This is a generic yank function which will call qio_channel_shutdown on the
> > + * provided QIOChannel.
> > + *
> > + * @opaque: QIOChannel to shutdown
> > + */
> > +void yank_generic_iochannel(void *opaque);
> > +#endif  
> 
> 
> 
> Regards,
> Daniel


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-06-16 14:44   ` Daniel P. Berrangé
@ 2020-06-19 16:23     ` Lukas Straub
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Straub @ 2020-06-19 16:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 1248 bytes --]

On Tue, 16 Jun 2020 15:44:06 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, May 25, 2020 at 05:44:26PM +0200, Lukas Straub wrote:
> > Register a yank function which shuts down the socket and sets
> > s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> > error occured.
> > 
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  Makefile.objs |   1 +
> >  block/nbd.c   | 101 ++++++++++++++++++++++++++++++++------------------
> >  2 files changed, 65 insertions(+), 37 deletions(-)
> > 
> > diff --git a/Makefile.objs b/Makefile.objs
> > index a7c967633a..8e403b81f3 100644
> > --- a/Makefile.objs
> > +++ b/Makefile.objs
> > @@ -18,6 +18,7 @@ block-obj-y += block.o blockjob.o job.o
> >  block-obj-y += block/ scsi/
> >  block-obj-y += qemu-io-cmds.o
> >  block-obj-$(CONFIG_REPLICATION) += replication.o
> > +block-obj-y += yank.o  
> 
> Oh, I see this is repeated for migration and chardev code too.
> 
> Instead of doing this and relying on linker to merge duplicates,
> I think we should put yank.c into util/ and built it into util-obj-y,
> so it gets added to everything.

Ok, I will do this in the next version.

Thanks,
Lukas Straub

> Regards,
> Daniel


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-06-17 15:12   ` Stefan Hajnoczi
@ 2020-06-19 16:29     ` Lukas Straub
  2020-06-19 16:52       ` Daniel P. Berrangé
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Straub @ 2020-06-19 16:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On Wed, 17 Jun 2020 16:12:40 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> > +static struct YankInstance *yank_find_instance(char *name)  
> 
> There are const char * -> char * casts in later patches. Please use
> const char * where possible. Callers shouldn't need to cast away const.

nbd and chardev generate the instance name dynamically so it needs to be char *, but in migration it's hardcoded.

Thanks,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-06-19 16:29     ` Lukas Straub
@ 2020-06-19 16:52       ` Daniel P. Berrangé
  2020-06-19 16:59         ` Lukas Straub
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-19 16:52 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Stefan Hajnoczi,
	qemu-devel, Dr. David Alan Gilbert, Paolo Bonzini,
	Marc-André Lureau, Max Reitz

On Fri, Jun 19, 2020 at 06:29:24PM +0200, Lukas Straub wrote:
> On Wed, 17 Jun 2020 16:12:40 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> > > +static struct YankInstance *yank_find_instance(char *name)  
> > 
> > There are const char * -> char * casts in later patches. Please use
> > const char * where possible. Callers shouldn't need to cast away const.
> 
> nbd and chardev generate the instance name dynamically so it
> needs to be char *, but in migration it's hardcoded.

I think you're looking at it from the wrong perspective.

The yank_find_instance() method never modifies the 'name' paramater
that it receives. Therefore it should be "const char *". Likewise
for the other yank_*()  methods in fact.

The caller can have a char *, or a const char * as suits its needs.
Either can be passed into the yank_* methods and will gain const-ness
from the POV of yank code.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-06-19 14:23     ` Lukas Straub
@ 2020-06-19 16:53       ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-06-19 16:53 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Kevin Wolf, qemu-block, Juan Quintela, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau, Dr. David Alan Gilbert

On Fri, Jun 19, 2020 at 04:23:50PM +0200, Lukas Straub wrote:
> On Tue, 16 Jun 2020 15:39:57 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:
> > > The yank feature allows to recover from hanging qemu by "yanking"
> > > at various parts. Other qemu systems can register themselves and
> > > multiple yank functions. Then all yank functions for selected
> > > instances can be called by the 'yank' out-of-band qmp command.
> > > Available instances can be queried by a 'query-yank' oob command.
> > > 
> > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > ---
> > >  qapi/misc.json |  45 +++++++++++++
> > >  yank.c         | 174 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  yank.h         |  67 +++++++++++++++++++
> > >  3 files changed, 286 insertions(+)
> > >  create mode 100644 yank.c
> > >  create mode 100644 yank.h  
> > 
> > > +void yank_register_function(char *instance_name, YankFn *func, void *opaque)
> > > +{
> > > +    struct YankInstance *instance;
> > > +    struct YankFuncAndParam *entry;
> > > +
> > > +    qemu_mutex_lock(&lock);
> > > +    instance = yank_find_instance(instance_name);
> > > +    assert(instance);
> > > +
> > > +    entry = g_slice_new(struct YankFuncAndParam);
> > > +    entry->func = func;
> > > +    entry->opaque = opaque;
> > > +
> > > +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> > > +    qemu_mutex_unlock(&lock);
> > > +}
> > > +
> > > +void yank_unregister_function(char *instance_name, YankFn *func, void *opaque)
> > > +{
> > > +    struct YankInstance *instance;
> > > +    struct YankFuncAndParam *entry;
> > > +
> > > +    qemu_mutex_lock(&lock);
> > > +    instance = yank_find_instance(instance_name);
> > > +    assert(instance);
> > > +
> > > +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> > > +        if (entry->func == func && entry->opaque == opaque) {
> > > +            QLIST_REMOVE(entry, next);
> > > +            g_slice_free(struct YankFuncAndParam, entry);
> > > +            qemu_mutex_unlock(&lock);
> > > +            return;
> > > +        }
> > > +    }
> > > +
> > > +    abort();
> > > +}  
> > 
> > Since the NBD impl no longer needs to register multiple different functions
> > on the same insance_nane, these methods could be be simplified, to only
> > accept a single function, instead of keeping a whole list. This would avoid
> > need to pass a function into the unregister() method at all.
> 
> Multiple yank functions are still needed for multifd migration.

Oh I missed that subtlety, so fine to ignore my suggestion.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/4] Introduce yank feature
  2020-06-19 16:52       ` Daniel P. Berrangé
@ 2020-06-19 16:59         ` Lukas Straub
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Straub @ 2020-06-19 16:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Stefan Hajnoczi,
	qemu-devel, Dr. David Alan Gilbert, Paolo Bonzini,
	Marc-André Lureau, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]

On Fri, 19 Jun 2020 17:52:40 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Fri, Jun 19, 2020 at 06:29:24PM +0200, Lukas Straub wrote:
> > On Wed, 17 Jun 2020 16:12:40 +0100
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >   
> > > On Mon, May 25, 2020 at 05:44:23PM +0200, Lukas Straub wrote:  
> > > > +static struct YankInstance *yank_find_instance(char *name)    
> > > 
> > > There are const char * -> char * casts in later patches. Please use
> > > const char * where possible. Callers shouldn't need to cast away const.  
> > 
> > nbd and chardev generate the instance name dynamically so it
> > needs to be char *, but in migration it's hardcoded.  
> 
> I think you're looking at it from the wrong perspective.
> 
> The yank_find_instance() method never modifies the 'name' paramater
> that it receives. Therefore it should be "const char *". Likewise
> for the other yank_*()  methods in fact.
> 
> The caller can have a char *, or a const char * as suits its needs.
> Either can be passed into the yank_* methods and will gain const-ness
> from the POV of yank code.

Makes sense, I will change it in the next version.

Thanks,
Lukas Straub

> 
> Regards,
> Daniel


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/4] block/nbd.c: Add yank feature
  2020-06-17 15:09   ` Stefan Hajnoczi
@ 2020-06-19 18:07     ` Lukas Straub
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Straub @ 2020-06-19 18:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Marc-André Lureau, Paolo Bonzini, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1692 bytes --]

On Wed, 17 Jun 2020 16:09:09 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Mon, May 25, 2020 at 05:44:26PM +0200, Lukas Straub wrote:
> > @@ -1395,6 +1407,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
> >      return 0;
> >  }
> > 
> > +static void nbd_yank(void *opaque)
> > +{
> > +    BlockDriverState *bs = opaque;
> > +    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > +
> > +    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);  
> 
> qio_channel_shutdown() is not guaranteed to be thread-safe. Please
> document new assumptions that are being introduced.
> 
> Today we can more or less get away with it (although TLS sockets are a
> little iffy) because it boils down the a shutdown(2) system call. I
> think it would be okay to update the qio_channel_shutdown() and
> .io_shutdown() documentation to clarify that this is thread-safe.

Good idea, especially since the migration code already assumes this. I will do this in the next version.

> > +    atomic_set(&s->state, NBD_CLIENT_QUIT);  
> 
> docs/devel/atomics.rst says:
> 
>   No barriers are implied by ``atomic_read`` and ``atomic_set`` in either Linux
>   or QEMU.
> 
> Other threads might not see the latest value of s->state because this is
> a weakly ordered memory access.
> 
> I haven't audited the NBD code in detail, but if you want the other
> threads to always see NBD_CLIENT_QUIT then s->state should be set before
> calling qio_channel_shutdown() using a stronger atomics API like
> atomic_load_acquire()/atomic_store_release().

You are right, I will change that in the next version.

Thanks,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-06-19 18:11 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 15:44 [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-05-25 15:44 ` [PATCH v4 1/4] Introduce yank feature Lukas Straub
2020-06-16 14:39   ` Daniel P. Berrangé
2020-06-19 14:23     ` Lukas Straub
2020-06-19 16:53       ` Daniel P. Berrangé
2020-06-16 14:49   ` Daniel P. Berrangé
2020-06-17 15:12   ` Stefan Hajnoczi
2020-06-19 16:29     ` Lukas Straub
2020-06-19 16:52       ` Daniel P. Berrangé
2020-06-19 16:59         ` Lukas Straub
2020-05-25 15:44 ` [PATCH v4 2/4] block/nbd.c: Add " Lukas Straub
2020-06-16 14:40   ` Daniel P. Berrangé
2020-06-16 14:44   ` Daniel P. Berrangé
2020-06-19 16:23     ` Lukas Straub
2020-06-17 15:09   ` Stefan Hajnoczi
2020-06-19 18:07     ` Lukas Straub
2020-05-25 15:44 ` [PATCH v4 3/4] chardev/char-socket.c: " Lukas Straub
2020-06-16 14:41   ` Daniel P. Berrangé
2020-05-25 15:44 ` [PATCH v4 4/4] migration: " Lukas Straub
2020-06-16 14:42   ` Daniel P. Berrangé
2020-06-06 19:30 ` [PATCH v4 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-06-17 14:41   ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).