qemu-devel.nongnu.org archive mirror
* [PATCH v6 0/6] colo-compare bugfixes
@ 2020-05-22  6:47 Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 1/6] net/colo-compare.c: Create event_bh with the right AioContext Lukas Straub
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

Hello Everyone,
Here are fixes for bugs that I found in my tests. I have tested this with my
test suite and everything works fine.

Regards,
Lukas Straub

Version changes:
v6:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead
 -better wording

v5:
 -initialize and use notify_sendco only if notify_dev is set

v4:
 -fix potential deadlock with notify_remote_frame
 -avoid malloc and memcpy in many cases

v3:
 -fix checkpatch.pl error

v2:
 -better wording
 -fix performance-regression in patch 3 "net/colo-compare.c: Fix deadlock in compare_chr_send"
 -add more bugfixes

Lukas Straub (6):
  net/colo-compare.c: Create event_bh with the right AioContext
  chardev/char.c: Use qemu_co_sleep_ns if in coroutine
  net/colo-compare.c: Fix deadlock in compare_chr_send
  net/colo-compare.c: Only hexdump packets if tracing is enabled
  net/colo-compare.c: Check that colo-compare is active
  net/colo-compare.c: Correct ordering in complete and finalize

 chardev/char.c     |   7 +-
 net/colo-compare.c | 254 ++++++++++++++++++++++++++++++++++-----------
 net/colo.c         |   7 ++
 net/colo.h         |   1 +
 4 files changed, 206 insertions(+), 63 deletions(-)

--
2.20.1


* [PATCH v6 1/6] net/colo-compare.c: Create event_bh with the right AioContext
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 2/6] chardev/char.c: Use qemu_co_sleep_ns if in coroutine Lukas Straub
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

qemu_bh_new will set the bh to be executed in the main
loop. This causes crashes as colo_compare_handle_event assumes
that it has exclusive access to the queues, which are also
concurrently accessed in the iothread.

Create the bh with the AioContext of the iothread to fulfill
these assumptions and fix the crashes. This is safe, because
the bh already takes the appropriate locks.
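
As a rough illustration of why the owning context matters, here is a
minimal single-threaded C model. It does not use QEMU's API: ToyContext,
ToyBH and every toy_* function are hypothetical stand-ins invented for
this sketch. The only point it makes is that a bottom half runs in the
event loop of the context it was created in, never in another one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT QEMU's AioContext/QEMUBH types. */
typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    int scheduled;
} ToyBH;

typedef struct ToyContext {
    const char *name;
    ToyBH *bhs[8];
    size_t n_bhs;
} ToyContext;

/* The context whose loop is currently running callbacks. */
static const ToyContext *toy_current;

/* Attach a bottom half to one specific context. */
static ToyBH *toy_bh_new(ToyContext *ctx, ToyBH *storage,
                         ToyBHFunc *cb, void *opaque)
{
    assert(ctx->n_bhs < 8);
    storage->cb = cb;
    storage->opaque = opaque;
    storage->scheduled = 0;
    ctx->bhs[ctx->n_bhs++] = storage;
    return storage;
}

static void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = 1;
}

/* One loop iteration: a context only runs the bottom halves it owns. */
static int toy_context_poll(ToyContext *ctx)
{
    int ran = 0;

    toy_current = ctx;
    for (size_t i = 0; i < ctx->n_bhs; i++) {
        ToyBH *bh = ctx->bhs[i];
        if (bh->scheduled) {
            bh->scheduled = 0;
            bh->cb(bh->opaque);
            ran++;
        }
    }
    toy_current = NULL;
    return ran;
}

/* Demo callback: record which context executed it. */
static void toy_record_context(void *opaque)
{
    const ToyContext **out = opaque;
    *out = toy_current;
}
```

In this model, a bottom half created on the "iothread" context is never
run by polling the "main" context, which mirrors the assumption that
colo_compare_handle_event relies on.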

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Derek Su <dereksu@qnap.com>
Tested-by: Derek Su <dereksu@qnap.com>
---
 net/colo-compare.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 10c0239f9d..1de4220fe2 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -890,6 +890,7 @@ static void colo_compare_handle_event(void *opaque)

 static void colo_compare_iothread(CompareState *s)
 {
+    AioContext *ctx = iothread_get_aio_context(s->iothread);
     object_ref(OBJECT(s->iothread));
     s->worker_context = iothread_get_g_main_context(s->iothread);

@@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState *s)
     }

     colo_compare_timer_init(s);
-    s->event_bh = qemu_bh_new(colo_compare_handle_event, s);
+    s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s);
 }

 static char *compare_get_pri_indev(Object *obj, Error **errp)
--
2.20.1



* [PATCH v6 2/6] chardev/char.c: Use qemu_co_sleep_ns if in coroutine
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 1/6] net/colo-compare.c: Create event_bh with the right AioContext Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 3/6] net/colo-compare.c: Fix deadlock in compare_chr_send Lukas Straub
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

This will be needed in the next patch so compare_chr_send can be
converted to a coroutine.
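
A note on units, since the two sleep calls in the diff take different
ones: g_usleep takes microseconds, while the coroutine sleep in this
patch is given nanoseconds. The small helper below (usec_to_nsec is a
hypothetical name, not a QEMU or GLib function) just spells out the
arithmetic that makes both branches wait equally long.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL

/* Convert a delay in microseconds to nanoseconds. */
static int64_t usec_to_nsec(int64_t usec)
{
    return usec * NSEC_PER_USEC;
}
```

With this, usec_to_nsec(100) yields 100000, the value passed to
qemu_co_sleep_ns in the coroutine branch.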

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 chardev/char.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/chardev/char.c b/chardev/char.c
index e77564060d..5c8014199f 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -38,6 +38,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/id.h"
+#include "qemu/coroutine.h"

 #include "chardev/char-mux.h"

@@ -119,7 +120,11 @@ static int qemu_chr_write_buffer(Chardev *s,
     retry:
         res = cc->chr_write(s, buf + *offset, len - *offset);
         if (res < 0 && errno == EAGAIN && write_all) {
-            g_usleep(100);
+            if (qemu_in_coroutine()) {
+                qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+            } else {
+                g_usleep(100);
+            }
             goto retry;
         }

--
2.20.1



* [PATCH v6 3/6] net/colo-compare.c: Fix deadlock in compare_chr_send
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 1/6] net/colo-compare.c: Create event_bh with the right AioContext Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 2/6] chardev/char.c: Use qemu_co_sleep_ns if in coroutine Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 4/6] net/colo-compare.c: Only hexdump packets if tracing is enabled Lukas Straub
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

The chr_out chardev is connected to a filter-redirector
running in the main loop. qemu_chr_fe_write_all might block
here in compare_chr_send if the (socket-)buffer is full.
If another filter-redirector in the main loop wants to
send data to chr_pri_in it might also block if the buffer
is full. This leads to a deadlock because both event loops
get blocked.

Fix this by converting compare_chr_send to a coroutine and
putting the packets in a send queue.
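
The idea can be sketched without coroutines as a toy model in plain C.
Everything here (ToyChannel, ToySender, the toy_* functions) is
hypothetical and only imitates the shape of the fix: the caller enqueues
and returns immediately, a sender drains the queue in FIFO order, and the
"idle" flag plays roughly the role that SendCo.done plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

/* Toy output channel that accepts only a limited number of packets. */
typedef struct ToyChannel {
    int accepted[TOY_QUEUE_CAP];
    size_t n_accepted;
    size_t room;                /* how many more packets fit right now */
} ToyChannel;

typedef struct ToySender {
    int queue[TOY_QUEUE_CAP];   /* ring buffer of pending packet ids */
    size_t first, count;
    int idle;                   /* no drain in progress */
    ToyChannel *chan;
} ToySender;

/* Drain in FIFO order until the queue is empty or the channel is full. */
static void toy_sender_run(ToySender *s)
{
    s->idle = 0;
    while (s->count > 0) {
        if (s->chan->room == 0) {
            return;             /* would block: give up control, stay busy */
        }
        assert(s->chan->n_accepted < TOY_QUEUE_CAP);
        s->chan->accepted[s->chan->n_accepted++] = s->queue[s->first];
        s->chan->room--;
        s->first = (s->first + 1) % TOY_QUEUE_CAP;
        s->count--;
    }
    s->idle = 1;
}

/* Never blocks: enqueue, and start the sender only if it is idle. */
static int toy_send(ToySender *s, int packet_id)
{
    if (s->count == TOY_QUEUE_CAP) {
        return -1;
    }
    s->queue[(s->first + s->count) % TOY_QUEUE_CAP] = packet_id;
    s->count++;
    if (s->idle) {
        toy_sender_run(s);
    }
    return 0;
}

/* The channel has room again: resume a drain that was cut short. */
static void toy_channel_writable(ToySender *s, size_t room)
{
    s->chan->room = room;
    if (!s->idle) {
        toy_sender_run(s);
    }
}
```

Because toy_send never waits for the channel, the event loop that calls
it stays responsive even while the peer's buffer is full, which is the
property the real patch needs to break the deadlock.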

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Tested-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo-compare.c | 193 ++++++++++++++++++++++++++++++++++-----------
 net/colo.c         |   7 ++
 net/colo.h         |   1 +
 3 files changed, 156 insertions(+), 45 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 1de4220fe2..e47c8c6049 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -32,6 +32,9 @@
 #include "migration/migration.h"
 #include "util.h"

+#include "block/aio-wait.h"
+#include "qemu/coroutine.h"
+
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
     OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
@@ -77,6 +80,23 @@ static int event_unhandled_count;
  *                    |packet  |  |packet  +    |packet  | |packet  +
  *                    +--------+  +--------+    +--------+ +--------+
  */
+
+typedef struct SendCo {
+    Coroutine *co;
+    struct CompareState *s;
+    CharBackend *chr;
+    GQueue send_list;
+    bool notify_remote_frame;
+    bool done;
+    int ret;
+} SendCo;
+
+typedef struct SendEntry {
+    uint32_t size;
+    uint32_t vnet_hdr_len;
+    uint8_t *buf;
+} SendEntry;
+
 typedef struct CompareState {
     Object parent;

@@ -91,6 +111,8 @@ typedef struct CompareState {
     SocketReadState pri_rs;
     SocketReadState sec_rs;
     SocketReadState notify_rs;
+    SendCo out_sendco;
+    SendCo notify_sendco;
     bool vnet_hdr;
     uint32_t compare_timeout;
     uint32_t expired_scan_cycle;
@@ -124,10 +146,11 @@ enum {


 static int compare_chr_send(CompareState *s,
-                            const uint8_t *buf,
+                            uint8_t *buf,
                             uint32_t size,
                             uint32_t vnet_hdr_len,
-                            bool notify_remote_frame);
+                            bool notify_remote_frame,
+                            bool zero_copy);

 static bool packet_matches_str(const char *str,
                                const uint8_t *buf,
@@ -145,7 +168,7 @@ static void notify_remote_frame(CompareState *s)
     char msg[] = "DO_CHECKPOINT";
     int ret = 0;

-    ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
+    ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true, false);
     if (ret < 0) {
         error_report("Notify Xen COLO-frame failed");
     }
@@ -272,12 +295,13 @@ static void colo_release_primary_pkt(CompareState *s, Packet *pkt)
                            pkt->data,
                            pkt->size,
                            pkt->vnet_hdr_len,
-                           false);
+                           false,
+                           true);
     if (ret < 0) {
         error_report("colo send primary packet failed");
     }
     trace_colo_compare_main("packet same and release packet");
-    packet_destroy(pkt, NULL);
+    packet_destroy_partial(pkt, NULL);
 }

 /*
@@ -699,65 +723,115 @@ static void colo_compare_connection(void *opaque, void *user_data)
     }
 }

-static int compare_chr_send(CompareState *s,
-                            const uint8_t *buf,
-                            uint32_t size,
-                            uint32_t vnet_hdr_len,
-                            bool notify_remote_frame)
+static void coroutine_fn _compare_chr_send(void *opaque)
 {
+    SendCo *sendco = opaque;
+    CompareState *s = sendco->s;
     int ret = 0;
-    uint32_t len = htonl(size);

-    if (!size) {
-        return 0;
-    }
+    while (!g_queue_is_empty(&sendco->send_list)) {
+        SendEntry *entry = g_queue_pop_tail(&sendco->send_list);
+        uint32_t len = htonl(entry->size);

-    if (notify_remote_frame) {
-        ret = qemu_chr_fe_write_all(&s->chr_notify_dev,
-                                    (uint8_t *)&len,
-                                    sizeof(len));
-    } else {
-        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
-    }
+        ret = qemu_chr_fe_write_all(sendco->chr, (uint8_t *)&len, sizeof(len));

-    if (ret != sizeof(len)) {
-        goto err;
-    }
+        if (ret != sizeof(len)) {
+            g_free(entry->buf);
+            g_slice_free(SendEntry, entry);
+            goto err;
+        }

-    if (s->vnet_hdr) {
-        /*
-         * We send vnet header len make other module(like filter-redirector)
-         * know how to parse net packet correctly.
-         */
-        len = htonl(vnet_hdr_len);
+        if (!sendco->notify_remote_frame && s->vnet_hdr) {
+            /*
+             * We send vnet header len make other module(like filter-redirector)
+             * know how to parse net packet correctly.
+             */
+            len = htonl(entry->vnet_hdr_len);

-        if (!notify_remote_frame) {
-            ret = qemu_chr_fe_write_all(&s->chr_out,
+            ret = qemu_chr_fe_write_all(sendco->chr,
                                         (uint8_t *)&len,
                                         sizeof(len));
+
+            if (ret != sizeof(len)) {
+                g_free(entry->buf);
+                g_slice_free(SendEntry, entry);
+                goto err;
+            }
         }

-        if (ret != sizeof(len)) {
+        ret = qemu_chr_fe_write_all(sendco->chr,
+                                    (uint8_t *)entry->buf,
+                                    entry->size);
+
+        if (ret != entry->size) {
+            g_free(entry->buf);
+            g_slice_free(SendEntry, entry);
             goto err;
         }
+
+        g_free(entry->buf);
+        g_slice_free(SendEntry, entry);
     }

+    sendco->ret = 0;
+    goto out;
+
+err:
+    while (!g_queue_is_empty(&sendco->send_list)) {
+        SendEntry *entry = g_queue_pop_tail(&sendco->send_list);
+        g_free(entry->buf);
+        g_slice_free(SendEntry, entry);
+    }
+    sendco->ret = ret < 0 ? ret : -EIO;
+out:
+    sendco->co = NULL;
+    sendco->done = true;
+    aio_wait_kick();
+}
+
+static int compare_chr_send(CompareState *s,
+                            uint8_t *buf,
+                            uint32_t size,
+                            uint32_t vnet_hdr_len,
+                            bool notify_remote_frame,
+                            bool zero_copy)
+{
+    SendCo *sendco;
+    SendEntry *entry;
+
     if (notify_remote_frame) {
-        ret = qemu_chr_fe_write_all(&s->chr_notify_dev,
-                                    (uint8_t *)buf,
-                                    size);
+        sendco = &s->notify_sendco;
     } else {
-        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
+        sendco = &s->out_sendco;
     }

-    if (ret != size) {
-        goto err;
+    if (!size) {
+        return 0;
     }

-    return 0;
+    entry = g_slice_new(SendEntry);
+    entry->size = size;
+    entry->vnet_hdr_len = vnet_hdr_len;
+    if (zero_copy) {
+        entry->buf = buf;
+    } else {
+        entry->buf = g_malloc(size);
+        memcpy(entry->buf, buf, size);
+    }
+    g_queue_push_head(&sendco->send_list, entry);
+
+    if (sendco->done) {
+        sendco->co = qemu_coroutine_create(_compare_chr_send, sendco);
+        sendco->done = false;
+        qemu_coroutine_enter(sendco->co);
+        if (sendco->done) {
+            /* report early errors */
+            return sendco->ret;
+        }
+    }

-err:
-    return ret < 0 ? ret : -EIO;
+    /* assume success */
+    return 0;
 }

 static int compare_chr_can_read(void *opaque)
@@ -1063,6 +1137,7 @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs)
                          pri_rs->buf,
                          pri_rs->packet_len,
                          pri_rs->vnet_hdr_len,
+                         false,
                          false);
     } else {
         /* compare packet in the specified connection */
@@ -1093,7 +1168,7 @@ static void compare_notify_rs_finalize(SocketReadState *notify_rs)
     if (packet_matches_str("COLO_USERSPACE_PROXY_INIT",
                            notify_rs->buf,
                            notify_rs->packet_len)) {
-        ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
+        ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true, false);
         if (ret < 0) {
             error_report("Notify Xen COLO-frame INIT failed");
         }
@@ -1199,6 +1274,20 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)

     QTAILQ_INSERT_TAIL(&net_compares, s, next);

+    s->out_sendco.s = s;
+    s->out_sendco.chr = &s->chr_out;
+    s->out_sendco.notify_remote_frame = false;
+    s->out_sendco.done = true;
+    g_queue_init(&s->out_sendco.send_list);
+
+    if (s->notify_dev) {
+        s->notify_sendco.s = s;
+        s->notify_sendco.chr = &s->chr_notify_dev;
+        s->notify_sendco.notify_remote_frame = true;
+        s->notify_sendco.done = true;
+        g_queue_init(&s->notify_sendco.send_list);
+    }
+
     g_queue_init(&s->conn_list);

     qemu_mutex_init(&event_mtx);
@@ -1225,8 +1314,9 @@ static void colo_flush_packets(void *opaque, void *user_data)
                          pkt->data,
                          pkt->size,
                          pkt->vnet_hdr_len,
-                         false);
-        packet_destroy(pkt, NULL);
+                         false,
+                         true);
+        packet_destroy_partial(pkt, NULL);
     }
     while (!g_queue_is_empty(&conn->secondary_list)) {
         pkt = g_queue_pop_head(&conn->secondary_list);
@@ -1301,10 +1391,23 @@ static void colo_compare_finalize(Object *obj)
         }
     }

+    AioContext *ctx = iothread_get_aio_context(s->iothread);
+    aio_context_acquire(ctx);
+    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
+    if (s->notify_dev) {
+        AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
+    }
+    aio_context_release(ctx);
+
     /* Release all unhandled packets after compare thead exited */
     g_queue_foreach(&s->conn_list, colo_flush_packets, s);
+    AIO_WAIT_WHILE(NULL, !s->out_sendco.done);

     g_queue_clear(&s->conn_list);
+    g_queue_clear(&s->out_sendco.send_list);
+    if (s->notify_dev) {
+        g_queue_clear(&s->notify_sendco.send_list);
+    }

     if (s->connection_track_table) {
         g_hash_table_destroy(s->connection_track_table);
diff --git a/net/colo.c b/net/colo.c
index 8196b35837..a6c66d829a 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -185,6 +185,13 @@ void packet_destroy(void *opaque, void *user_data)
     g_slice_free(Packet, pkt);
 }

+void packet_destroy_partial(void *opaque, void *user_data)
+{
+    Packet *pkt = opaque;
+
+    g_slice_free(Packet, pkt);
+}
+
 /*
  * Clear hashtable, stop this hash growing really huge
  */
diff --git a/net/colo.h b/net/colo.h
index 679314b1ca..573ab91785 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -102,5 +102,6 @@ bool connection_has_tracked(GHashTable *connection_track_table,
 void connection_hashtable_reset(GHashTable *connection_track_table);
 Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
+void packet_destroy_partial(void *opaque, void *user_data);

 #endif /* NET_COLO_H */
--
2.20.1



* [PATCH v6 4/6] net/colo-compare.c: Only hexdump packets if tracing is enabled
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
                   ` (2 preceding siblings ...)
  2020-05-22  6:47 ` [PATCH v6 3/6] net/colo-compare.c: Fix deadlock in compare_chr_send Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is active Lukas Straub
  2020-05-22  6:47 ` [PATCH v6 6/6] net/colo-compare.c: Correct ordering in complete and finalize Lukas Straub
  5 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

Otherwise the log is flooded whenever there is a lot of
network traffic.
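
The pattern is simply "check a cheap flag before doing expensive debug
output". A minimal sketch in plain C follows; toy_trace_enabled and
toy_dump_if_traced are hypothetical names standing in for the trace-event
state check and the hexdump call used in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for "is this trace event enabled?". */
static bool toy_trace_enabled;

/*
 * Dump a buffer as hex only when tracing is on.
 * Returns the number of bytes dumped (0 when tracing is off).
 */
static size_t toy_dump_if_traced(FILE *fp, const char *prefix,
                                 const uint8_t *buf, size_t size)
{
    if (!toy_trace_enabled) {
        return 0;               /* skip the costly formatting entirely */
    }
    fprintf(fp, "%s:", prefix);
    for (size_t i = 0; i < size; i++) {
        fprintf(fp, " %02x", (unsigned)buf[i]);
    }
    fputc('\n', fp);
    return size;
}
```

With the flag off, a miscompare costs one branch instead of a full dump
of both packets to stderr.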

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 net/colo-compare.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e47c8c6049..7886444cdf 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -483,10 +483,12 @@ sec:
         g_queue_push_head(&conn->primary_list, ppkt);
         g_queue_push_head(&conn->secondary_list, spkt);

-        qemu_hexdump((char *)ppkt->data, stderr,
-                     "colo-compare ppkt", ppkt->size);
-        qemu_hexdump((char *)spkt->data, stderr,
-                     "colo-compare spkt", spkt->size);
+        if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) {
+            qemu_hexdump((char *)ppkt->data, stderr,
+                        "colo-compare ppkt", ppkt->size);
+            qemu_hexdump((char *)spkt->data, stderr,
+                        "colo-compare spkt", spkt->size);
+        }

         colo_compare_inconsistency_notify(s);
     }
--
2.20.1



* [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is active
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
                   ` (3 preceding siblings ...)
  2020-05-22  6:47 ` [PATCH v6 4/6] net/colo-compare.c: Only hexdump packets if tracing is enabled Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  2020-05-22  8:03   ` Zhang, Chen
  2020-05-22  6:47 ` [PATCH v6 6/6] net/colo-compare.c: Correct ordering in complete and finalize Lukas Straub
  5 siblings, 1 reply; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

If the colo-compare object is removed before failover and a
checkpoint happens, qemu crashes because it tries to lock
the destroyed event_mtx in colo_notify_compares_event.

Fix this by introducing a new variable colo_compare_active,
which records whether everything is initialized and is
protected by a new mutex colo_compare_mutex. The new mutex
also protects against concurrent access of the net_compares
list and makes sure that colo_notify_compares_event isn't
active while we destroy event_mtx and event_complete_cond.

With this, it is also possible again to use colo without
colo-compare (periodic mode) and to use multiple colo-compare
objects for multiple network interfaces.
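
The locking scheme can be modelled with plain pthreads. This is only a
sketch under assumptions: the toy_* names are hypothetical, a counter
stands in for the net_compares list, and pthread mutexes replace
QemuMutex. The constructor attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t toy_registry_mutex; /* role of colo_compare_mutex */
static bool toy_active;                    /* role of colo_compare_active */
static int toy_users;                      /* stands in for net_compares */
static pthread_mutex_t toy_event_mutex;    /* role of event_mtx */

/* Runs before main(), so the outer mutex exists before any user. */
static void __attribute__((__constructor__)) toy_init_globals(void)
{
    toy_active = false;
    pthread_mutex_init(&toy_registry_mutex, NULL);
}

/* First user creates the inner mutex and marks the subsystem active. */
static void toy_register(void)
{
    pthread_mutex_lock(&toy_registry_mutex);
    if (!toy_active) {
        pthread_mutex_init(&toy_event_mutex, NULL);
        toy_active = true;
    }
    toy_users++;
    pthread_mutex_unlock(&toy_registry_mutex);
}

/* Last user clears the flag before destroying the inner mutex. */
static void toy_unregister(void)
{
    pthread_mutex_lock(&toy_registry_mutex);
    if (toy_users > 0 && --toy_users == 0) {
        toy_active = false;
        pthread_mutex_destroy(&toy_event_mutex);
    }
    pthread_mutex_unlock(&toy_registry_mutex);
}

/* Returns false instead of touching a destroyed mutex. */
static bool toy_notify(void)
{
    pthread_mutex_lock(&toy_registry_mutex);
    if (!toy_active) {
        pthread_mutex_unlock(&toy_registry_mutex);
        return false;
    }
    pthread_mutex_lock(&toy_event_mutex);
    /* ... deliver the event to every registered user ... */
    pthread_mutex_unlock(&toy_event_mutex);
    pthread_mutex_unlock(&toy_registry_mutex);
    return true;
}
```

Holding the outer mutex across the whole notification is what keeps the
inner mutex from being destroyed while a notification is in flight.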

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
---
 net/colo-compare.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 7886444cdf..64d2453450 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -54,6 +54,8 @@ static NotifierList colo_compare_notifiers =
 #define REGULAR_PACKET_CHECK_MS 3000
 #define DEFAULT_TIME_OUT_MS 3000

+static QemuMutex colo_compare_mutex;
+static bool colo_compare_active;
 static QemuMutex event_mtx;
 static QemuCond event_complete_cond;
 static int event_unhandled_count;
@@ -906,6 +908,12 @@ static void check_old_packet_regular(void *opaque)
 void colo_notify_compares_event(void *opaque, int event, Error **errp)
 {
     CompareState *s;
+    qemu_mutex_lock(&colo_compare_mutex);
+
+    if (!colo_compare_active) {
+        qemu_mutex_unlock(&colo_compare_mutex);
+        return;
+    }

     qemu_mutex_lock(&event_mtx);
     QTAILQ_FOREACH(s, &net_compares, next) {
@@ -919,6 +927,7 @@ void colo_notify_compares_event(void *opaque, int event, Error **errp)
     }

     qemu_mutex_unlock(&event_mtx);
+    qemu_mutex_unlock(&colo_compare_mutex);
 }

 static void colo_compare_timer_init(CompareState *s)
@@ -1274,7 +1283,14 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
                            s->vnet_hdr);
     }

+    qemu_mutex_lock(&colo_compare_mutex);
+    if (!colo_compare_active) {
+        qemu_mutex_init(&event_mtx);
+        qemu_cond_init(&event_complete_cond);
+        colo_compare_active = true;
+    }
     QTAILQ_INSERT_TAIL(&net_compares, s, next);
+    qemu_mutex_unlock(&colo_compare_mutex);

     s->out_sendco.s = s;
     s->out_sendco.chr = &s->chr_out;
@@ -1292,9 +1308,6 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)

     g_queue_init(&s->conn_list);

-    qemu_mutex_init(&event_mtx);
-    qemu_cond_init(&event_complete_cond);
-
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
@@ -1386,12 +1399,19 @@ static void colo_compare_finalize(Object *obj)

     qemu_bh_delete(s->event_bh);

+    qemu_mutex_lock(&colo_compare_mutex);
     QTAILQ_FOREACH(tmp, &net_compares, next) {
         if (tmp == s) {
             QTAILQ_REMOVE(&net_compares, s, next);
             break;
         }
     }
+    if (QTAILQ_EMPTY(&net_compares)) {
+        colo_compare_active = false;
+        qemu_mutex_destroy(&event_mtx);
+        qemu_cond_destroy(&event_complete_cond);
+    }
+    qemu_mutex_unlock(&colo_compare_mutex);

     AioContext *ctx = iothread_get_aio_context(s->iothread);
     aio_context_acquire(ctx);
@@ -1419,15 +1439,18 @@ static void colo_compare_finalize(Object *obj)
         object_unref(OBJECT(s->iothread));
     }

-    qemu_mutex_destroy(&event_mtx);
-    qemu_cond_destroy(&event_complete_cond);
-
     g_free(s->pri_indev);
     g_free(s->sec_indev);
     g_free(s->outdev);
     g_free(s->notify_dev);
 }

+static void __attribute__((__constructor__)) colo_compare_init_globals(void)
+{
+    colo_compare_active = false;
+    qemu_mutex_init(&colo_compare_mutex);
+}
+
 static const TypeInfo colo_compare_info = {
     .name = TYPE_COLO_COMPARE,
     .parent = TYPE_OBJECT,
--
2.20.1



* [PATCH v6 6/6] net/colo-compare.c: Correct ordering in complete and finalize
  2020-05-22  6:47 [PATCH v6 0/6] colo-compare bugfixes Lukas Straub
                   ` (4 preceding siblings ...)
  2020-05-22  6:47 ` [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is active Lukas Straub
@ 2020-05-22  6:47 ` Lukas Straub
  5 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2020-05-22  6:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Li Zhijian, Marc-André Lureau

In colo_compare_complete, insert CompareState into net_compares
only after everything has been initialized.
In colo_compare_finalize, remove CompareState from net_compares
before anything is deinitialized.
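
The ordering rule is "publish last, unpublish first". A tiny model in
plain C, with hypothetical names (ToyObj, toy_registry, toy_complete,
toy_finalize) that are not part of QEMU, shows the invariant this buys:
anything reachable through the registry is fully initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyObj {
    bool ready;                 /* true while the object is usable */
} ToyObj;

static ToyObj *toy_registry[4];
static size_t toy_n;

/* Invariant: every object visible in the registry is ready. */
static bool toy_registry_consistent(void)
{
    for (size_t i = 0; i < toy_n; i++) {
        if (!toy_registry[i]->ready) {
            return false;
        }
    }
    return true;
}

static void toy_complete(ToyObj *o)
{
    o->ready = true;            /* initialize everything first */
    assert(toy_n < 4);
    toy_registry[toy_n++] = o;  /* publish as the very last step */
}

static void toy_finalize(ToyObj *o)
{
    /* Unpublish first, so nobody can find a half-torn-down object. */
    for (size_t i = 0; i < toy_n; i++) {
        if (toy_registry[i] == o) {
            toy_registry[i] = toy_registry[--toy_n];
            break;
        }
    }
    o->ready = false;           /* only then tear the object down */
}
```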

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
---
 net/colo-compare.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 64d2453450..160776d39e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1283,15 +1283,6 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
                            s->vnet_hdr);
     }

-    qemu_mutex_lock(&colo_compare_mutex);
-    if (!colo_compare_active) {
-        qemu_mutex_init(&event_mtx);
-        qemu_cond_init(&event_complete_cond);
-        colo_compare_active = true;
-    }
-    QTAILQ_INSERT_TAIL(&net_compares, s, next);
-    qemu_mutex_unlock(&colo_compare_mutex);
-
     s->out_sendco.s = s;
     s->out_sendco.chr = &s->chr_out;
     s->out_sendco.notify_remote_frame = false;
@@ -1314,6 +1305,16 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
                                                       connection_destroy);

     colo_compare_iothread(s);
+
+    qemu_mutex_lock(&colo_compare_mutex);
+    if (!colo_compare_active) {
+        qemu_mutex_init(&event_mtx);
+        qemu_cond_init(&event_complete_cond);
+        colo_compare_active = true;
+    }
+    QTAILQ_INSERT_TAIL(&net_compares, s, next);
+    qemu_mutex_unlock(&colo_compare_mutex);
+
     return;
 }

@@ -1386,19 +1387,6 @@ static void colo_compare_finalize(Object *obj)
     CompareState *s = COLO_COMPARE(obj);
     CompareState *tmp = NULL;

-    qemu_chr_fe_deinit(&s->chr_pri_in, false);
-    qemu_chr_fe_deinit(&s->chr_sec_in, false);
-    qemu_chr_fe_deinit(&s->chr_out, false);
-    if (s->notify_dev) {
-        qemu_chr_fe_deinit(&s->chr_notify_dev, false);
-    }
-
-    if (s->iothread) {
-        colo_compare_timer_del(s);
-    }
-
-    qemu_bh_delete(s->event_bh);
-
     qemu_mutex_lock(&colo_compare_mutex);
     QTAILQ_FOREACH(tmp, &net_compares, next) {
         if (tmp == s) {
@@ -1413,6 +1401,19 @@ static void colo_compare_finalize(Object *obj)
     }
     qemu_mutex_unlock(&colo_compare_mutex);

+    qemu_chr_fe_deinit(&s->chr_pri_in, false);
+    qemu_chr_fe_deinit(&s->chr_sec_in, false);
+    qemu_chr_fe_deinit(&s->chr_out, false);
+    if (s->notify_dev) {
+        qemu_chr_fe_deinit(&s->chr_notify_dev, false);
+    }
+
+    if (s->iothread) {
+        colo_compare_timer_del(s);
+    }
+
+    qemu_bh_delete(s->event_bh);
+
     AioContext *ctx = iothread_get_aio_context(s->iothread);
     aio_context_acquire(ctx);
     AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
--
2.20.1


* RE: [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is active
  2020-05-22  6:47 ` [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is active Lukas Straub
@ 2020-05-22  8:03   ` Zhang, Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang, Chen @ 2020-05-22  8:03 UTC (permalink / raw)
  To: Lukas Straub, qemu-devel; +Cc: Marc-André Lureau, Jason Wang, Li Zhijian



> -----Original Message-----
> From: Lukas Straub <lukasstraub2@web.de>
> Sent: Friday, May 22, 2020 2:48 PM
> To: qemu-devel <qemu-devel@nongnu.org>
> Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian
> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc-
> André Lureau <marcandre.lureau@redhat.com>
> Subject: [PATCH v6 5/6] net/colo-compare.c: Check that colo-compare is
> active
> 
> If the colo-compare object is removed before failover and a checkpoint
> happens, qemu crashes because it tries to lock the destroyed event_mtx in
> colo_notify_compares_event.
> 
> Fix this by checking if everything is initialized by introducing a new variable
> colo_compare_active which is protected by a new mutex
> colo_compare_mutex. The new mutex also protects against concurrent
> access of the net_compares list and makes sure that
> colo_notify_compares_event isn't active while we destroy event_mtx and
> event_complete_cond.
> 
> With this it also is again possible to use colo without colo-compare (periodic
> mode) and to use multiple colo-compare for multiple network interfaces.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Tested-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  net/colo-compare.c | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 7886444cdf..64d2453450 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -54,6 +54,8 @@ static NotifierList colo_compare_notifiers =
>  #define REGULAR_PACKET_CHECK_MS 3000
>  #define DEFAULT_TIME_OUT_MS 3000
> 
> +static QemuMutex colo_compare_mutex;
> +static bool colo_compare_active;
>  static QemuMutex event_mtx;
>  static QemuCond event_complete_cond;
>  static int event_unhandled_count;
> @@ -906,6 +908,12 @@ static void check_old_packet_regular(void *opaque)
>  void colo_notify_compares_event(void *opaque, int event, Error **errp)
>  {
>      CompareState *s;
> +    qemu_mutex_lock(&colo_compare_mutex);
> +
> +    if (!colo_compare_active) {
> +        qemu_mutex_unlock(&colo_compare_mutex);
> +        return;
> +    }
> 
>      qemu_mutex_lock(&event_mtx);
>      QTAILQ_FOREACH(s, &net_compares, next) {
> @@ -919,6 +927,7 @@ void colo_notify_compares_event(void *opaque, int event, Error **errp)
>      }
> 
>      qemu_mutex_unlock(&event_mtx);
> +    qemu_mutex_unlock(&colo_compare_mutex);
>  }
> 
>  static void colo_compare_timer_init(CompareState *s)
> @@ -1274,7 +1283,14 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
>                             s->vnet_hdr);
>      }
> 
> +    qemu_mutex_lock(&colo_compare_mutex);
> +    if (!colo_compare_active) {
> +        qemu_mutex_init(&event_mtx);
> +        qemu_cond_init(&event_complete_cond);
> +        colo_compare_active = true;
> +    }
>      QTAILQ_INSERT_TAIL(&net_compares, s, next);
> +    qemu_mutex_unlock(&colo_compare_mutex);
> 
>      s->out_sendco.s = s;
>      s->out_sendco.chr = &s->chr_out;
> @@ -1292,9 +1308,6 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
> 
>      g_queue_init(&s->conn_list);
> 
> -    qemu_mutex_init(&event_mtx);
> -    qemu_cond_init(&event_complete_cond);
> -
>      s->connection_track_table = g_hash_table_new_full(connection_key_hash,
>                                                        connection_key_equal,
>                                                        g_free,
> @@ -1386,12 +1399,19 @@ static void colo_compare_finalize(Object *obj)
> 
>      qemu_bh_delete(s->event_bh);
> 
> +    qemu_mutex_lock(&colo_compare_mutex);
>      QTAILQ_FOREACH(tmp, &net_compares, next) {
>          if (tmp == s) {
>              QTAILQ_REMOVE(&net_compares, s, next);
>              break;
>          }
>      }
> +    if (QTAILQ_EMPTY(&net_compares)) {
> +        colo_compare_active = false;
> +        qemu_mutex_destroy(&event_mtx);
> +        qemu_cond_destroy(&event_complete_cond);
> +    }
> +    qemu_mutex_unlock(&colo_compare_mutex);
> 
>      AioContext *ctx = iothread_get_aio_context(s->iothread);
>      aio_context_acquire(ctx);
> @@ -1419,15 +1439,18 @@ static void colo_compare_finalize(Object *obj)
>          object_unref(OBJECT(s->iothread));
>      }
> 
> -    qemu_mutex_destroy(&event_mtx);
> -    qemu_cond_destroy(&event_complete_cond);
> -
>      g_free(s->pri_indev);
>      g_free(s->sec_indev);
>      g_free(s->outdev);
>      g_free(s->notify_dev);
>  }
> 
> +static void __attribute__((__constructor__)) colo_compare_init_globals(void)
> +{
> +    colo_compare_active = false;
> +    qemu_mutex_init(&colo_compare_mutex);
> +}
> +

Looks good to me.
I will queue this series.
Reviewed-by: Zhang Chen <chen.zhang@intel.com>

Thanks
Zhang Chen

>  static const TypeInfo colo_compare_info = {
>      .name = TYPE_COLO_COMPARE,
>      .parent = TYPE_OBJECT,
> --
> 2.20.1


