* [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration
@ 2018-05-30  9:43 Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 01/12] migration: disable RDMA WRITE after postcopy started Lidong Chen
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen

The RDMA QIOChannel does not support bi-directional communication, so when an
RDMA live migration runs with postcopy enabled, the return path on the source
qemu gets a qemu file error.

These patches implement bi-directional communication for the RDMA QIOChannel
and disable RDMA WRITE during the postcopy phase.

This series just makes postcopy work; performance will be improved later.
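
As a usage sketch (not part of the series, and with an illustrative address
and port), a postcopy RDMA migration would typically be driven like this once
the patches are applied:

    # destination
    qemu-system-x86_64 ... -incoming rdma:192.168.1.2:4444
    (qemu) migrate_set_capability postcopy-ram on

    # source
    (qemu) migrate_set_capability postcopy-ram on
    (qemu) migrate -d rdma:192.168.1.2:4444
    # once precopy is underway, switch to postcopy
    (qemu) migrate_start_postcopy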

[v4]
 - do not wait for the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
 - implement the io_set_aio_fd_handler function for the RDMA QIOChannel (Juan Quintela)
 - invoke qio_channel_yield only when qemu_in_coroutine() (Juan Quintela)
 - create a dedicated thread to release rdma resources
 - poll the cm event while waiting for RDMA work request completion
 - implement the shutdown function for the RDMA QIOChannel

[v3]
 - add a mutex in the QEMUFile struct to avoid concurrent channel close (Daniel)
 - destroy the mutex before freeing QEMUFile (David)
 - use rdmain and rdmaout instead of rdma->return_path (Daniel)

[v2]
 - do not update bytes_xfer when RDMA WRITE is disabled (David)
 - implement bi-directional communication for RDMA QIOChannel (Daniel)

Lidong Chen (12):
  migration: disable RDMA WRITE after postcopy started
  migration: create a dedicated connection for rdma return path
  migration: remove unnecessary variables len in QIOChannelRDMA
  migration: avoid concurrent invoke channel_close by different threads
  migration: implement bi-directional RDMA QIOChannel
  migration: Stop rdma yielding during incoming postcopy
  migration: not wait RDMA_CM_EVENT_DISCONNECTED event after
    rdma_disconnect
  migration: implement io_set_aio_fd_handler function for RDMA
    QIOChannel
  migration: invoke qio_channel_yield only when qemu_in_coroutine()
  migration: create a dedicated thread to release rdma resource
  migration: poll the cm event while wait RDMA work request completion
  migration: implement the shutdown for RDMA QIOChannel

 migration/colo.c              |   2 +
 migration/migration.c         |   2 +
 migration/postcopy-ram.c      |   2 +
 migration/qemu-file-channel.c |  12 +-
 migration/qemu-file.c         |  13 +-
 migration/ram.c               |   4 +
 migration/rdma.c              | 435 ++++++++++++++++++++++++++++++++++++------
 migration/savevm.c            |   3 +
 migration/trace-events        |   1 -
 9 files changed, 411 insertions(+), 63 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 01/12] migration: disable RDMA WRITE after postcopy started
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 02/12] migration: create a dedicated connection for rdma return path Lidong Chen
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

RDMA WRITE operations are performed with no notification to the destination
qemu, so the destination qemu cannot wake up. This patch disables RDMA WRITE
after postcopy has started.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file.c |  8 ++++++--
 migration/rdma.c      | 12 ++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0463f4c..977b9ae 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
     if (f->hooks && f->hooks->save_page) {
         int ret = f->hooks->save_page(f, f->opaque, block_offset,
                                       offset, size, bytes_sent);
-        f->bytes_xfer += size;
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
+        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+            f->bytes_xfer += size;
+        }
+
+        if (ret != RAM_SAVE_CONTROL_DELAYED &&
+            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
             if (bytes_sent && *bytes_sent > 0) {
                 qemu_update_position(f, *bytes_sent);
             } else if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index 7d233b0..a0748f4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2930,6 +2930,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return RAM_SAVE_CONTROL_NOT_SUPP;
+    }
+
     qemu_fflush(f);
 
     if (size > 0) {
@@ -3489,6 +3493,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
@@ -3511,6 +3519,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 02/12] migration: create a dedicated connection for rdma return path
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 01/12] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 03/12] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

When an RDMA migration is started with postcopy enabled, the source qemu
establishes a dedicated connection for the return path.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 3 deletions(-)
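
Condensed from the diff below (a reader's digest, not extra code to apply),
the outgoing side now does roughly the following once the primary connection
is established:

    if (migrate_postcopy()) {
        /* bring up a second RDMA connection for the return path ... */
        rdma_return_path = qemu_rdma_data_init(host_port, errp);
        /* ... run qemu_rdma_source_init() and qemu_rdma_connect() on it ... */

        /* ... and cross-link the two contexts */
        rdma->return_path = rdma_return_path;
        rdma_return_path->return_path = rdma;
        rdma_return_path->is_return_path = true;
    }

On the destination, qemu_rdma_accept() keeps the fd handler armed on the
shared event channel so the second connection request is accepted on the
same listen id.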

diff --git a/migration/rdma.c b/migration/rdma.c
index a0748f4..ec4bbff 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -387,6 +387,10 @@ typedef struct RDMAContext {
     uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
 
     GHashTable *blockmap;
+
+    /* the RDMAContext for return path */
+    struct RDMAContext *return_path;
+    bool is_return_path;
 } RDMAContext;
 
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
@@ -2332,10 +2336,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_id(rdma->cm_id);
         rdma->cm_id = NULL;
     }
+
+    /* the destination side, listen_id and channel is shared */
     if (rdma->listen_id) {
-        rdma_destroy_id(rdma->listen_id);
+        if (!rdma->is_return_path) {
+            rdma_destroy_id(rdma->listen_id);
+        }
         rdma->listen_id = NULL;
+
+        if (rdma->channel) {
+            if (!rdma->is_return_path) {
+                rdma_destroy_event_channel(rdma->channel);
+            }
+            rdma->channel = NULL;
+        }
     }
+
     if (rdma->channel) {
         rdma_destroy_event_channel(rdma->channel);
         rdma->channel = NULL;
@@ -2564,6 +2580,25 @@ err_dest_init_create_listen_id:
 
 }
 
+static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
+                                            RDMAContext *rdma)
+{
+    int idx;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        rdma_return_path->wr_data[idx].control_len = 0;
+        rdma_return_path->wr_data[idx].control_curr = NULL;
+    }
+
+    /*the CM channel and CM id is shared*/
+    rdma_return_path->channel = rdma->channel;
+    rdma_return_path->listen_id = rdma->listen_id;
+
+    rdma->return_path = rdma_return_path;
+    rdma_return_path->return_path = rdma;
+    rdma_return_path->is_return_path = true;
+}
+
 static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
@@ -3021,6 +3056,8 @@ err:
     return ret;
 }
 
+static void rdma_accept_incoming_migration(void *opaque);
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3115,7 +3152,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         }
     }
 
-    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    /* Accept the second connection request for return path */
+    if (migrate_postcopy() && !rdma->is_return_path) {
+        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
+                            NULL,
+                            (void *)(intptr_t)rdma->return_path);
+    } else {
+        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
@@ -3700,6 +3744,10 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     trace_qemu_rdma_accept_incoming_migration_accepted();
 
+    if (rdma->is_return_path) {
+        return;
+    }
+
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
         ERROR(errp, "could not qemu_fopen_rdma!");
@@ -3714,7 +3762,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
-    RDMAContext *rdma;
+    RDMAContext *rdma, *rdma_return_path;
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
@@ -3741,12 +3789,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     trace_rdma_start_incoming_migration_after_rdma_listen();
 
+    /* initialize the RDMAContext for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
 err:
     error_propagate(errp, local_err);
     g_free(rdma);
+    g_free(rdma_return_path);
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -3754,6 +3814,7 @@ void rdma_start_outgoing_migration(void *opaque,
 {
     MigrationState *s = opaque;
     RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
+    RDMAContext *rdma_return_path = NULL;
     int ret = 0;
 
     if (rdma == NULL) {
@@ -3774,6 +3835,32 @@ void rdma_start_outgoing_migration(void *opaque,
         goto err;
     }
 
+    /* RDMA postcopy need a seprate queue pair for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, errp);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(rdma_return_path,
+            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(rdma_return_path, errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        rdma->return_path = rdma_return_path;
+        rdma_return_path->return_path = rdma;
+        rdma_return_path->is_return_path = true;
+    }
+
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
@@ -3781,4 +3868,5 @@ void rdma_start_outgoing_migration(void *opaque,
     return;
 err:
     g_free(rdma);
+    g_free(rdma_return_path);
 }
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 03/12] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 01/12] migration: disable RDMA WRITE after postcopy started Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 02/12] migration: create a dedicated connection for rdma return path Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
concurrently by different threads, this patch removes the unnecessary len
variable from QIOChannelRDMA and uses a local variable instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/rdma.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ec4bbff..9b6da4d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -404,7 +404,6 @@ struct QIOChannelRDMA {
     QIOChannel parent;
     RDMAContext *rdma;
     QEMUFile *file;
-    size_t len;
     bool blocking; /* XXX we don't actually honour this yet */
 };
 
@@ -2643,6 +2642,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     int ret;
     ssize_t done = 0;
     size_t i;
+    size_t len = 0;
 
     CHECK_ERROR_STATE();
 
@@ -2662,10 +2662,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         while (remaining) {
             RDMAControlHeader head;
 
-            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
-            remaining -= rioc->len;
+            len = MIN(remaining, RDMA_SEND_INCREMENT);
+            remaining -= len;
 
-            head.len = rioc->len;
+            head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
@@ -2675,8 +2675,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                 return ret;
             }
 
-            data += rioc->len;
-            done += rioc->len;
+            data += len;
+            done += len;
         }
     }
 
@@ -2771,8 +2771,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
             }
         }
     }
-    rioc->len = done;
-    return rioc->len;
+    return done;
 }
 
 /*
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (2 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 03/12] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30 14:45   ` Dr. David Alan Gilbert
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 05/12] migration: implement bi-directional RDMA QIOChannel Lidong Chen
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

The channel_close function may be invoked by different threads. For example,
the source qemu invokes qemu_fclose from the main thread, the migration thread
and the return path thread, while the destination qemu invokes qemu_fclose
from the main thread, the listen thread and the COLO incoming thread.

Add a mutex to the QEMUFile struct to avoid concurrent invocations of
channel_close.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/qemu-file.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 977b9ae..87d0f05 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -52,6 +52,7 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+    QemuMutex lock;
 };
 
 /*
@@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
 
     f = g_new0(QEMUFile, 1);
 
+    qemu_mutex_init(&f->lock);
     f->opaque = opaque;
     f->ops = ops;
     return f;
@@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
     ret = qemu_file_get_error(f);
 
     if (f->ops->close) {
+        qemu_mutex_lock(&f->lock);
         int ret2 = f->ops->close(f->opaque);
+        qemu_mutex_unlock(&f->lock);
         if (ret >= 0) {
             ret = ret2;
         }
@@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
     if (f->last_error) {
         ret = f->last_error;
     }
+    qemu_mutex_destroy(&f->lock);
     g_free(f);
     trace_qemu_file_fclose();
     return ret;
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 05/12] migration: implement bi-directional RDMA QIOChannel
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (3 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 06/12] migration: Stop rdma yielding during incoming postcopy Lidong Chen
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

This patch implements a bi-directional RDMA QIOChannel. Because different
threads may access the QIOChannelRDMA concurrently, this patch uses RCU to
protect it.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/colo.c         |   2 +
 migration/migration.c    |   2 +
 migration/postcopy-ram.c |   2 +
 migration/ram.c          |   4 +
 migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
 migration/savevm.c       |   3 +
 6 files changed, 183 insertions(+), 26 deletions(-)
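
The RCU pattern, condensed from the diff below (names are the patch's own;
this is a reader's digest, not extra code to apply):

    /* read side: the writev/readv/save_page/registration handlers */
    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdmaout);   /* or &rioc->rdmain */
    if (!rdma) {
        rcu_read_unlock();
        return -EIO;                          /* channel already closed */
    }
    /* ... use rdma; every early return also drops the read lock ... */
    rcu_read_unlock();

    /* close side: qio_channel_rdma_close() */
    rdmain = rioc->rdmain;
    atomic_rcu_set(&rioc->rdmain, NULL);      /* likewise for rdmaout */
    synchronize_rcu();                        /* wait for in-flight readers */
    qemu_rdma_cleanup(rdmain);                /* now safe to tear down */
    g_free(rdmain);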

diff --git a/migration/colo.c b/migration/colo.c
index 4381067..88936f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t value;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -666,5 +667,6 @@ out:
     }
     migration_incoming_exit_colo();
 
+    rcu_unregister_thread();
     return NULL;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 05aec2c..6217ef1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2008,6 +2008,7 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+    rcu_register_thread();
 
 retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
@@ -2147,6 +2148,7 @@ out:
     trace_source_return_path_thread_end();
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
+    rcu_unregister_thread();
     return NULL;
 }
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 658b750..a5de61d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     RAMBlock *rb = NULL;
 
     trace_postcopy_ram_fault_thread_entry();
+    rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
 
@@ -1059,6 +1060,7 @@ retry:
             }
         }
     }
+    rcu_unregister_thread();
     trace_postcopy_ram_fault_thread_exit();
     g_free(pfd);
     return NULL;
diff --git a/migration/ram.c b/migration/ram.c
index c53e836..85c8c39 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -678,6 +678,7 @@ static void *multifd_send_thread(void *opaque)
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     if (multifd_send_initial_packet(p, &local_err) < 0) {
         goto out;
     }
@@ -701,6 +702,7 @@ out:
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     return NULL;
 }
 
@@ -814,6 +816,7 @@ static void *multifd_recv_thread(void *opaque)
 {
     MultiFDRecvParams *p = opaque;
 
+    rcu_register_thread();
     while (true) {
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
@@ -828,6 +831,7 @@ static void *multifd_recv_thread(void *opaque)
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     return NULL;
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 9b6da4d..45f01e6 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
                                 " to abort!"); \
                 rdma->error_reported = 1; \
             } \
+            rcu_read_unlock(); \
             return rdma->error_state; \
         } \
     } while (0)
@@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
 
 struct QIOChannelRDMA {
     QIOChannel parent;
-    RDMAContext *rdma;
+    RDMAContext *rdmain;
+    RDMAContext *rdmaout;
     QEMUFile *file;
     bool blocking; /* XXX we don't actually honour this yet */
 };
@@ -2638,12 +2640,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     QEMUFile *f = rioc->file;
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
     size_t i;
     size_t len = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     /*
@@ -2653,6 +2663,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
+        rcu_read_unlock();
         return ret;
     }
 
@@ -2672,6 +2683,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
+                rcu_read_unlock();
                 return ret;
             }
 
@@ -2680,6 +2692,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         }
     }
 
+    rcu_read_unlock();
     return done;
 }
 
@@ -2713,12 +2726,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
     size_t done = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     for (i = 0; i < niov; i++) {
@@ -2730,7 +2751,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
         /* Got what we needed, so go to next iovec */
@@ -2752,25 +2773,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
+            rcu_read_unlock();
             return ret;
         }
 
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
             if (done == 0) {
+                rcu_read_unlock();
                 return QIO_CHANNEL_ERR_BLOCK;
             } else {
                 break;
             }
         }
     }
+    rcu_read_unlock();
     return done;
 }
 
@@ -2822,15 +2846,29 @@ qio_channel_rdma_source_prepare(GSource *source,
                                 gint *timeout)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
     *timeout = -1;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when prepare Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2838,14 +2876,28 @@ static gboolean
 qio_channel_rdma_source_check(GSource *source)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when check Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2856,14 +2908,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
 {
     QIOChannelFunc func = (QIOChannelFunc)callback;
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when dispatch Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return (*func)(QIO_CHANNEL(rsource->rioc),
                    (cond & rsource->condition),
                    user_data);
@@ -2908,15 +2974,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
     trace_qemu_rdma_close();
-    if (rioc->rdma) {
-        if (!rioc->rdma->error_state) {
-            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
-        }
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+
+    rdmain = rioc->rdmain;
+    if (rdmain) {
+        atomic_rcu_set(&rioc->rdmain, NULL);
+    }
+
+    rdmaout = rioc->rdmaout;
+    if (rdmaout) {
+        atomic_rcu_set(&rioc->rdmaout, NULL);
     }
+
+    synchronize_rcu();
+
+    if (rdmain) {
+        qemu_rdma_cleanup(rdmain);
+    }
+
+    if (rdmaout) {
+        qemu_rdma_cleanup(rdmaout);
+    }
+
+    g_free(rdmain);
+    g_free(rdmaout);
+
     return 0;
 }
 
@@ -2959,12 +3042,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3049,9 +3141,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
         }
     }
 
+    rcu_read_unlock();
     return RAM_SAVE_CONTROL_DELAYED;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3227,8 +3321,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAContext *rdma;
+    RDMALocalBlocks *local;
     RDMAControlHeader head;
     RDMARegister *reg, *registers;
     RDMACompress *comp;
@@ -3241,8 +3335,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     int count = 0;
     int i = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
+    local = &rdma->local_ram_blocks;
     do {
         trace_qemu_rdma_registration_handle_wait();
 
@@ -3476,6 +3579,7 @@ out:
     if (ret < 0) {
         rdma->error_state = ret;
     }
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3489,10 +3593,18 @@ out:
 static int
 rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 {
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int curr;
     int found = -1;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     /* Find the matching RAMBlock in our local list */
     for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
         if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
@@ -3503,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
+        rcu_read_unlock();
         return -ENOENT;
     }
 
@@ -3510,6 +3623,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
     trace_rdma_block_notification_handle(name, rdma->next_src_index);
     rdma->next_src_index++;
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3532,11 +3646,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
                                         uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
+
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
 
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3544,6 +3666,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3556,13 +3679,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 {
     Error *local_err = NULL, **errp = &local_err;
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3594,6 +3725,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
+            rcu_read_unlock();
             return ret;
         }
 
@@ -3617,6 +3749,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
+            rcu_read_unlock();
             return -EINVAL;
         }
 
@@ -3633,6 +3766,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
+                rcu_read_unlock();
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
@@ -3650,9 +3784,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         goto err;
     }
 
+    rcu_read_unlock();
     return 0;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3670,10 +3806,15 @@ static const QEMUFileHooks rdma_write_hooks = {
 static void qio_channel_rdma_finalize(Object *obj)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
-    if (rioc->rdma) {
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+    if (rioc->rdmain) {
+        qemu_rdma_cleanup(rioc->rdmain);
+        g_free(rioc->rdmain);
+        rioc->rdmain = NULL;
+    }
+    if (rioc->rdmaout) {
+        qemu_rdma_cleanup(rioc->rdmaout);
+        g_free(rioc->rdmaout);
+        rioc->rdmaout = NULL;
     }
 }
 
@@ -3713,13 +3854,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     }
 
     rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
-    rioc->rdma = rdma;
 
     if (mode[0] == 'w') {
         rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
+        rioc->rdmaout = rdma;
+        rioc->rdmain = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
     } else {
         rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
+        rioc->rdmain = rdma;
+        rioc->rdmaout = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 4251125..90cd00f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1621,6 +1621,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
+    rcu_register_thread();
     /*
      * Because we're a thread and not a coroutine we can't yield
      * in qemu_file, and thus we must be blocking now.
@@ -1661,6 +1662,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
          * to leave the guest running and fire MCEs for pages that never
          * arrived as a desperate recovery step.
          */
+        rcu_unregister_thread();
         exit(EXIT_FAILURE);
     }
 
@@ -1675,6 +1677,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     migration_incoming_state_destroy();
     qemu_loadvm_state_cleanup();
 
+    rcu_unregister_thread();
     return NULL;
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 06/12] migration: Stop rdma yielding during incoming postcopy
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (4 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 05/12] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect Lidong Chen
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

During incoming postcopy, the destination qemu invokes
qemu_rdma_wait_comp_channel from a separate thread, so it cannot yield; poll
the completion channel fd instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 45f01e6..0dd4033 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1493,11 +1493,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
      */
-    if (rdma->migration_started_on_destination) {
+    if (rdma->migration_started_on_destination &&
+        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
         yield_until_fd_readable(rdma->comp_channel->fd);
     } else {
         /* This is the source side, we're in a separate thread
          * or destination prior to migration_fd_process_incoming()
+         * after postcopy, the destination also in a seprate thread.
          * we can't yield; so we have to poll the fd.
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (5 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 06/12] migration: Stop rdma yielding during incoming postcopy Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30 12:24   ` Dr. David Alan Gilbert
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 08/12] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

When a migration is cancelled during RDMA precopy, the source qemu main thread sometimes hangs.

The backtrace is:
    (gdb) bt
    #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
    #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
    #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
    #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
    #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
    #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
    #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
    #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
    #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
    #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
    #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
    #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
        at util/async.c:261
    #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
    #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
    #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
    #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
    #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752

Sometimes the RDMA_CM_EVENT_DISCONNECTED event is not received after rdma_disconnect.

According to the IB spec, once the active side sends a DREQ message it should
wait for a DREP message, and only when that arrives should a DISCONNECT event
be triggered. The DREP message can be dropped due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine:
if the DREP is dropped we should get a timeout and a TIMEWAIT_EXIT event will
be triggered. Unfortunately the current kernel CM implementation doesn't
include the DREP_timeout state, so in the above scenario we will get neither
a DISCONNECT nor a TIMEWAIT_EXIT event.

So we should not invoke rdma_get_cm_event, which may hang forever; the event
channel is destroyed in qemu_rdma_cleanup anyway.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c       | 12 ++----------
 migration/trace-events |  1 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0dd4033..92e4d30 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
-    struct rdma_cm_event *cm_event;
-    int ret, idx;
+    int idx;
 
     if (rdma->cm_id && rdma->connected) {
         if ((rdma->error_state ||
@@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
             qemu_rdma_post_send_control(rdma, NULL, &head);
         }
 
-        ret = rdma_disconnect(rdma->cm_id);
-        if (!ret) {
-            trace_qemu_rdma_cleanup_waiting_for_disconnect();
-            ret = rdma_get_cm_event(rdma->channel, &cm_event);
-            if (!ret) {
-                rdma_ack_cm_event(cm_event);
-            }
-        }
+        rdma_disconnect(rdma->cm_id);
         trace_qemu_rdma_cleanup_disconnect();
         rdma->connected = false;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 3c798dd..4a768ea 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -146,7 +146,6 @@ qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
 qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
 qemu_rdma_cleanup_disconnect(void) ""
-qemu_rdma_cleanup_waiting_for_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
 qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 08/12] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (6 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 09/12] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination qemu
crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    #13 0x00007f96fe858760 in ?? ()
    #14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler calls through a NULL pointer.
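
For context, the generic helper in io/channel.c (paraphrased here, not part
of this patch) calls straight into the class hook, so a channel class that
leaves io_set_aio_fd_handler NULL jumps through a NULL pointer as soon as
qio_channel_yield() registers aio fd handlers:

    void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
                                        AioContext *ctx,
                                        IOHandler *io_read,
                                        IOHandler *io_write,
                                        void *opaque)
    {
        QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);

        /* crashes here if the class never filled in this hook */
        klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
    }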

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/rdma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 92e4d30..dfa4f77 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2963,6 +2963,21 @@ static GSource *qio_channel_rdma_create_watch(QIOChannel *ioc,
     return source;
 }
 
+static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
+                                                  AioContext *ctx,
+                                                  IOHandler *io_read,
+                                                  IOHandler *io_write,
+                                                  void *opaque)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    if (io_read) {
+        aio_set_fd_handler(ctx, rioc->rdmain->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    } else {
+        aio_set_fd_handler(ctx, rioc->rdmaout->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    }
+}
 
 static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
@@ -3822,6 +3837,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_set_blocking = qio_channel_rdma_set_blocking;
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 09/12] migration: invoke qio_channel_yield only when qemu_in_coroutine()
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (7 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 08/12] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource Lidong Chen
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    #4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    #5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    #6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    #7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    #8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    #9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes the crash by invoking qio_channel_yield only when
qemu_in_coroutine(), and qio_channel_wait otherwise.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/qemu-file-channel.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index e202d73..8e639eb 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -49,7 +49,11 @@ static ssize_t channel_writev_buffer(void *opaque,
         ssize_t len;
         len = qio_channel_writev(ioc, local_iov, nlocal_iov, NULL);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
-            qio_channel_wait(ioc, G_IO_OUT);
+            if (qemu_in_coroutine()) {
+                qio_channel_yield(ioc, G_IO_OUT);
+            } else {
+                qio_channel_wait(ioc, G_IO_OUT);
+            }
             continue;
         }
         if (len < 0) {
@@ -80,7 +84,11 @@ static ssize_t channel_get_buffer(void *opaque,
         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
         if (ret < 0) {
             if (ret == QIO_CHANNEL_ERR_BLOCK) {
-                qio_channel_yield(ioc, G_IO_IN);
+                if (qemu_in_coroutine()) {
+                    qio_channel_yield(ioc, G_IO_IN);
+                } else {
+                    qio_channel_wait(ioc, G_IO_IN);
+                }
             } else {
                 /* XXX handle Error * object */
                 return -EIO;
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (8 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 09/12] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30 16:50   ` Dr. David Alan Gilbert
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion Lidong Chen
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
  11 siblings, 1 reply; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen

ibv_dereg_mr takes a long time for a virtual server with a large amount of memory.

The measured ibv_dereg_mr times by guest memory size are:
  10GB  326ms
  20GB  699ms
  30GB  1021ms
  40GB  1387ms
  50GB  1712ms
  60GB  2034ms
  70GB  2457ms
  80GB  2807ms
  90GB  3107ms
  100GB 3474ms
  110GB 3735ms
  120GB 4064ms
  130GB 4567ms
  140GB 4886ms

This causes the guest OS to hang for a while when the migration finishes,
so create a dedicated thread to release the rdma resources.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index dfa4f77..1b9e261 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2979,12 +2979,12 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
     }
 }
 
-static int qio_channel_rdma_close(QIOChannel *ioc,
-                                  Error **errp)
+static void *qio_channel_rdma_close_thread(void *arg)
 {
-    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    QIOChannelRDMA *rioc = arg;
     RDMAContext *rdmain, *rdmaout;
-    trace_qemu_rdma_close();
+
+    rcu_register_thread();
 
     rdmain = rioc->rdmain;
     if (rdmain) {
@@ -3009,6 +3009,19 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
     g_free(rdmain);
     g_free(rdmaout);
 
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static int qio_channel_rdma_close(QIOChannel *ioc,
+                                  Error **errp)
+{
+    QemuThread t;
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    trace_qemu_rdma_close();
+
+    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
+                           rioc, QEMU_THREAD_DETACHED);
     return 0;
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (9 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30 17:33   ` Dr. David Alan Gilbert
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
  11 siblings, 1 reply; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen

If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function may
loop forever, so also poll the cm event fd and treat any cm event received
there as an error.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1b9e261..d611a06 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
  */
 static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
 {
+    struct rdma_cm_event *cm_event;
+    int ret = -1;
+
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
@@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
          */
-        while (!rdma->error_state  && !rdma->received_error) {
-            GPollFD pfds[1];
+        while (!rdma->error_state && !rdma->received_error) {
+            GPollFD pfds[2];
             pfds[0].fd = rdma->comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[0].revents = 0;
+
+            pfds[1].fd = rdma->channel->fd;
+            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[1].revents = 0;
+
             /* 0.1s timeout, should be fine for a 'cancel' */
-            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
-            case 1: /* fd active */
-                return 0;
+            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
 
-            case 0: /* Timeout, go around again */
-                break;
+            if (pfds[1].revents) {
+                ret = rdma_get_cm_event(rdma->channel, &cm_event);
+                if (!ret) {
+                    rdma_ack_cm_event(cm_event);
+                }
+                error_report("receive cm event while wait comp channel,"
+                             "cm event is %d", cm_event->event);
 
-            default: /* Error of some type -
-                      * I don't trust errno from qemu_poll_ns
-                     */
-                error_report("%s: poll failed", __func__);
+                /* consider any rdma communication event as an error */
                 return -EPIPE;
             }
 
+            if (pfds[0].revents) {
+                return 0;
+            }
+
             if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
                 /* Bail out and let the cancellation happen */
                 return -EPIPE;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel
  2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
                   ` (10 preceding siblings ...)
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-05-30  9:43 ` Lidong Chen
  2018-05-30 17:59   ` Dr. David Alan Gilbert
  11 siblings, 1 reply; 28+ messages in thread
From: Lidong Chen @ 2018-05-30  9:43 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, Lidong Chen

Because the RDMA QIOChannel does not implement the shutdown function,
if an error is set on the to_dst_file, the return path thread
will wait forever, and the migration thread will wait for the
return path thread to exit.

The backtrace of the return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    #11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007f372a77635d in clone () from /lib64/libc.so.6

The backtrace of the migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f372a77635d in clone () from /lib64/libc.so.6

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index d611a06..0912b6a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3038,6 +3038,45 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
     return 0;
 }
 
+static int
+qio_channel_rdma_shutdown(QIOChannel *ioc,
+                            QIOChannelShutdown how,
+                            Error **errp)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
+
+    rcu_read_lock();
+
+    rdmain = atomic_rcu_read(&rioc->rdmain);
+    rdmaout = atomic_rcu_read(&rioc->rdmain);
+
+    switch (how) {
+    case QIO_CHANNEL_SHUTDOWN_READ:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_WRITE:
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_BOTH:
+    default:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    }
+
+    rcu_read_unlock();
+    return 0;
+}
+
 /*
  * Parameters:
  *    @offset == 0 :
@@ -3864,6 +3903,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
+    ioc_klass->io_shutdown = qio_channel_rdma_shutdown;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect Lidong Chen
@ 2018-05-30 12:24   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-30 12:24 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> When cancel migration during RDMA precopy, the source qemu main thread hangs sometime.
> 
> The backtrace is:
>     (gdb) bt
>     #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
>     #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
>     #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
>     #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
>     #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
>     #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
>     #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
>     #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
>     #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
>     #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
>     #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
>     #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
>         at util/async.c:261
>     #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>     #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
>     #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
>     #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
>     #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
>     #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752
> 
> It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime.
> 
> According to IB Spec once active side send DREQ message, it should wait for DREP message
> and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped
> due to network issues.
> For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is
> dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger.
> Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state
> and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events.
> 
> So it should not invoke rdma_get_cm_event which may hang forever, and the event channel
> is also destroyed in qemu_rdma_cleanup.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/rdma.c       | 12 ++----------
>  migration/trace-events |  1 -
>  2 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0dd4033..92e4d30 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>  
>  static void qemu_rdma_cleanup(RDMAContext *rdma)
>  {
> -    struct rdma_cm_event *cm_event;
> -    int ret, idx;
> +    int idx;
>  
>      if (rdma->cm_id && rdma->connected) {
>          if ((rdma->error_state ||
> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>              qemu_rdma_post_send_control(rdma, NULL, &head);
>          }
>  
> -        ret = rdma_disconnect(rdma->cm_id);
> -        if (!ret) {
> -            trace_qemu_rdma_cleanup_waiting_for_disconnect();
> -            ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -            if (!ret) {
> -                rdma_ack_cm_event(cm_event);
> -            }
> -        }
> +        rdma_disconnect(rdma->cm_id);
>          trace_qemu_rdma_cleanup_disconnect();
>          rdma->connected = false;
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index 3c798dd..4a768ea 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -146,7 +146,6 @@ qemu_rdma_accept_pin_state(bool pin) "%d"
>  qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
>  qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
>  qemu_rdma_cleanup_disconnect(void) ""
> -qemu_rdma_cleanup_waiting_for_disconnect(void) ""
>  qemu_rdma_close(void) ""
>  qemu_rdma_connect_pin_all_requested(void) ""
>  qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
@ 2018-05-30 14:45   ` Dr. David Alan Gilbert
  2018-05-31  7:07     ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-30 14:45 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> The channel_close maybe invoked by different threads. For example, source
> qemu invokes qemu_fclose in main thread, migration thread and return path
> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
> and COLO incoming thread.
> 
> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/qemu-file.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 977b9ae..87d0f05 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -52,6 +52,7 @@ struct QEMUFile {
>      unsigned int iovcnt;
>  
>      int last_error;
> +    QemuMutex lock;

That could do with a comment saying what you're protecting

>  };
>  
>  /*
> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>  
>      f = g_new0(QEMUFile, 1);
>  
> +    qemu_mutex_init(&f->lock);
>      f->opaque = opaque;
>      f->ops = ops;
>      return f;
> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>      ret = qemu_file_get_error(f);
>  
>      if (f->ops->close) {
> +        qemu_mutex_lock(&f->lock);
>          int ret2 = f->ops->close(f->opaque);
> +        qemu_mutex_unlock(&f->lock);

OK, and at least for the RDMA code, if it calls
close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
2nd time.

>          if (ret >= 0) {
>              ret = ret2;
>          }
> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>      if (f->last_error) {
>          ret = f->last_error;
>      }
> +    qemu_mutex_destroy(&f->lock);
>      g_free(f);

Hmm but that's not safe; if two things really do call qemu_fclose()
on the same structure they race here and can end up destroying the lock
twice, or doing f->lock  after the 1st one has already g_free(f).


So lets go back a step.
I think:
  a) There should always be a separate QEMUFile* for
     to_src_file and from_src_file - I don't see where you open
     the 2nd one; I don't see your implementation of
     f->ops->get_return_path.
  b) I *think* that while the different threads might all call
     fclose(), I think there should only ever be one qemu_fclose
     call for each direction on the QEMUFile.

But now we have two problems:
  If (a) is true then f->lock  is separate on each one so
   doesn't really protect if the two directions are closed
   at once. (Assuming (b) is true)

  If (a) is false and we actually share a single QEMUFile then
 that race at the end happens.

Dave


>      trace_qemu_file_fclose();
>      return ret;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource Lidong Chen
@ 2018-05-30 16:50   ` Dr. David Alan Gilbert
  2018-05-31  7:25     ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-30 16:50 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> ibv_dereg_mr wait for a long time for big memory size virtual server.
> 
> The test result is:
>   10GB  326ms
>   20GB  699ms
>   30GB  1021ms
>   40GB  1387ms
>   50GB  1712ms
>   60GB  2034ms
>   70GB  2457ms
>   80GB  2807ms
>   90GB  3107ms
>   100GB 3474ms
>   110GB 3735ms
>   120GB 4064ms
>   130GB 4567ms
>   140GB 4886ms
> 
> this will cause the guest os hang for a while when migration finished.
> So create a dedicated thread to release rdma resource.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/rdma.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index dfa4f77..1b9e261 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2979,12 +2979,12 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>      }
>  }
>  
> -static int qio_channel_rdma_close(QIOChannel *ioc,
> -                                  Error **errp)
> +static void *qio_channel_rdma_close_thread(void *arg)
>  {
> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    QIOChannelRDMA *rioc = arg;
>      RDMAContext *rdmain, *rdmaout;
> -    trace_qemu_rdma_close();
> +
> +    rcu_register_thread();
>  
>      rdmain = rioc->rdmain;
>      if (rdmain) {
> @@ -3009,6 +3009,19 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>      g_free(rdmain);
>      g_free(rdmaout);
>  
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
> +static int qio_channel_rdma_close(QIOChannel *ioc,
> +                                  Error **errp)
> +{
> +    QemuThread t;
> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    trace_qemu_rdma_close();
> +
> +    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
> +                           rioc, QEMU_THREAD_DETACHED);

I don't think this can be this simple; consider the lock in patch 4;
now that lock means qui_channel_rdma_close() can't be called in
parallel; but with this change it means:


     f->lock
       qemu_thread_create  (1)
    !f->lock
     f->lock
       qemu_thread_create
    !f->lock

so we don't really protect the thing you were trying to lock

Dave

>      return 0;
>  }
>  
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-05-30 17:33   ` Dr. David Alan Gilbert
  2018-05-31  7:36     ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-30 17:33 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
> maybe loop forever. so we should also poll the cm event fd, and when
> receive any cm event, we consider some error happened.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

I don't understand enough about the way the infiniband fds work to
fully review this; so I'd appreciate it if someone who does could
comment/add their review.

> ---
>  migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 1b9e261..d611a06 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>   */
>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>  {
> +    struct rdma_cm_event *cm_event;
> +    int ret = -1;
> +
>      /*
>       * Coroutine doesn't start until migration_fd_process_incoming()
>       * so don't yield unless we know we're running inside of a coroutine.
> @@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>           * But we need to be able to handle 'cancel' or an error
>           * without hanging forever.
>           */
> -        while (!rdma->error_state  && !rdma->received_error) {
> -            GPollFD pfds[1];
> +        while (!rdma->error_state && !rdma->received_error) {
> +            GPollFD pfds[2];
>              pfds[0].fd = rdma->comp_channel->fd;
>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[0].revents = 0;
> +
> +            pfds[1].fd = rdma->channel->fd;
> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[1].revents = 0;
> +
>              /* 0.1s timeout, should be fine for a 'cancel' */
> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
> -            case 1: /* fd active */
> -                return 0;
> +            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);

Shouldn't we still check the return value of this; if it's negative
something has gone wrong.

Dave

> -            case 0: /* Timeout, go around again */
> -                break;
> +            if (pfds[1].revents) {
> +                ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +                if (!ret) {
> +                    rdma_ack_cm_event(cm_event);
> +                }
> +                error_report("receive cm event while wait comp channel,"
> +                             "cm event is %d", cm_event->event);
>  
> -            default: /* Error of some type -
> -                      * I don't trust errno from qemu_poll_ns
> -                     */
> -                error_report("%s: poll failed", __func__);
> +                /* consider any rdma communication event as an error */
>                  return -EPIPE;
>              }
>  
> +            if (pfds[0].revents) {
> +                return 0;
> +            }
> +
>              if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
>                  /* Bail out and let the cancellation happen */
>                  return -EPIPE;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel
  2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
@ 2018-05-30 17:59   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-30 17:59 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> Because RDMA QIOChannel not implement shutdown function,
> If the to_dst_file was set error, the return path thread
> will wait forever. and the migration thread will wait
> return path thread exit.
> 
> the backtrace of return path thread is:
> 
> (gdb) bt
>     #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
>     #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
>         at qemu-timer.c:325
>     #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
>         at migration/rdma.c:1501
>     #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
>         byte_len=0x7ef7091d0640) at migration/rdma.c:1580
>     #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
>         head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
>     #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
>         expecting=3) at migration/rdma.c:1903
>     #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
>         size=32768) at migration/rdma.c:2714
>     #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
>     #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
>         at migration/qemu-file.c:502
>     #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
>     #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
>     #11 0x00000000006a46d3 in source_return_path_thread (
>         opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
>     #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
>     #13 0x00007f372a77635d in clone () from /lib64/libc.so.6
> 
> the backtrace of migration thread is:
> 
> (gdb) bt
>     #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
>     #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
>         at util/qemu-thread-posix.c:504
>     #2  0x00000000006a4bc5 in await_return_path_close_on_source (
>         ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
>     #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
>         current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
>         at migration/migration.c:1695
>     #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
>         at migration/migration.c:1837
>     #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
>     #6  0x00007f372a77635d in clone () from /lib64/libc.so.6
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Yeh OK, this should help;  so:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I wonder if there are any places we can get stuck in rdma calls though where this isn't enough?

> ---
>  migration/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d611a06..0912b6a 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3038,6 +3038,45 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>      return 0;
>  }
>  
> +static int
> +qio_channel_rdma_shutdown(QIOChannel *ioc,
> +                            QIOChannelShutdown how,
> +                            Error **errp)
> +{
> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext *rdmain, *rdmaout;
> +
> +    rcu_read_lock();
> +
> +    rdmain = atomic_rcu_read(&rioc->rdmain);
> +    rdmaout = atomic_rcu_read(&rioc->rdmain);
> +
> +    switch (how) {
> +    case QIO_CHANNEL_SHUTDOWN_READ:
> +        if (rdmain) {
> +            rdmain->error_state = -1;
> +        }
> +        break;
> +    case QIO_CHANNEL_SHUTDOWN_WRITE:
> +        if (rdmaout) {
> +            rdmaout->error_state = -1;
> +        }
> +        break;
> +    case QIO_CHANNEL_SHUTDOWN_BOTH:
> +    default:
> +        if (rdmain) {
> +            rdmain->error_state = -1;
> +        }
> +        if (rdmaout) {
> +            rdmaout->error_state = -1;
> +        }
> +        break;
> +    }
> +
> +    rcu_read_unlock();
> +    return 0;
> +}
> +
>  /*
>   * Parameters:
>   *    @offset == 0 :
> @@ -3864,6 +3903,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
>      ioc_klass->io_close = qio_channel_rdma_close;
>      ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
>      ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
> +    ioc_klass->io_shutdown = qio_channel_rdma_shutdown;
>  }
>  
>  static const TypeInfo qio_channel_rdma_info = {
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-05-30 14:45   ` Dr. David Alan Gilbert
@ 2018-05-31  7:07     ` 858585 jemmy
  2018-05-31 10:52       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 28+ messages in thread
From: 858585 jemmy @ 2018-05-31  7:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> From: Lidong Chen <jemmy858585@gmail.com>
>>
>> The channel_close maybe invoked by different threads. For example, source
>> qemu invokes qemu_fclose in main thread, migration thread and return path
>> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>> and COLO incoming thread.
>>
>> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  migration/qemu-file.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index 977b9ae..87d0f05 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -52,6 +52,7 @@ struct QEMUFile {
>>      unsigned int iovcnt;
>>
>>      int last_error;
>> +    QemuMutex lock;
>
> That could do with a comment saying what you're protecting
>
>>  };
>>
>>  /*
>> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>>
>>      f = g_new0(QEMUFile, 1);
>>
>> +    qemu_mutex_init(&f->lock);
>>      f->opaque = opaque;
>>      f->ops = ops;
>>      return f;
>> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>>      ret = qemu_file_get_error(f);
>>
>>      if (f->ops->close) {
>> +        qemu_mutex_lock(&f->lock);
>>          int ret2 = f->ops->close(f->opaque);
>> +        qemu_mutex_unlock(&f->lock);
>
> OK, and at least for the RDMA code, if it calls
> close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
> 2nd time.
>
>>          if (ret >= 0) {
>>              ret = ret2;
>>          }
>> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>>      if (f->last_error) {
>>          ret = f->last_error;
>>      }
>> +    qemu_mutex_destroy(&f->lock);
>>      g_free(f);
>
> Hmm but that's not safe; if two things really do call qemu_fclose()
> on the same structure they race here and can end up destroying the lock
> twice, or doing f->lock  after the 1st one has already g_free(f).

>
>
> So lets go back a step.
> I think:
>   a) There should always be a separate QEMUFile* for
>      to_src_file and from_src_file - I don't see where you open
>      the 2nd one; I don't see your implementation of
>      f->ops->get_return_path.

Yes, the current qemu version uses a separate QEMUFile* for to_src_file and
from_src_file, and the two QEMUFiles point to the same QIOChannelRDMA.

The f->ops->get_return_path is implemented by channel_output_ops or
channel_input_ops.
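
For context, those return-path hooks essentially just open a second QEMUFile
in the opposite direction on the same channel; roughly like this (paraphrased
sketch, not an exact copy of migration/qemu-file-channel.c):

static QEMUFile *channel_get_output_return_path(void *opaque)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);

    /* the writer's return path is a reader on the same channel */
    return qemu_fopen_channel_input(ioc);
}

static QEMUFile *channel_get_input_return_path(void *opaque)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);

    /* the reader's return path is a writer on the same channel */
    return qemu_fopen_channel_output(ioc);
}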

>   b) I *think* that while the different threads might all call
>      fclose(), I think there should only ever be one qemu_fclose
>      call for each direction on the QEMUFile.
>
> But now we have two problems:
>   If (a) is true then f->lock  is separate on each one so
>    doesn't really protect if the two directions are closed
>    at once. (Assuming (b) is true)

Yes, you are right.  So I should add a QemuMutex to the QIOChannel structure,
not the QEMUFile structure, and qemu_mutex_destroy that QemuMutex in
qio_channel_finalize.
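
A rough sketch of that idea, purely for illustration (the field name and the
exact hook points are assumptions, not the actual patch):

/* illustrative sketch: keep the mutex in the QIOChannel object, which is
 * shared by both QEMUFile directions, instead of in each QEMUFile */
struct QIOChannel {
    Object parent;
    /* ... existing fields ... */
    QemuMutex lock;           /* assumed name; serializes io_close */
};

/* taken around the close callback, so concurrent qemu_fclose() calls on
 * the two directions cannot run io_close at the same time */
qemu_mutex_lock(&ioc->lock);
ret = klass->io_close(ioc, errp);
qemu_mutex_unlock(&ioc->lock);

/* and destroyed only when the channel object itself is finalized */
static void qio_channel_finalize(Object *obj)
{
    QIOChannel *ioc = QIO_CHANNEL(obj);
    /* ... existing cleanup ... */
    qemu_mutex_destroy(&ioc->lock);
}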

Thank you.

>
>   If (a) is false and we actually share a single QEMUFile then
>  that race at the end happens.
>
> Dave
>
>
>>      trace_qemu_file_fclose();
>>      return ret;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource
  2018-05-30 16:50   ` Dr. David Alan Gilbert
@ 2018-05-31  7:25     ` 858585 jemmy
  2018-05-31 10:55       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 28+ messages in thread
From: 858585 jemmy @ 2018-05-31  7:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

On Thu, May 31, 2018 at 12:50 AM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>
>> The test result is:
>>   10GB  326ms
>>   20GB  699ms
>>   30GB  1021ms
>>   40GB  1387ms
>>   50GB  1712ms
>>   60GB  2034ms
>>   70GB  2457ms
>>   80GB  2807ms
>>   90GB  3107ms
>>   100GB 3474ms
>>   110GB 3735ms
>>   120GB 4064ms
>>   130GB 4567ms
>>   140GB 4886ms
>>
>> this will cause the guest os hang for a while when migration finished.
>> So create a dedicated thread to release rdma resource.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  migration/rdma.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index dfa4f77..1b9e261 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2979,12 +2979,12 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>      }
>>  }
>>
>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>> -                                  Error **errp)
>> +static void *qio_channel_rdma_close_thread(void *arg)
>>  {
>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    QIOChannelRDMA *rioc = arg;
>>      RDMAContext *rdmain, *rdmaout;
>> -    trace_qemu_rdma_close();
>> +
>> +    rcu_register_thread();
>>
>>      rdmain = rioc->rdmain;
>>      if (rdmain) {
>> @@ -3009,6 +3009,19 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>>      g_free(rdmain);
>>      g_free(rdmaout);
>>
>> +    rcu_unregister_thread();
>> +    return NULL;
>> +}
>> +
>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>> +                                  Error **errp)
>> +{
>> +    QemuThread t;
>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    trace_qemu_rdma_close();
>> +
>> +    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>> +                           rioc, QEMU_THREAD_DETACHED);
>
> I don't think this can be this simple; consider the lock in patch 4;
> now that lock means qui_channel_rdma_close() can't be called in
> parallel; but with this change it means:
>
>
>      f->lock
>        qemu_thread_create  (1)
>     !f->lock
>      f->lock
>        qemu_thread_create
>     !f->lock
>
> so we don't really protect the thing you were trying to lock

yes, I should not use rioc as the thread arg.

static int qio_channel_rdma_close(QIOChannel *ioc,
                                  Error **errp)
{
    QemuThread t;
    RDMAContext *rdma[2];
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

    trace_qemu_rdma_close();
    if (rioc->rdmain || rioc->rdmaout) {
        rdma[0] =  rioc->rdmain;
        rdma[1] =  rioc->rdmaout;
        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
                           rdma, QEMU_THREAD_DETACHED);
        rioc->rdmain = NULL;
        rioc->rdmaout = NULL;
    }
    return 0;
}

>
> Dave
>
>>      return 0;
>>  }
>>
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
  2018-05-30 17:33   ` Dr. David Alan Gilbert
@ 2018-05-31  7:36     ` 858585 jemmy
  2018-06-03 15:04       ` Aviad Yehezkel
  0 siblings, 1 reply; 28+ messages in thread
From: 858585 jemmy @ 2018-05-31  7:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Aviad Yehezkel
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Paolo Bonzini, qemu-devel, Adi Dotan, Lidong Chen

On Thu, May 31, 2018 at 1:33 AM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
>> maybe loop forever. so we should also poll the cm event fd, and when
>> receive any cm event, we consider some error happened.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> I don't understand enough about the way the infiniband fd's work to
> fully review this; so I'd appreciate if some one who does could
> comment/add their review.

Hi Aviad:
    We need your help. I also could not find any documentation about the
cq channel event fd and the cm channel event fd.
    Should we set the events to G_IO_IN | G_IO_HUP | G_IO_ERR, or is
G_IO_IN enough?
    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
    pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
    Thanks.

>
>> ---
>>  migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 1b9e261..d611a06 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>>   */
>>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>  {
>> +    struct rdma_cm_event *cm_event;
>> +    int ret = -1;
>> +
>>      /*
>>       * Coroutine doesn't start until migration_fd_process_incoming()
>>       * so don't yield unless we know we're running inside of a coroutine.
>> @@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>           * But we need to be able to handle 'cancel' or an error
>>           * without hanging forever.
>>           */
>> -        while (!rdma->error_state  && !rdma->received_error) {
>> -            GPollFD pfds[1];
>> +        while (!rdma->error_state && !rdma->received_error) {
>> +            GPollFD pfds[2];
>>              pfds[0].fd = rdma->comp_channel->fd;
>>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> +            pfds[0].revents = 0;
>> +
>> +            pfds[1].fd = rdma->channel->fd;
>> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> +            pfds[1].revents = 0;
>> +
>>              /* 0.1s timeout, should be fine for a 'cancel' */
>> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>> -            case 1: /* fd active */
>> -                return 0;
>> +            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
>
> Shouldn't we still check the return value of this; if it's negative
> something has gone wrong.

I will fix this.
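The fix might end up looking something like this (untested sketch; only the
handling of the qemu_poll_ns() return value changes):

/* untested sketch: treat a negative return from qemu_poll_ns() as an
 * error instead of ignoring it */
ret = qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
if (ret < 0) {
    error_report("%s: poll failed", __func__);
    return -EPIPE;
}
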
Thanks.

>
> Dave
>
>> -            case 0: /* Timeout, go around again */
>> -                break;
>> +            if (pfds[1].revents) {
>> +                ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> +                if (!ret) {
>> +                    rdma_ack_cm_event(cm_event);
>> +                }
>> +                error_report("receive cm event while wait comp channel,"
>> +                             "cm event is %d", cm_event->event);
>>
>> -            default: /* Error of some type -
>> -                      * I don't trust errno from qemu_poll_ns
>> -                     */
>> -                error_report("%s: poll failed", __func__);
>> +                /* consider any rdma communication event as an error */
>>                  return -EPIPE;
>>              }
>>
>> +            if (pfds[0].revents) {
>> +                return 0;
>> +            }
>> +
>>              if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
>>                  /* Bail out and let the cancellation happen */
>>                  return -EPIPE;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-05-31  7:07     ` 858585 jemmy
@ 2018-05-31 10:52       ` Dr. David Alan Gilbert
  2018-06-03 13:50         ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-31 10:52 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

* 858585 jemmy (jemmy858585@gmail.com) wrote:
> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Lidong Chen (jemmy858585@gmail.com) wrote:
> >> From: Lidong Chen <jemmy858585@gmail.com>
> >>
> >> The channel_close maybe invoked by different threads. For example, source
> >> qemu invokes qemu_fclose in main thread, migration thread and return path
> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
> >> and COLO incoming thread.
> >>
> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
> >>
> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> ---
> >>  migration/qemu-file.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> index 977b9ae..87d0f05 100644
> >> --- a/migration/qemu-file.c
> >> +++ b/migration/qemu-file.c
> >> @@ -52,6 +52,7 @@ struct QEMUFile {
> >>      unsigned int iovcnt;
> >>
> >>      int last_error;
> >> +    QemuMutex lock;
> >
> > That could do with a comment saying what you're protecting
> >
> >>  };
> >>
> >>  /*
> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
> >>
> >>      f = g_new0(QEMUFile, 1);
> >>
> >> +    qemu_mutex_init(&f->lock);
> >>      f->opaque = opaque;
> >>      f->ops = ops;
> >>      return f;
> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
> >>      ret = qemu_file_get_error(f);
> >>
> >>      if (f->ops->close) {
> >> +        qemu_mutex_lock(&f->lock);
> >>          int ret2 = f->ops->close(f->opaque);
> >> +        qemu_mutex_unlock(&f->lock);
> >
> > OK, and at least for the RDMA code, if it calls
> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
> > 2nd time.
> >
> >>          if (ret >= 0) {
> >>              ret = ret2;
> >>          }
> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
> >>      if (f->last_error) {
> >>          ret = f->last_error;
> >>      }
> >> +    qemu_mutex_destroy(&f->lock);
> >>      g_free(f);
> >
> > Hmm but that's not safe; if two things really do call qemu_fclose()
> > on the same structure they race here and can end up destroying the lock
> > twice, or doing f->lock  after the 1st one has already g_free(f).
> 
> >
> >
> > So lets go back a step.
> > I think:
> >   a) There should always be a separate QEMUFile* for
> >      to_src_file and from_src_file - I don't see where you open
> >      the 2nd one; I don't see your implementation of
> >      f->ops->get_return_path.
> 
> yes, current qemu version use a separate QEMUFile* for to_src_file and
> from_src_file.
> and the two QEMUFile point to one QIOChannelRDMA.
> 
> the f->ops->get_return_path is implemented by channel_output_ops or
> channel_input_ops.

Ah OK, yes that makes sense.

> >   b) I *think* that while the different threads might all call
> >      fclose(), I think there should only ever be one qemu_fclose
> >      call for each direction on the QEMUFile.
> >
> > But now we have two problems:
> >   If (a) is true then f->lock  is separate on each one so
> >    doesn't really protect if the two directions are closed
> >    at once. (Assuming (b) is true)
> 
> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
> qio_channel_finalize.

OK, that sounds better.

Dave

> Thank you.
> 
> >
> >   If (a) is false and we actually share a single QEMUFile then
> >  that race at the end happens.
> >
> > Dave
> >
> >
> >>      trace_qemu_file_fclose();
> >>      return ret;
> >> --
> >> 1.8.3.1
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource
  2018-05-31  7:25     ` 858585 jemmy
@ 2018-05-31 10:55       ` Dr. David Alan Gilbert
  2018-05-31 11:27         ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-31 10:55 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

* 858585 jemmy (jemmy858585@gmail.com) wrote:
> On Thu, May 31, 2018 at 12:50 AM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Lidong Chen (jemmy858585@gmail.com) wrote:
> >> ibv_dereg_mr wait for a long time for big memory size virtual server.
> >>
> >> The test result is:
> >>   10GB  326ms
> >>   20GB  699ms
> >>   30GB  1021ms
> >>   40GB  1387ms
> >>   50GB  1712ms
> >>   60GB  2034ms
> >>   70GB  2457ms
> >>   80GB  2807ms
> >>   90GB  3107ms
> >>   100GB 3474ms
> >>   110GB 3735ms
> >>   120GB 4064ms
> >>   130GB 4567ms
> >>   140GB 4886ms
> >>
> >> this will cause the guest os hang for a while when migration finished.
> >> So create a dedicated thread to release rdma resource.
> >>
> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> ---
> >>  migration/rdma.c | 21 +++++++++++++++++----
> >>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index dfa4f77..1b9e261 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -2979,12 +2979,12 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
> >>      }
> >>  }
> >>
> >> -static int qio_channel_rdma_close(QIOChannel *ioc,
> >> -                                  Error **errp)
> >> +static void *qio_channel_rdma_close_thread(void *arg)
> >>  {
> >> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> >> +    QIOChannelRDMA *rioc = arg;
> >>      RDMAContext *rdmain, *rdmaout;
> >> -    trace_qemu_rdma_close();
> >> +
> >> +    rcu_register_thread();
> >>
> >>      rdmain = rioc->rdmain;
> >>      if (rdmain) {
> >> @@ -3009,6 +3009,19 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
> >>      g_free(rdmain);
> >>      g_free(rdmaout);
> >>
> >> +    rcu_unregister_thread();
> >> +    return NULL;
> >> +}
> >> +
> >> +static int qio_channel_rdma_close(QIOChannel *ioc,
> >> +                                  Error **errp)
> >> +{
> >> +    QemuThread t;
> >> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> >> +    trace_qemu_rdma_close();
> >> +
> >> +    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
> >> +                           rioc, QEMU_THREAD_DETACHED);
> >
> > I don't think this can be this simple; consider the lock in patch 4;
> > now that lock means qui_channel_rdma_close() can't be called in
> > parallel; but with this change it means:
> >
> >
> >      f->lock
> >        qemu_thread_create  (1)
> >     !f->lock
> >      f->lock
> >        qemu_thread_create
> >     !f->lock
> >
> > so we don't really protect the thing you were trying to lock
> 
> yes, I should not use rioc as the thread arg.
> 
> static int qio_channel_rdma_close(QIOChannel *ioc,
>                                   Error **errp)
> {
>     QemuThread t;
>     RDMAContext *rdma[2];
>     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> 
>     trace_qemu_rdma_close();
>     if (rioc->rdmain || rioc->rdmaout) {
>         rdma[0] =  rioc->rdmain;
>         rdma[1] =  rioc->rdmaout;
>         qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>                            rdma, QEMU_THREAD_DETACHED);
>         rioc->rdmain = NULL;
>         rioc->rdmaout = NULL;

Is it safe to close both directions at once?
For example, if you get the close from the return path thread, might the
main thread still be using its QEMUFile in the opposite direction;
it'll call close a little bit later?

Dave

>     }
>     return 0;
> }
> 
> >
> > Dave
> >
> >>      return 0;
> >>  }
> >>
> >> --
> >> 1.8.3.1
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource
  2018-05-31 10:55       ` Dr. David Alan Gilbert
@ 2018-05-31 11:27         ` 858585 jemmy
  0 siblings, 0 replies; 28+ messages in thread
From: 858585 jemmy @ 2018-05-31 11:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

On Thu, May 31, 2018 at 6:55 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * 858585 jemmy (jemmy858585@gmail.com) wrote:
>> On Thu, May 31, 2018 at 12:50 AM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>> >> ibv_dereg_mr wait for a long time for big memory size virtual server.
>> >>
>> >> The test result is:
>> >>   10GB  326ms
>> >>   20GB  699ms
>> >>   30GB  1021ms
>> >>   40GB  1387ms
>> >>   50GB  1712ms
>> >>   60GB  2034ms
>> >>   70GB  2457ms
>> >>   80GB  2807ms
>> >>   90GB  3107ms
>> >>   100GB 3474ms
>> >>   110GB 3735ms
>> >>   120GB 4064ms
>> >>   130GB 4567ms
>> >>   140GB 4886ms
>> >>
>> >> this will cause the guest os hang for a while when migration finished.
>> >> So create a dedicated thread to release rdma resource.
>> >>
>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >> ---
>> >>  migration/rdma.c | 21 +++++++++++++++++----
>> >>  1 file changed, 17 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/migration/rdma.c b/migration/rdma.c
>> >> index dfa4f77..1b9e261 100644
>> >> --- a/migration/rdma.c
>> >> +++ b/migration/rdma.c
>> >> @@ -2979,12 +2979,12 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>> >>      }
>> >>  }
>> >>
>> >> -static int qio_channel_rdma_close(QIOChannel *ioc,
>> >> -                                  Error **errp)
>> >> +static void *qio_channel_rdma_close_thread(void *arg)
>> >>  {
>> >> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> >> +    QIOChannelRDMA *rioc = arg;
>> >>      RDMAContext *rdmain, *rdmaout;
>> >> -    trace_qemu_rdma_close();
>> >> +
>> >> +    rcu_register_thread();
>> >>
>> >>      rdmain = rioc->rdmain;
>> >>      if (rdmain) {
>> >> @@ -3009,6 +3009,19 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>> >>      g_free(rdmain);
>> >>      g_free(rdmaout);
>> >>
>> >> +    rcu_unregister_thread();
>> >> +    return NULL;
>> >> +}
>> >> +
>> >> +static int qio_channel_rdma_close(QIOChannel *ioc,
>> >> +                                  Error **errp)
>> >> +{
>> >> +    QemuThread t;
>> >> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> >> +    trace_qemu_rdma_close();
>> >> +
>> >> +    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>> >> +                           rioc, QEMU_THREAD_DETACHED);
>> >
>> > I don't think this can be this simple; consider the lock in patch 4;
>> > now that lock means qui_channel_rdma_close() can't be called in
>> > parallel; but with this change it means:
>> >
>> >
>> >      f->lock
>> >        qemu_thread_create  (1)
>> >     !f->lock
>> >      f->lock
>> >        qemu_thread_create
>> >     !f->lock
>> >
>> > so we don't really protect the thing you were trying to lock
>>
>> yes, I should not use rioc as the thread arg.
>>
>> static int qio_channel_rdma_close(QIOChannel *ioc,
>>                                   Error **errp)
>> {
>>     QemuThread t;
>>     RDMAContext *rdma[2];
>>     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>
>>     trace_qemu_rdma_close();
>>     if (rioc->rdmain || rioc->rdmaout) {
>>         rdma[0] =  rioc->rdmain;
>>         rdma[1] =  rioc->rdmaout;
>>         qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>>                            rdma, QEMU_THREAD_DETACHED);
>>         rioc->rdmain = NULL;
>>         rioc->rdmaout = NULL;
>
> Is it safe to close both directions at once?
> For example, if you get the close from the return path thread, might the
> main thread be still using it's QEMUFile in the opposite direction;
> it'll call close a little bit later?

I use RCU to protect this.  qio_channel_rdma_close_thread calls synchronize_rcu,
which waits until no other thread is still accessing rdmain and rdmaout.

And if the return path closes the qemu file, the migration thread's qemu
file will be set to error soon afterwards, because the QIOChannel is closed.
QIOChannelSocket also works this way.
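
In other words, it follows the usual RCU pattern; roughly (simplified sketch,
not the exact code from these patches):

/* readers (the I/O paths) only touch the contexts inside a read section */
rcu_read_lock();
rdma = atomic_rcu_read(&rioc->rdmain);
if (rdma) {
    /* ... use rdma ... */
}
rcu_read_unlock();

/* the cleanup thread unpublishes the pointers, waits for every reader
 * to leave its read section, and only then frees the contexts */
rdmain = rioc->rdmain;
rdmaout = rioc->rdmaout;
atomic_rcu_set(&rioc->rdmain, NULL);
atomic_rcu_set(&rioc->rdmaout, NULL);
synchronize_rcu();
g_free(rdmain);
g_free(rdmaout);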



>
> Dave
>
>>     }
>>     return 0;
>> }
>>
>> >
>> > Dave
>> >
>> >>      return 0;
>> >>  }
>> >>
>> >> --
>> >> 1.8.3.1
>> >>
>> > --
>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-05-31 10:52       ` Dr. David Alan Gilbert
@ 2018-06-03 13:50         ` 858585 jemmy
  2018-06-03 14:43           ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: 858585 jemmy @ 2018-06-03 13:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

On Thu, May 31, 2018 at 6:52 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * 858585 jemmy (jemmy858585@gmail.com) wrote:
>> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>> >> From: Lidong Chen <jemmy858585@gmail.com>
>> >>
>> >> The channel_close maybe invoked by different threads. For example, source
>> >> qemu invokes qemu_fclose in main thread, migration thread and return path
>> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>> >> and COLO incoming thread.
>> >>
>> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>> >>
>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >> ---
>> >>  migration/qemu-file.c | 5 +++++
>> >>  1 file changed, 5 insertions(+)
>> >>
>> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> index 977b9ae..87d0f05 100644
>> >> --- a/migration/qemu-file.c
>> >> +++ b/migration/qemu-file.c
>> >> @@ -52,6 +52,7 @@ struct QEMUFile {
>> >>      unsigned int iovcnt;
>> >>
>> >>      int last_error;
>> >> +    QemuMutex lock;
>> >
>> > That could do with a comment saying what you're protecting
>> >
>> >>  };
>> >>
>> >>  /*
>> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>> >>
>> >>      f = g_new0(QEMUFile, 1);
>> >>
>> >> +    qemu_mutex_init(&f->lock);
>> >>      f->opaque = opaque;
>> >>      f->ops = ops;
>> >>      return f;
>> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>> >>      ret = qemu_file_get_error(f);
>> >>
>> >>      if (f->ops->close) {
>> >> +        qemu_mutex_lock(&f->lock);
>> >>          int ret2 = f->ops->close(f->opaque);
>> >> +        qemu_mutex_unlock(&f->lock);
>> >
>> > OK, and at least for the RDMA code, if it calls
>> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
>> > 2nd time.
>> >
>> >>          if (ret >= 0) {
>> >>              ret = ret2;
>> >>          }
>> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>> >>      if (f->last_error) {
>> >>          ret = f->last_error;
>> >>      }
>> >> +    qemu_mutex_destroy(&f->lock);
>> >>      g_free(f);
>> >
>> > Hmm but that's not safe; if two things really do call qemu_fclose()
>> > on the same structure they race here and can end up destroying the lock
>> > twice, or doing f->lock  after the 1st one has already g_free(f).
>>
>> >
>> >
>> > So lets go back a step.
>> > I think:
>> >   a) There should always be a separate QEMUFile* for
>> >      to_src_file and from_src_file - I don't see where you open
>> >      the 2nd one; I don't see your implementation of
>> >      f->ops->get_return_path.
>>
>> yes, current qemu version use a separate QEMUFile* for to_src_file and
>> from_src_file.
>> and the two QEMUFile point to one QIOChannelRDMA.
>>
>> the f->ops->get_return_path is implemented by channel_output_ops or
>> channel_input_ops.
>
> Ah OK, yes that makes sense.
>
>> >   b) I *think* that while the different threads might all call
>> >      fclose(), I think there should only ever be one qemu_fclose
>> >      call for each direction on the QEMUFile.
>> >
>> > But now we have two problems:
>> >   If (a) is true then f->lock  is separate on each one so
>> >    doesn't really protect if the two directions are closed
>> >    at once. (Assuming (b) is true)
>>
>> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
>> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
>> qio_channel_finalize.
>
> OK, that sounds better.
>
> Dave
>

Hi Dave:
    Another option is to protect channel_close on the migration side, for
example with a QemuMutex such as rp_mutex.
    As Daniel mentioned, QIOChannel impls are only intended to be used from
a single thread:
    https://www.mail-archive.com/qemu-devel@nongnu.org/msg530100.html

    Which way is better? Is there any plan for QIOChannel to support
multiple threads? It is not only channel_close that needs locking across
threads; writev_buffer also needs it (a rough sketch of the channel-level
locking is below).
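
    To make the channel-level idea concrete, here is a rough, untested
sketch of what qio_channel_rdma_close() could look like with the lock kept
in the channel object (the close_lock field and the finalize hook are
assumptions for illustration, not the final patch; the lock would be
initialised where the QIOChannelRDMA is created):

static int qio_channel_rdma_close(QIOChannel *ioc, Error **errp)
{
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

    /* Both the to_src_file and from_src_file QEMUFiles share this one
     * QIOChannelRDMA, so they serialize on the same mutex, and the
     * NULL checks keep a second close from freeing anything twice. */
    qemu_mutex_lock(&rioc->close_lock);
    if (rioc->rdmain) {
        qemu_rdma_cleanup(rioc->rdmain);
        g_free(rioc->rdmain);
        rioc->rdmain = NULL;
    }
    if (rioc->rdmaout) {
        qemu_rdma_cleanup(rioc->rdmaout);
        g_free(rioc->rdmaout);
        rioc->rdmaout = NULL;
    }
    qemu_mutex_unlock(&rioc->close_lock);
    return 0;
}

static void qio_channel_rdma_finalize(Object *obj)
{
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);

    /* Destroy the lock exactly once, when the last reference to the
     * channel is dropped, instead of in qemu_fclose(). */
    qemu_mutex_destroy(&rioc->close_lock);
}

    The migration-side alternative (an rp_mutex-style lock around the
qemu_fclose() callers) would also work, but then every caller has to
remember to take it.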

    thanks.


>> Thank you.
>>
>> >
>> >   If (a) is false and we actually share a single QEMUFile then
>> >  that race at the end happens.
>> >
>> > Dave
>> >
>> >
>> >>      trace_qemu_file_fclose();
>> >>      return ret;
>> >> --
>> >> 1.8.3.1
>> >>
>> > --
>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads
  2018-06-03 13:50         ` 858585 jemmy
@ 2018-06-03 14:43           ` 858585 jemmy
  0 siblings, 0 replies; 28+ messages in thread
From: 858585 jemmy @ 2018-06-03 14:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Lidong Chen

On Sun, Jun 3, 2018 at 9:50 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, May 31, 2018 at 6:52 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> * 858585 jemmy (jemmy858585@gmail.com) wrote:
>>> On Wed, May 30, 2018 at 10:45 PM, Dr. David Alan Gilbert
>>> <dgilbert@redhat.com> wrote:
>>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> >> From: Lidong Chen <jemmy858585@gmail.com>
>>> >>
>>> >> The channel_close maybe invoked by different threads. For example, source
>>> >> qemu invokes qemu_fclose in main thread, migration thread and return path
>>> >> thread. Destination qemu invokes qemu_fclose in main thread, listen thread
>>> >> and COLO incoming thread.
>>> >>
>>> >> Add a mutex in QEMUFile struct to avoid concurrent invoke channel_close.
>>> >>
>>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> >> ---
>>> >>  migration/qemu-file.c | 5 +++++
>>> >>  1 file changed, 5 insertions(+)
>>> >>
>>> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>>> >> index 977b9ae..87d0f05 100644
>>> >> --- a/migration/qemu-file.c
>>> >> +++ b/migration/qemu-file.c
>>> >> @@ -52,6 +52,7 @@ struct QEMUFile {
>>> >>      unsigned int iovcnt;
>>> >>
>>> >>      int last_error;
>>> >> +    QemuMutex lock;
>>> >
>>> > That could do with a comment saying what you're protecting
>>> >
>>> >>  };
>>> >>
>>> >>  /*
>>> >> @@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>>> >>
>>> >>      f = g_new0(QEMUFile, 1);
>>> >>
>>> >> +    qemu_mutex_init(&f->lock);
>>> >>      f->opaque = opaque;
>>> >>      f->ops = ops;
>>> >>      return f;
>>> >> @@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
>>> >>      ret = qemu_file_get_error(f);
>>> >>
>>> >>      if (f->ops->close) {
>>> >> +        qemu_mutex_lock(&f->lock);
>>> >>          int ret2 = f->ops->close(f->opaque);
>>> >> +        qemu_mutex_unlock(&f->lock);
>>> >
>>> > OK, and at least for the RDMA code, if it calls
>>> > close a 2nd time, rioc->rdma is checked so it wont actually free stuff a
>>> > 2nd time.
>>> >
>>> >>          if (ret >= 0) {
>>> >>              ret = ret2;
>>> >>          }
>>> >> @@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
>>> >>      if (f->last_error) {
>>> >>          ret = f->last_error;
>>> >>      }
>>> >> +    qemu_mutex_destroy(&f->lock);
>>> >>      g_free(f);
>>> >
>>> > Hmm but that's not safe; if two things really do call qemu_fclose()
>>> > on the same structure they race here and can end up destroying the lock
>>> > twice, or doing f->lock  after the 1st one has already g_free(f).
>>>
>>> >
>>> >
>>> > So lets go back a step.
>>> > I think:
>>> >   a) There should always be a separate QEMUFile* for
>>> >      to_src_file and from_src_file - I don't see where you open
>>> >      the 2nd one; I don't see your implementation of
>>> >      f->ops->get_return_path.
>>>
>>> yes, current qemu version use a separate QEMUFile* for to_src_file and
>>> from_src_file.
>>> and the two QEMUFile point to one QIOChannelRDMA.
>>>
>>> the f->ops->get_return_path is implemented by channel_output_ops or
>>> channel_input_ops.
>>
>> Ah OK, yes that makes sense.
>>
>>> >   b) I *think* that while the different threads might all call
>>> >      fclose(), I think there should only ever be one qemu_fclose
>>> >      call for each direction on the QEMUFile.
>>> >
>>> > But now we have two problems:
>>> >   If (a) is true then f->lock  is separate on each one so
>>> >    doesn't really protect if the two directions are closed
>>> >    at once. (Assuming (b) is true)
>>>
>>> yes, you are right.  so I should add a QemuMutex in QIOChannel structure, not
>>> QEMUFile structure. and qemu_mutex_destroy the QemuMutex in
>>> qio_channel_finalize.
>>
>> OK, that sounds better.
>>
>> Dave
>>
>
> Hi Dave:
>     Another way is protect channel_close in migration part, like
> QemuMutex rp_mutex.
>     As Daniel mentioned, QIOChannel impls are only intended to a single thread.
>     https://www.mail-archive.com/qemu-devel@nongnu.org/msg530100.html
>
>     which way is better? Does QIOChannel have the plan to support multi thread?
>     Not only channel_close need lock between different threads,
> writev_buffer write also
>     need.
>
>     thanks.
>
>

I find that qemu does not call qemu_mutex_destroy to release rp_mutex in
migration_instance_finalize :(
Although the qemu_mutex_destroy is not strictly necessary, it is good
practice, so it's better we fix it (see the sketch below).
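
For illustration only (where rp_mutex actually lives and which teardown
hook should destroy it still needs checking; the names below are
assumptions rather than the final patch):

static void migration_instance_finalize(Object *obj)
{
    MigrationState *ms = MIGRATION_OBJ(obj);

    /* ... existing cleanup stays as-is ... */

    /* Proposed addition: pair the qemu_mutex_init() done at setup time
     * with a matching destroy on the teardown path.  'rp_mutex' is
     * assumed to be reachable from MigrationState for the sake of the
     * example. */
    qemu_mutex_destroy(&ms->rp_mutex);
}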

>>> Thank you.
>>>
>>> >
>>> >   If (a) is false and we actually share a single QEMUFile then
>>> >  that race at the end happens.
>>> >
>>> > Dave
>>> >
>>> >
>>> >>      trace_qemu_file_fclose();
>>> >>      return ret;
>>> >> --
>>> >> 1.8.3.1
>>> >>
>>> > --
>>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
  2018-05-31  7:36     ` 858585 jemmy
@ 2018-06-03 15:04       ` Aviad Yehezkel
  2018-06-05 14:26         ` 858585 jemmy
  0 siblings, 1 reply; 28+ messages in thread
From: Aviad Yehezkel @ 2018-06-03 15:04 UTC (permalink / raw)
  To: 858585 jemmy, Dr. David Alan Gilbert, Aviad Yehezkel
  Cc: Adi Dotan, zhang.zhanghailiang, Juan Quintela, qemu-devel,
	Paolo Bonzini, Lidong Chen

+Gal

Gal, please comment with our findings.

Thanks!


On 5/31/2018 10:36 AM, 858585 jemmy wrote:
> On Thu, May 31, 2018 at 1:33 AM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
>>> maybe loop forever. so we should also poll the cm event fd, and when
>>> receive any cm event, we consider some error happened.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> I don't understand enough about the way the infiniband fd's work to
>> fully review this; so I'd appreciate if some one who does could
>> comment/add their review.
> Hi Avaid:
>      we need your help. I also not find any document about the cq
> channel event fd and
> cm channel event f.
>      Should we set the events to G_IO_IN | G_IO_HUP | G_IO_ERR? or
> G_IO_IN is enough?
>      pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>      pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>      Thanks.
>
>>> ---
>>>   migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>>>   1 file changed, 24 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index 1b9e261..d611a06 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>>>    */
>>>   static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>   {
>>> +    struct rdma_cm_event *cm_event;
>>> +    int ret = -1;
>>> +
>>>       /*
>>>        * Coroutine doesn't start until migration_fd_process_incoming()
>>>        * so don't yield unless we know we're running inside of a coroutine.
>>> @@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>            * But we need to be able to handle 'cancel' or an error
>>>            * without hanging forever.
>>>            */
>>> -        while (!rdma->error_state  && !rdma->received_error) {
>>> -            GPollFD pfds[1];
>>> +        while (!rdma->error_state && !rdma->received_error) {
>>> +            GPollFD pfds[2];
>>>               pfds[0].fd = rdma->comp_channel->fd;
>>>               pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>> +            pfds[0].revents = 0;
>>> +
>>> +            pfds[1].fd = rdma->channel->fd;
>>> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>> +            pfds[1].revents = 0;
>>> +
>>>               /* 0.1s timeout, should be fine for a 'cancel' */
>>> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>>> -            case 1: /* fd active */
>>> -                return 0;
>>> +            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
>> Shouldn't we still check the return value of this; if it's negative
>> something has gone wrong.
> I will fix this.
> Thanks.
>
>> Dave
>>
>>> -            case 0: /* Timeout, go around again */
>>> -                break;
>>> +            if (pfds[1].revents) {
>>> +                ret = rdma_get_cm_event(rdma->channel, &cm_event);
>>> +                if (!ret) {
>>> +                    rdma_ack_cm_event(cm_event);
>>> +                }
>>> +                error_report("receive cm event while wait comp channel,"
>>> +                             "cm event is %d", cm_event->event);
>>>
>>> -            default: /* Error of some type -
>>> -                      * I don't trust errno from qemu_poll_ns
>>> -                     */
>>> -                error_report("%s: poll failed", __func__);
>>> +                /* consider any rdma communication event as an error */
>>>                   return -EPIPE;
>>>               }
>>>
>>> +            if (pfds[0].revents) {
>>> +                return 0;
>>> +            }
>>> +
>>>               if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
>>>                   /* Bail out and let the cancellation happen */
>>>                   return -EPIPE;
>>> --
>>> 1.8.3.1
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
  2018-06-03 15:04       ` Aviad Yehezkel
@ 2018-06-05 14:26         ` 858585 jemmy
  0 siblings, 0 replies; 28+ messages in thread
From: 858585 jemmy @ 2018-06-05 14:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Gal Shachaf
  Cc: Aviad Yehezkel, Adi Dotan, zhang.zhanghailiang, Juan Quintela,
	qemu-devel, Paolo Bonzini, Lidong Chen, Aviad Yehezkel

On Sun, Jun 3, 2018 at 11:04 PM, Aviad Yehezkel
<aviadye@dev.mellanox.co.il> wrote:
> +Gal
>
> Gal, please comment with our findings.

Some suggestions from Gal (a rough sketch combining them is below):
1. Regarding the GIOConditions for the GPollFD events/revents:
   G_IO_IN is enough for the cm_channel (if it is not empty, poll will
   return POLLIN | POLLRDNORM), but you can also add G_IO_HUP | G_IO_ERR
   just to be safe.
2. Please note that the error return values of qemu_poll_ns,
   rdma_get_cm_event and rdma_ack_cm_event are not currently checked.
3. Consider checking for specific RDMA_CM_EVENT types, e.g.
   RDMA_CM_EVENT_DISCONNECTED and RDMA_CM_EVENT_DEVICE_REMOVAL; receiving
   RDMA_CM_EVENT_ADDR_CHANGE, for example, should not result in an error.
4. It is better to poll the CQ first, and only poll the event queue if the
   CQ has no new CQEs. This way we do not take the error path when a CQE
   has actually arrived.
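
To make those points concrete, here is a rough, untested sketch of how the
inner loop body of qemu_rdma_wait_comp_channel() could be reworked (the
exact event handling is an assumption; the actual v5 patch may differ):

            GPollFD pfds[2];

            pfds[0].fd = rdma->comp_channel->fd;       /* completion queue */
            pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
            pfds[0].revents = 0;

            pfds[1].fd = rdma->channel->fd;            /* cm event channel */
            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
            pfds[1].revents = 0;

            /* 0.1s timeout, should be fine for a 'cancel' */
            ret = qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
            if (ret < 0) {
                /* point 2: do not ignore poll errors */
                error_report("%s: poll failed", __func__);
                return -EPIPE;
            }

            /* point 4: check the CQ first, so a pending CQE wins even if
             * a cm event arrived at the same time */
            if (pfds[0].revents) {
                return 0;
            }

            if (pfds[1].revents) {
                ret = rdma_get_cm_event(rdma->channel, &cm_event);
                if (ret) {
                    error_report("%s: failed to get cm event", __func__);
                    return -EPIPE;
                }
                /* point 3: only treat fatal cm events as errors */
                if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                    cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                    error_report("receive cm event %d while waiting on the "
                                 "comp channel", cm_event->event);
                    rdma_ack_cm_event(cm_event);
                    return -EPIPE;
                }
                rdma_ack_cm_event(cm_event);
            }

Handling of the other, non-fatal cm events (just ack and continue, as
above) is the part I am least sure about.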

I will send a new version of the patch.

>
> Thanks!
>
>
> On 5/31/2018 10:36 AM, 858585 jemmy wrote:
>>
>> On Thu, May 31, 2018 at 1:33 AM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>>>
>>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>>>
>>>> If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
>>>> maybe loop forever. so we should also poll the cm event fd, and when
>>>> receive any cm event, we consider some error happened.
>>>>
>>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>>
>>> I don't understand enough about the way the infiniband fd's work to
>>> fully review this; so I'd appreciate if some one who does could
>>> comment/add their review.
>>
>> Hi Avaid:
>>      we need your help. I also not find any document about the cq
>> channel event fd and
>> cm channel event f.
>>      Should we set the events to G_IO_IN | G_IO_HUP | G_IO_ERR? or
>> G_IO_IN is enough?
>>      pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>      pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>      Thanks.
>>
>>>> ---
>>>>   migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>>>>   1 file changed, 24 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>>> index 1b9e261..d611a06 100644
>>>> --- a/migration/rdma.c
>>>> +++ b/migration/rdma.c
>>>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma,
>>>> uint64_t *wr_id_out,
>>>>    */
>>>>   static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>>   {
>>>> +    struct rdma_cm_event *cm_event;
>>>> +    int ret = -1;
>>>> +
>>>>       /*
>>>>        * Coroutine doesn't start until migration_fd_process_incoming()
>>>>        * so don't yield unless we know we're running inside of a
>>>> coroutine.
>>>> @@ -1504,25 +1507,35 @@ static int
>>>> qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>>            * But we need to be able to handle 'cancel' or an error
>>>>            * without hanging forever.
>>>>            */
>>>> -        while (!rdma->error_state  && !rdma->received_error) {
>>>> -            GPollFD pfds[1];
>>>> +        while (!rdma->error_state && !rdma->received_error) {
>>>> +            GPollFD pfds[2];
>>>>               pfds[0].fd = rdma->comp_channel->fd;
>>>>               pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>>> +            pfds[0].revents = 0;
>>>> +
>>>> +            pfds[1].fd = rdma->channel->fd;
>>>> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>>> +            pfds[1].revents = 0;
>>>> +
>>>>               /* 0.1s timeout, should be fine for a 'cancel' */
>>>> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>>>> -            case 1: /* fd active */
>>>> -                return 0;
>>>> +            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
>>>
>>> Shouldn't we still check the return value of this; if it's negative
>>> something has gone wrong.
>>
>> I will fix this.
>> Thanks.
>>
>>> Dave
>>>
>>>> -            case 0: /* Timeout, go around again */
>>>> -                break;
>>>> +            if (pfds[1].revents) {
>>>> +                ret = rdma_get_cm_event(rdma->channel, &cm_event);
>>>> +                if (!ret) {
>>>> +                    rdma_ack_cm_event(cm_event);
>>>> +                }
>>>> +                error_report("receive cm event while wait comp
>>>> channel,"
>>>> +                             "cm event is %d", cm_event->event);
>>>>
>>>> -            default: /* Error of some type -
>>>> -                      * I don't trust errno from qemu_poll_ns
>>>> -                     */
>>>> -                error_report("%s: poll failed", __func__);
>>>> +                /* consider any rdma communication event as an error */
>>>>                   return -EPIPE;
>>>>               }
>>>>
>>>> +            if (pfds[0].revents) {
>>>> +                return 0;
>>>> +            }
>>>> +
>>>>               if (migrate_get_current()->state ==
>>>> MIGRATION_STATUS_CANCELLING) {
>>>>                   /* Bail out and let the cancellation happen */
>>>>                   return -EPIPE;
>>>> --
>>>> 1.8.3.1
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-06-05 14:26 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30  9:43 [Qemu-devel] [PATCH v4 00/12] Enable postcopy RDMA live migration Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 01/12] migration: disable RDMA WRITE after postcopy started Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 02/12] migration: create a dedicated connection for rdma return path Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 03/12] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 04/12] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
2018-05-30 14:45   ` Dr. David Alan Gilbert
2018-05-31  7:07     ` 858585 jemmy
2018-05-31 10:52       ` Dr. David Alan Gilbert
2018-06-03 13:50         ` 858585 jemmy
2018-06-03 14:43           ` 858585 jemmy
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 05/12] migration: implement bi-directional RDMA QIOChannel Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 06/12] migration: Stop rdma yielding during incoming postcopy Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect Lidong Chen
2018-05-30 12:24   ` Dr. David Alan Gilbert
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 08/12] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 09/12] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 10/12] migration: create a dedicated thread to release rdma resource Lidong Chen
2018-05-30 16:50   ` Dr. David Alan Gilbert
2018-05-31  7:25     ` 858585 jemmy
2018-05-31 10:55       ` Dr. David Alan Gilbert
2018-05-31 11:27         ` 858585 jemmy
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion Lidong Chen
2018-05-30 17:33   ` Dr. David Alan Gilbert
2018-05-31  7:36     ` 858585 jemmy
2018-06-03 15:04       ` Aviad Yehezkel
2018-06-05 14:26         ` 858585 jemmy
2018-05-30  9:43 ` [Qemu-devel] [PATCH v4 12/12] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
2018-05-30 17:59   ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.