* [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration
@ 2018-06-05 15:27 Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 01/10] migration: disable RDMA WRITE after postcopy started Lidong Chen
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:27 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen

The RDMA QIOChannel does not support bi-directional communication, so when an
RDMA live migration is started with postcopy enabled, the return path on the
source qemu gets a qemu file error.

These patches implement bi-directional communication for RDMA QIOChannel and 
disable the RDMA WRITE during the postcopy phase.

This series just makes postcopy work; performance will be improved later.
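
The core of the series is to give the RDMA channel one context per direction.
A rough sketch of the structure introduced in patch 04 (field comments are
mine, details simplified):

    struct QIOChannelRDMA {
        QIOChannel parent;
        RDMAContext *rdmain;    /* connection this channel reads from */
        RDMAContext *rdmaout;   /* connection this channel writes to  */
        QEMUFile *file;
        bool blocking;
    };

On the source, rdmaout is the original migration connection and rdmain is the
dedicated return-path connection added in patch 02; on the destination it is
the other way around.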

[v5]
 - rebase
 - fix a bug in "create a dedicated thread to release rdma resource" (David)
 - fix a bug in "poll the cm event while wait RDMA work request completion" (David, Gal)

[v4]
 - do not wait for the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
 - implement io_set_aio_fd_handler function for RDMA QIOChannel (Juan Quintela)
 - invoke qio_channel_yield only when qemu_in_coroutine() (Juan Quintela)
 - create a dedicated thread to release rdma resource
 - poll the cm event while wait RDMA work request completion
 - implement the shutdown function for RDMA QIOChannel

[v3]
 - add a mutex in QEMUFile struct to avoid concurrent channel close (Daniel)
 - destroy the mutex before free QEMUFile (David)
 - use rdmain and rdmaout instead of rdma->return_path (Daniel)

[v2]
 - do not update bytes_xfer when RDMA WRITE is disabled (David)
 - implement bi-directional communication for RDMA QIOChannel (Daniel)

Lidong Chen (10):
  migration: disable RDMA WRITE after postcopy started
  migration: create a dedicated connection for rdma return path
  migration: avoid concurrent invoke channel_close by different threads
  migration: implement bi-directional RDMA QIOChannel
  migration: Stop rdma yielding during incoming postcopy
  migration: implement io_set_aio_fd_handler function for RDMA
    QIOChannel
  migration: invoke qio_channel_yield only when qemu_in_coroutine()
  migration: create a dedicated thread to release rdma resource
  migration: poll the cm event while wait RDMA work request completion
  migration: implement the shutdown for RDMA QIOChannel

 migration/colo.c              |   2 +
 migration/migration.c         |   4 +
 migration/migration.h         |   7 +
 migration/postcopy-ram.c      |   2 +
 migration/qemu-file-channel.c |  12 +-
 migration/qemu-file.c         |  14 +-
 migration/ram.c               |   4 +
 migration/rdma.c              | 404 ++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c            |   3 +
 9 files changed, 414 insertions(+), 38 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 01/10] migration: disable RDMA WRITE after postcopy started
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 02/10] migration: create a dedicated connection for rdma return path Lidong Chen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

RDMA WRITE operations are performed with no notification to the destination
qemu, so the destination qemu cannot be woken up. This patch disables RDMA
WRITE after postcopy has started.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file.c |  8 ++++++--
 migration/rdma.c      | 12 ++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0463f4c..977b9ae 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
     if (f->hooks && f->hooks->save_page) {
         int ret = f->hooks->save_page(f, f->opaque, block_offset,
                                       offset, size, bytes_sent);
-        f->bytes_xfer += size;
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
+        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+            f->bytes_xfer += size;
+        }
+
+        if (ret != RAM_SAVE_CONTROL_DELAYED &&
+            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
             if (bytes_sent && *bytes_sent > 0) {
                 qemu_update_position(f, *bytes_sent);
             } else if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index 05aee3d..185ed98 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2921,6 +2921,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return RAM_SAVE_CONTROL_NOT_SUPP;
+    }
+
     qemu_fflush(f);
 
     if (size > 0) {
@@ -3480,6 +3484,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
@@ -3502,6 +3510,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 02/10] migration: create a dedicated connection for rdma return path
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 01/10] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

If an RDMA migration is started with postcopy enabled, the source qemu
establishes a dedicated connection for the return path.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 185ed98..f6705a3 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -387,6 +387,10 @@ typedef struct RDMAContext {
     uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
 
     GHashTable *blockmap;
+
+    /* the RDMAContext for return path */
+    struct RDMAContext *return_path;
+    bool is_return_path;
 } RDMAContext;
 
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
@@ -2323,10 +2327,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_id(rdma->cm_id);
         rdma->cm_id = NULL;
     }
+
+    /* the destination side, listen_id and channel is shared */
     if (rdma->listen_id) {
-        rdma_destroy_id(rdma->listen_id);
+        if (!rdma->is_return_path) {
+            rdma_destroy_id(rdma->listen_id);
+        }
         rdma->listen_id = NULL;
+
+        if (rdma->channel) {
+            if (!rdma->is_return_path) {
+                rdma_destroy_event_channel(rdma->channel);
+            }
+            rdma->channel = NULL;
+        }
     }
+
     if (rdma->channel) {
         rdma_destroy_event_channel(rdma->channel);
         rdma->channel = NULL;
@@ -2555,6 +2571,25 @@ err_dest_init_create_listen_id:
 
 }
 
+static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
+                                            RDMAContext *rdma)
+{
+    int idx;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        rdma_return_path->wr_data[idx].control_len = 0;
+        rdma_return_path->wr_data[idx].control_curr = NULL;
+    }
+
+    /*the CM channel and CM id is shared*/
+    rdma_return_path->channel = rdma->channel;
+    rdma_return_path->listen_id = rdma->listen_id;
+
+    rdma->return_path = rdma_return_path;
+    rdma_return_path->return_path = rdma;
+    rdma_return_path->is_return_path = true;
+}
+
 static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
@@ -3012,6 +3047,8 @@ err:
     return ret;
 }
 
+static void rdma_accept_incoming_migration(void *opaque);
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3106,7 +3143,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         }
     }
 
-    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    /* Accept the second connection request for return path */
+    if (migrate_postcopy() && !rdma->is_return_path) {
+        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
+                            NULL,
+                            (void *)(intptr_t)rdma->return_path);
+    } else {
+        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
@@ -3691,6 +3735,10 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     trace_qemu_rdma_accept_incoming_migration_accepted();
 
+    if (rdma->is_return_path) {
+        return;
+    }
+
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
         ERROR(errp, "could not qemu_fopen_rdma!");
@@ -3705,7 +3753,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
-    RDMAContext *rdma;
+    RDMAContext *rdma, *rdma_return_path;
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
@@ -3732,12 +3780,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     trace_rdma_start_incoming_migration_after_rdma_listen();
 
+    /* initialize the RDMAContext for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
 err:
     error_propagate(errp, local_err);
     g_free(rdma);
+    g_free(rdma_return_path);
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -3745,6 +3805,7 @@ void rdma_start_outgoing_migration(void *opaque,
 {
     MigrationState *s = opaque;
     RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
+    RDMAContext *rdma_return_path = NULL;
     int ret = 0;
 
     if (rdma == NULL) {
@@ -3765,6 +3826,32 @@ void rdma_start_outgoing_migration(void *opaque,
         goto err;
     }
 
+    /* RDMA postcopy need a seprate queue pair for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, errp);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(rdma_return_path,
+            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(rdma_return_path, errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        rdma->return_path = rdma_return_path;
+        rdma_return_path->return_path = rdma;
+        rdma_return_path->is_return_path = true;
+    }
+
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
@@ -3772,4 +3859,5 @@ void rdma_start_outgoing_migration(void *opaque,
     return;
 err:
     g_free(rdma);
+    g_free(rdma_return_path);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 01/10] migration: disable RDMA WRITE after postcopy started Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 02/10] migration: create a dedicated connection for rdma return path Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-13 17:02   ` Daniel P. Berrangé
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel Lidong Chen
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen

channel_close may be invoked by different threads. For example, the source
qemu invokes qemu_fclose in the main thread, the migration thread and the
return path thread; the destination qemu invokes qemu_fclose in the main
thread, the listen thread and the COLO incoming thread.
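
As a rough illustration of the window this closes (hypothetical interleaving,
not actual call sites): two QEMUFiles can wrap the same QIOChannelRDMA, so two
threads may reach f->ops->close() at the same time and the loser operates on
an already torn-down channel.

    /* sketch only: both calls end up in channel_close() on the same ioc */
    qemu_fclose(ms->to_dst_file);              /* e.g. migration thread   */
    qemu_fclose(ms->rp_state.from_dst_file);   /* e.g. return path thread */

Holding qemu_file_close_lock around the ops->close() call serializes the
teardown, so only one thread actually closes the channel.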

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++++++
 migration/qemu-file.c | 6 ++++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1e99ec9..1d0aaec 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3075,6 +3075,7 @@ static void migration_instance_finalize(Object *obj)
 
     qemu_mutex_destroy(&ms->error_mutex);
     qemu_mutex_destroy(&ms->qemu_file_lock);
+    qemu_mutex_destroy(&ms->qemu_file_close_lock);
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
@@ -3115,6 +3116,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_mutex_init(&ms->qemu_file_lock);
+    qemu_mutex_init(&ms->qemu_file_close_lock);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index 5af57d6..7a6025a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -121,6 +121,13 @@ struct MigrationState
      */
     QemuMutex qemu_file_lock;
 
+    /*
+     * The to_src_file and from_dst_file point to one QIOChannelRDMA,
+     * And qemu_fclose maybe invoked by different threads. use this lock
+     * to avoid concurrent invoke channel_close by different threads.
+     */
+    QemuMutex qemu_file_close_lock;
+
     /* bytes already send at the beggining of current interation */
     uint64_t iteration_initial_bytes;
     /* time at the start of current iteration */
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 977b9ae..74c48e0 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -323,12 +323,14 @@ void qemu_update_position(QEMUFile *f, size_t size)
  */
 int qemu_fclose(QEMUFile *f)
 {
-    int ret;
+    int ret, ret2;
     qemu_fflush(f);
     ret = qemu_file_get_error(f);
 
     if (f->ops->close) {
-        int ret2 = f->ops->close(f->opaque);
+        qemu_mutex_lock(&migrate_get_current()->qemu_file_close_lock);
+        ret2 = f->ops->close(f->opaque);
+        qemu_mutex_unlock(&migrate_get_current()->qemu_file_close_lock);
         if (ret >= 0) {
             ret = ret2;
         }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (2 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-13 14:21   ` Dr. David Alan Gilbert
  2018-06-27 15:32   ` Dr. David Alan Gilbert
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 05/10] migration: Stop rdma yielding during incoming postcopy Lidong Chen
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

This patch implements a bi-directional RDMA QIOChannel. Because different
threads may access the QIOChannelRDMA concurrently, this patch uses RCU to
protect it.
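
The access pattern, reduced to a sketch (condensed from the hunks below;
error handling and the other accessors are omitted):

    /* reader side, e.g. qio_channel_rdma_writev() */
    RDMAContext *rdma;

    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdmaout);
    if (!rdma) {
        rcu_read_unlock();
        return -EIO;                    /* channel already closed */
    }
    /* ... use rdma for the I/O ... */
    rcu_read_unlock();

    /* updater side, qio_channel_rdma_close() */
    atomic_rcu_set(&rioc->rdmain, NULL);
    atomic_rcu_set(&rioc->rdmaout, NULL);
    synchronize_rcu();                  /* wait for in-flight readers */
    qemu_rdma_cleanup(rdmain);
    g_free(rdmain);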

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/colo.c         |   2 +
 migration/migration.c    |   2 +
 migration/postcopy-ram.c |   2 +
 migration/ram.c          |   4 +
 migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
 migration/savevm.c       |   3 +
 6 files changed, 183 insertions(+), 26 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4381067..88936f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t value;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -666,5 +667,6 @@ out:
     }
     migration_incoming_exit_colo();
 
+    rcu_unregister_thread();
     return NULL;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 1d0aaec..4253d9f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2028,6 +2028,7 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+    rcu_register_thread();
 
 retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
@@ -2167,6 +2168,7 @@ out:
     trace_source_return_path_thread_end();
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
+    rcu_unregister_thread();
     return NULL;
 }
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 48e5155..98613eb 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     RAMBlock *rb = NULL;
 
     trace_postcopy_ram_fault_thread_entry();
+    rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
 
@@ -1059,6 +1060,7 @@ retry:
             }
         }
     }
+    rcu_unregister_thread();
     trace_postcopy_ram_fault_thread_exit();
     g_free(pfd);
     return NULL;
diff --git a/migration/ram.c b/migration/ram.c
index a500015..a674fb5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -683,6 +683,7 @@ static void *multifd_send_thread(void *opaque)
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     if (multifd_send_initial_packet(p, &local_err) < 0) {
         goto out;
     }
@@ -706,6 +707,7 @@ out:
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     return NULL;
 }
 
@@ -819,6 +821,7 @@ static void *multifd_recv_thread(void *opaque)
 {
     MultiFDRecvParams *p = opaque;
 
+    rcu_register_thread();
     while (true) {
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
@@ -833,6 +836,7 @@ static void *multifd_recv_thread(void *opaque)
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     return NULL;
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index f6705a3..769f443 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
                                 " to abort!"); \
                 rdma->error_reported = 1; \
             } \
+            rcu_read_unlock(); \
             return rdma->error_state; \
         } \
     } while (0)
@@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
 
 struct QIOChannelRDMA {
     QIOChannel parent;
-    RDMAContext *rdma;
+    RDMAContext *rdmain;
+    RDMAContext *rdmaout;
     QEMUFile *file;
     bool blocking; /* XXX we don't actually honour this yet */
 };
@@ -2630,12 +2632,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     QEMUFile *f = rioc->file;
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
     size_t i;
     size_t len = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     /*
@@ -2645,6 +2655,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
+        rcu_read_unlock();
         return ret;
     }
 
@@ -2664,6 +2675,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
+                rcu_read_unlock();
                 return ret;
             }
 
@@ -2672,6 +2684,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         }
     }
 
+    rcu_read_unlock();
     return done;
 }
 
@@ -2705,12 +2718,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
     size_t done = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     for (i = 0; i < niov; i++) {
@@ -2722,7 +2743,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
         /* Got what we needed, so go to next iovec */
@@ -2744,25 +2765,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
+            rcu_read_unlock();
             return ret;
         }
 
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
             if (done == 0) {
+                rcu_read_unlock();
                 return QIO_CHANNEL_ERR_BLOCK;
             } else {
                 break;
             }
         }
     }
+    rcu_read_unlock();
     return done;
 }
 
@@ -2814,15 +2838,29 @@ qio_channel_rdma_source_prepare(GSource *source,
                                 gint *timeout)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
     *timeout = -1;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when prepare Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2830,14 +2868,28 @@ static gboolean
 qio_channel_rdma_source_check(GSource *source)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when check Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2848,14 +2900,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
 {
     QIOChannelFunc func = (QIOChannelFunc)callback;
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when dispatch Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return (*func)(QIO_CHANNEL(rsource->rioc),
                    (cond & rsource->condition),
                    user_data);
@@ -2900,15 +2966,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
     trace_qemu_rdma_close();
-    if (rioc->rdma) {
-        if (!rioc->rdma->error_state) {
-            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
-        }
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+
+    rdmain = rioc->rdmain;
+    if (rdmain) {
+        atomic_rcu_set(&rioc->rdmain, NULL);
+    }
+
+    rdmaout = rioc->rdmaout;
+    if (rdmaout) {
+        atomic_rcu_set(&rioc->rdmaout, NULL);
     }
+
+    synchronize_rcu();
+
+    if (rdmain) {
+        qemu_rdma_cleanup(rdmain);
+    }
+
+    if (rdmaout) {
+        qemu_rdma_cleanup(rdmaout);
+    }
+
+    g_free(rdmain);
+    g_free(rdmaout);
+
     return 0;
 }
 
@@ -2951,12 +3034,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3041,9 +3133,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
         }
     }
 
+    rcu_read_unlock();
     return RAM_SAVE_CONTROL_DELAYED;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3219,8 +3313,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAContext *rdma;
+    RDMALocalBlocks *local;
     RDMAControlHeader head;
     RDMARegister *reg, *registers;
     RDMACompress *comp;
@@ -3233,8 +3327,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     int count = 0;
     int i = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
+    local = &rdma->local_ram_blocks;
     do {
         trace_qemu_rdma_registration_handle_wait();
 
@@ -3468,6 +3571,7 @@ out:
     if (ret < 0) {
         rdma->error_state = ret;
     }
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3481,10 +3585,18 @@ out:
 static int
 rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 {
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int curr;
     int found = -1;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     /* Find the matching RAMBlock in our local list */
     for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
         if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
@@ -3495,6 +3607,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
+        rcu_read_unlock();
         return -ENOENT;
     }
 
@@ -3502,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
     trace_rdma_block_notification_handle(name, rdma->next_src_index);
     rdma->next_src_index++;
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3524,11 +3638,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
                                         uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
+
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
 
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3536,6 +3658,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3548,13 +3671,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 {
     Error *local_err = NULL, **errp = &local_err;
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3586,6 +3717,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
+            rcu_read_unlock();
             return ret;
         }
 
@@ -3609,6 +3741,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
+            rcu_read_unlock();
             return -EINVAL;
         }
 
@@ -3625,6 +3758,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
+                rcu_read_unlock();
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
@@ -3642,9 +3776,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         goto err;
     }
 
+    rcu_read_unlock();
     return 0;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3662,10 +3798,15 @@ static const QEMUFileHooks rdma_write_hooks = {
 static void qio_channel_rdma_finalize(Object *obj)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
-    if (rioc->rdma) {
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+    if (rioc->rdmain) {
+        qemu_rdma_cleanup(rioc->rdmain);
+        g_free(rioc->rdmain);
+        rioc->rdmain = NULL;
+    }
+    if (rioc->rdmaout) {
+        qemu_rdma_cleanup(rioc->rdmaout);
+        g_free(rioc->rdmaout);
+        rioc->rdmaout = NULL;
     }
 }
 
@@ -3705,13 +3846,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     }
 
     rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
-    rioc->rdma = rdma;
 
     if (mode[0] == 'w') {
         rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
+        rioc->rdmaout = rdma;
+        rioc->rdmain = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
     } else {
         rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
+        rioc->rdmain = rdma;
+        rioc->rdmaout = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index c2f34ff..21c07d4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1622,6 +1622,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
+    rcu_register_thread();
     /*
      * Because we're a thread and not a coroutine we can't yield
      * in qemu_file, and thus we must be blocking now.
@@ -1662,6 +1663,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
          * to leave the guest running and fire MCEs for pages that never
          * arrived as a desperate recovery step.
          */
+        rcu_unregister_thread();
         exit(EXIT_FAILURE);
     }
 
@@ -1676,6 +1678,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     migration_incoming_state_destroy();
     qemu_loadvm_state_cleanup();
 
+    rcu_unregister_thread();
     return NULL;
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 05/10] migration: Stop rdma yielding during incoming postcopy
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (3 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 06/10] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

During incoming postcopy, the destination qemu invokes
qemu_rdma_wait_comp_channel in a separate thread, so it cannot use the rdma
yield and polls the completion channel fd instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 769f443..92e4d30 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1493,11 +1493,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
      */
-    if (rdma->migration_started_on_destination) {
+    if (rdma->migration_started_on_destination &&
+        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
         yield_until_fd_readable(rdma->comp_channel->fd);
     } else {
         /* This is the source side, we're in a separate thread
          * or destination prior to migration_fd_process_incoming()
+         * after postcopy, the destination also in a seprate thread.
          * we can't yield; so we have to poll the fd.
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 06/10] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (4 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 05/10] migration: Stop rdma yielding during incoming postcopy Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 07/10] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination qemu
crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    #13 0x00007f96fe858760 in ?? ()
    #14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler dereferences a NULL pointer.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/rdma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 92e4d30..dfa4f77 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2963,6 +2963,21 @@ static GSource *qio_channel_rdma_create_watch(QIOChannel *ioc,
     return source;
 }
 
+static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
+                                                  AioContext *ctx,
+                                                  IOHandler *io_read,
+                                                  IOHandler *io_write,
+                                                  void *opaque)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    if (io_read) {
+        aio_set_fd_handler(ctx, rioc->rdmain->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    } else {
+        aio_set_fd_handler(ctx, rioc->rdmaout->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    }
+}
 
 static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
@@ -3822,6 +3837,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_set_blocking = qio_channel_rdma_set_blocking;
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 07/10] migration: invoke qio_channel_yield only when qemu_in_coroutine()
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (5 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 06/10] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource Lidong Chen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    #4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    #5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    #6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    #7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    #8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    #9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes it by invoking qio_channel_yield only when qemu_in_coroutine()
is true; otherwise qio_channel_wait is used.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/qemu-file-channel.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index e202d73..8e639eb 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -49,7 +49,11 @@ static ssize_t channel_writev_buffer(void *opaque,
         ssize_t len;
         len = qio_channel_writev(ioc, local_iov, nlocal_iov, NULL);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
-            qio_channel_wait(ioc, G_IO_OUT);
+            if (qemu_in_coroutine()) {
+                qio_channel_yield(ioc, G_IO_OUT);
+            } else {
+                qio_channel_wait(ioc, G_IO_OUT);
+            }
             continue;
         }
         if (len < 0) {
@@ -80,7 +84,11 @@ static ssize_t channel_get_buffer(void *opaque,
         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
         if (ret < 0) {
             if (ret == QIO_CHANNEL_ERR_BLOCK) {
-                qio_channel_yield(ioc, G_IO_IN);
+                if (qemu_in_coroutine()) {
+                    qio_channel_yield(ioc, G_IO_IN);
+                } else {
+                    qio_channel_wait(ioc, G_IO_IN);
+                }
             } else {
                 /* XXX handle Error * object */
                 return -EIO;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (6 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 07/10] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-27 18:59   ` Dr. David Alan Gilbert
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion Lidong Chen
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 10/10] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
  9 siblings, 1 reply; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen

ibv_dereg_mr takes a long time for a virtual server with a large amount of
memory.

The test result is:
  10GB  326ms
  20GB  699ms
  30GB  1021ms
  40GB  1387ms
  50GB  1712ms
  60GB  2034ms
  70GB  2457ms
  80GB  2807ms
  90GB  3107ms
  100GB 3474ms
  110GB 3735ms
  120GB 4064ms
  130GB 4567ms
  140GB 4886ms

This causes the guest OS to hang for a while when the migration finishes,
so create a dedicated thread to release the rdma resources.
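
The shape of the change, as a sketch (condensed from the hunk below):

    static void *qio_channel_rdma_close_thread(void *arg)
    {
        RDMAContext **rdma = arg;

        rcu_register_thread();
        synchronize_rcu();              /* wait for readers (patch 04) */
        if (rdma[0]) {
            qemu_rdma_cleanup(rdma[0]); /* ibv_dereg_mr() happens in here */
        }
        if (rdma[1]) {
            qemu_rdma_cleanup(rdma[1]);
        }
        g_free(rdma[0]);
        g_free(rdma[1]);
        g_free(rdma);
        rcu_unregister_thread();
        return NULL;
    }

    /* in qio_channel_rdma_close(): hand the contexts over to a detached
     * thread and return immediately instead of blocking on deregistration */
    qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
                       rdma, QEMU_THREAD_DETACHED);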

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index dfa4f77..f12e8d5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
     }
 }
 
-static int qio_channel_rdma_close(QIOChannel *ioc,
-                                  Error **errp)
+static void *qio_channel_rdma_close_thread(void *arg)
 {
-    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdmain, *rdmaout;
-    trace_qemu_rdma_close();
+    RDMAContext **rdma = arg;
+    RDMAContext *rdmain = rdma[0];
+    RDMAContext *rdmaout = rdma[1];
 
-    rdmain = rioc->rdmain;
-    if (rdmain) {
-        atomic_rcu_set(&rioc->rdmain, NULL);
-    }
-
-    rdmaout = rioc->rdmaout;
-    if (rdmaout) {
-        atomic_rcu_set(&rioc->rdmaout, NULL);
-    }
+    rcu_register_thread();
 
     synchronize_rcu();
-
     if (rdmain) {
         qemu_rdma_cleanup(rdmain);
     }
-
     if (rdmaout) {
         qemu_rdma_cleanup(rdmaout);
     }
 
     g_free(rdmain);
     g_free(rdmaout);
+    g_free(rdma);
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static int qio_channel_rdma_close(QIOChannel *ioc,
+                                  Error **errp)
+{
+    QemuThread t;
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext **rdma = g_new0(RDMAContext*, 2);
+
+    trace_qemu_rdma_close();
+    if (rioc->rdmain || rioc->rdmaout) {
+        rdma[0] =  rioc->rdmain;
+        rdma[1] =  rioc->rdmaout;
+        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
+                           rdma, QEMU_THREAD_DETACHED);
+        atomic_rcu_set(&rioc->rdmain, NULL);
+        atomic_rcu_set(&rioc->rdmaout, NULL);
+    }
 
     return 0;
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (7 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  2018-06-13 14:24   ` Dr. David Alan Gilbert
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 10/10] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
  9 siblings, 1 reply; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen

If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function may
loop forever. So also poll the cm event fd, and when
RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL is received,
treat it as an error.
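
The waiting loop then watches both file descriptors, roughly like this (a
sketch of the intent; the hunk below differs in error-handling details):

    GPollFD pfds[2];
    pfds[0].fd = rdma->comp_channel->fd;    /* completion channel */
    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
    pfds[1].fd = rdma->channel->fd;         /* connection manager events */
    pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;

    if (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000) > 0) {
        if (pfds[0].revents) {
            return 0;                       /* a work request completed */
        }
        if (pfds[1].revents) {
            struct rdma_cm_event *cm_event;
            if (!rdma_get_cm_event(rdma->channel, &cm_event)) {
                int event = cm_event->event;
                rdma_ack_cm_event(cm_event);
                if (event == RDMA_CM_EVENT_DISCONNECTED ||
                    event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                    return -EPIPE;          /* peer went away */
                }
            }
        }
    }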

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f12e8d5..bb6989e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
  */
 static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
 {
+    struct rdma_cm_event *cm_event;
+    int ret = -1;
+
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
@@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
          * without hanging forever.
          */
         while (!rdma->error_state  && !rdma->received_error) {
-            GPollFD pfds[1];
+            GPollFD pfds[2];
             pfds[0].fd = rdma->comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[0].revents = 0;
+
+            pfds[1].fd = rdma->channel->fd;
+            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[1].revents = 0;
+
             /* 0.1s timeout, should be fine for a 'cancel' */
-            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
+            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
+            case 2:
             case 1: /* fd active */
-                return 0;
+                if (pfds[0].revents) {
+                    return 0;
+                }
+
+                if (pfds[1].revents) {
+                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
+                    if (!ret) {
+                        rdma_ack_cm_event(cm_event);
+                    }
+
+                    error_report("receive cm event while wait comp channel,"
+                                 "cm event is %d", cm_event->event);
+                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
+                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
+                        return -EPIPE;
+                    }
+                }
+                break;
 
             case 0: /* Timeout, go around again */
                 break;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v5 10/10] migration: implement the shutdown for RDMA QIOChannel
  2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
                   ` (8 preceding siblings ...)
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-06-05 15:28 ` Lidong Chen
  9 siblings, 0 replies; 22+ messages in thread
From: Lidong Chen @ 2018-06-05 15:28 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert, berrange, aviadye, pbonzini
  Cc: qemu-devel, adido, galsha, Lidong Chen

Because the RDMA QIOChannel does not implement the shutdown function, if
to_dst_file gets an error, the return path thread will wait forever and the
migration thread will wait for the return path thread to exit.

the backtrace of return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    #11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007f372a77635d in clone () from /lib64/libc.so.6

the backtrace of migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f372a77635d in clone () from /lib64/libc.so.6
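
The shutdown implemented below just latches an error on both contexts, which
makes the polling loop from patch 09 (while (!rdma->error_state && ...)) drop
out, so the blocked thread returns and can be joined. Usage is the generic
QIOChannel call, e.g. (sketch):

    /* from another thread, to break a reader/writer stuck in
     * qemu_rdma_wait_comp_channel() */
    qio_channel_shutdown(QIO_CHANNEL(rioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    /* -> qio_channel_rdma_shutdown(): rdmain/rdmaout->error_state = -1 */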

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index bb6989e..0b35d65 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3050,6 +3050,45 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
     return 0;
 }
 
+static int
+qio_channel_rdma_shutdown(QIOChannel *ioc,
+                            QIOChannelShutdown how,
+                            Error **errp)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
+
+    rcu_read_lock();
+
+    rdmain = atomic_rcu_read(&rioc->rdmain);
+    rdmaout = atomic_rcu_read(&rioc->rdmaout);
+
+    switch (how) {
+    case QIO_CHANNEL_SHUTDOWN_READ:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_WRITE:
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_BOTH:
+    default:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    }
+
+    rcu_read_unlock();
+    return 0;
+}
+
 /*
  * Parameters:
  *    @offset == 0 :
@@ -3876,6 +3915,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
+    ioc_klass->io_shutdown = qio_channel_rdma_shutdown;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-06-13 14:21   ` Dr. David Alan Gilbert
  2018-06-27 15:31     ` 858585 jemmy
  2018-06-27 15:32   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-13 14:21 UTC (permalink / raw)
  To: Lidong Chen, pbonzini
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, qemu-devel,
	adido, galsha, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> This patch implements a bi-directional RDMA QIOChannel. Because different
> threads may access the QIOChannelRDMA concurrently, this patch uses RCU to protect it.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Paolo: Does it make sense the way RCU is used here? Holding the
read-lock for so long in multifd_rdma_[read|write]v is what worries me
most.
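
To make the concern concrete: the readv/writev paths pin the RDMAContext with
rcu_read_lock() for the whole call, which can block waiting for a SEND, so a
concurrent qio_channel_rdma_close() sits in synchronize_rcu() until that read
returns.  A condensed sketch of the two sides, using the helpers from the
patch quoted below (not the exact hunks):

    /* reader side, e.g. qio_channel_rdma_readv() */
    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdmain);
    if (!rdma) {
        rcu_read_unlock();
        return -EIO;
    }
    ret = qemu_rdma_fill(rdma, data, want, 0);   /* may block for a long time */
    rcu_read_unlock();

    /* writer side, e.g. qio_channel_rdma_close() */
    rdmain = rioc->rdmain;
    atomic_rcu_set(&rioc->rdmain, NULL);   /* unpublish the context */
    synchronize_rcu();                     /* waits for every reader above */
    qemu_rdma_cleanup(rdmain);
    g_free(rdmain);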

Dave

> ---
>  migration/colo.c         |   2 +
>  migration/migration.c    |   2 +
>  migration/postcopy-ram.c |   2 +
>  migration/ram.c          |   4 +
>  migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
>  migration/savevm.c       |   3 +
>  6 files changed, 183 insertions(+), 26 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 4381067..88936f5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
>      uint64_t value;
>      Error *local_err = NULL;
>  
> +    rcu_register_thread();
>      qemu_sem_init(&mis->colo_incoming_sem, 0);
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> @@ -666,5 +667,6 @@ out:
>      }
>      migration_incoming_exit_colo();
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
> diff --git a/migration/migration.c b/migration/migration.c
> index 1d0aaec..4253d9f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2028,6 +2028,7 @@ static void *source_return_path_thread(void *opaque)
>      int res;
>  
>      trace_source_return_path_thread_entry();
> +    rcu_register_thread();
>  
>  retry:
>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
> @@ -2167,6 +2168,7 @@ out:
>      trace_source_return_path_thread_end();
>      ms->rp_state.from_dst_file = NULL;
>      qemu_fclose(rp);
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 48e5155..98613eb 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>      RAMBlock *rb = NULL;
>  
>      trace_postcopy_ram_fault_thread_entry();
> +    rcu_register_thread();
>      mis->last_rb = NULL; /* last RAMBlock we sent part of */
>      qemu_sem_post(&mis->fault_thread_sem);
>  
> @@ -1059,6 +1060,7 @@ retry:
>              }
>          }
>      }
> +    rcu_unregister_thread();
>      trace_postcopy_ram_fault_thread_exit();
>      g_free(pfd);
>      return NULL;
> diff --git a/migration/ram.c b/migration/ram.c
> index a500015..a674fb5 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -683,6 +683,7 @@ static void *multifd_send_thread(void *opaque)
>      MultiFDSendParams *p = opaque;
>      Error *local_err = NULL;
>  
> +    rcu_register_thread();
>      if (multifd_send_initial_packet(p, &local_err) < 0) {
>          goto out;
>      }
> @@ -706,6 +707,7 @@ out:
>      p->running = false;
>      qemu_mutex_unlock(&p->mutex);
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> @@ -819,6 +821,7 @@ static void *multifd_recv_thread(void *opaque)
>  {
>      MultiFDRecvParams *p = opaque;
>  
> +    rcu_register_thread();
>      while (true) {
>          qemu_mutex_lock(&p->mutex);
>          if (p->quit) {
> @@ -833,6 +836,7 @@ static void *multifd_recv_thread(void *opaque)
>      p->running = false;
>      qemu_mutex_unlock(&p->mutex);
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f6705a3..769f443 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>                                  " to abort!"); \
>                  rdma->error_reported = 1; \
>              } \
> +            rcu_read_unlock(); \
>              return rdma->error_state; \
>          } \
>      } while (0)
> @@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
>  
>  struct QIOChannelRDMA {
>      QIOChannel parent;
> -    RDMAContext *rdma;
> +    RDMAContext *rdmain;
> +    RDMAContext *rdmaout;
>      QEMUFile *file;
>      bool blocking; /* XXX we don't actually honour this yet */
>  };
> @@ -2630,12 +2632,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      QEMUFile *f = rioc->file;
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>      ssize_t done = 0;
>      size_t i;
>      size_t len = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      /*
> @@ -2645,6 +2655,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>      ret = qemu_rdma_write_flush(f, rdma);
>      if (ret < 0) {
>          rdma->error_state = ret;
> +        rcu_read_unlock();
>          return ret;
>      }
>  
> @@ -2664,6 +2675,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  
>              if (ret < 0) {
>                  rdma->error_state = ret;
> +                rcu_read_unlock();
>                  return ret;
>              }
>  
> @@ -2672,6 +2684,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2705,12 +2718,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>                                        Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head;
>      int ret = 0;
>      ssize_t i;
>      size_t done = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      for (i = 0; i < niov; i++) {
> @@ -2722,7 +2743,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           * were given and dish out the bytes until we run
>           * out of bytes.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>          /* Got what we needed, so go to next iovec */
> @@ -2744,25 +2765,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>  
>          if (ret < 0) {
>              rdma->error_state = ret;
> +            rcu_read_unlock();
>              return ret;
>          }
>  
>          /*
>           * SEND was received with new bytes, now try again.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>  
>          /* Still didn't get enough, so lets just return */
>          if (want) {
>              if (done == 0) {
> +                rcu_read_unlock();
>                  return QIO_CHANNEL_ERR_BLOCK;
>              } else {
>                  break;
>              }
>          }
>      }
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2814,15 +2838,29 @@ qio_channel_rdma_source_prepare(GSource *source,
>                                  gint *timeout)
>  {
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>      *timeout = -1;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when prepare Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return cond & rsource->condition;
>  }
>  
> @@ -2830,14 +2868,28 @@ static gboolean
>  qio_channel_rdma_source_check(GSource *source)
>  {
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when check Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return cond & rsource->condition;
>  }
>  
> @@ -2848,14 +2900,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
>  {
>      QIOChannelFunc func = (QIOChannelFunc)callback;
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when dispatch Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return (*func)(QIO_CHANNEL(rsource->rioc),
>                     (cond & rsource->condition),
>                     user_data);
> @@ -2900,15 +2966,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>                                    Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext *rdmain, *rdmaout;
>      trace_qemu_rdma_close();
> -    if (rioc->rdma) {
> -        if (!rioc->rdma->error_state) {
> -            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
> -        }
> -        qemu_rdma_cleanup(rioc->rdma);
> -        g_free(rioc->rdma);
> -        rioc->rdma = NULL;
> +
> +    rdmain = rioc->rdmain;
> +    if (rdmain) {
> +        atomic_rcu_set(&rioc->rdmain, NULL);
> +    }
> +
> +    rdmaout = rioc->rdmaout;
> +    if (rdmaout) {
> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>      }
> +
> +    synchronize_rcu();
> +
> +    if (rdmain) {
> +        qemu_rdma_cleanup(rdmain);
> +    }
> +
> +    if (rdmaout) {
> +        qemu_rdma_cleanup(rdmaout);
> +    }
> +
> +    g_free(rdmain);
> +    g_free(rdmaout);
> +
>      return 0;
>  }
>  
> @@ -2951,12 +3034,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>                                    size_t size, uint64_t *bytes_sent)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return RAM_SAVE_CONTROL_NOT_SUPP;
>      }
>  
> @@ -3041,9 +3133,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return RAM_SAVE_CONTROL_DELAYED;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3219,8 +3313,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>                                   .repeat = 1 };
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
> +    RDMAContext *rdma;
> +    RDMALocalBlocks *local;
>      RDMAControlHeader head;
>      RDMARegister *reg, *registers;
>      RDMACompress *comp;
> @@ -3233,8 +3327,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      int count = 0;
>      int i = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
> +    local = &rdma->local_ram_blocks;
>      do {
>          trace_qemu_rdma_registration_handle_wait();
>  
> @@ -3468,6 +3571,7 @@ out:
>      if (ret < 0) {
>          rdma->error_state = ret;
>      }
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3481,10 +3585,18 @@ out:
>  static int
>  rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>  {
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int curr;
>      int found = -1;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      /* Find the matching RAMBlock in our local list */
>      for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
>          if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
> @@ -3495,6 +3607,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>  
>      if (found == -1) {
>          error_report("RAMBlock '%s' not found on destination", name);
> +        rcu_read_unlock();
>          return -ENOENT;
>      }
>  
> @@ -3502,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>      trace_rdma_block_notification_handle(name, rdma->next_src_index);
>      rdma->next_src_index++;
>  
> +    rcu_read_unlock();
>      return 0;
>  }
>  
> @@ -3524,11 +3638,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>                                          uint64_t flags, void *data)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
> +
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
>  
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3536,6 +3658,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>      qemu_fflush(f);
>  
> +    rcu_read_unlock();
>      return 0;
>  }
>  
> @@ -3548,13 +3671,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>  {
>      Error *local_err = NULL, **errp = &local_err;
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>      int ret = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3586,6 +3717,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                      qemu_rdma_reg_whole_ram_blocks : NULL);
>          if (ret < 0) {
>              ERROR(errp, "receiving remote info!");
> +            rcu_read_unlock();
>              return ret;
>          }
>  
> @@ -3609,6 +3741,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                          "not identical on both the source and destination.",
>                          local->nb_blocks, nb_dest_blocks);
>              rdma->error_state = -EINVAL;
> +            rcu_read_unlock();
>              return -EINVAL;
>          }
>  
> @@ -3625,6 +3758,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                              local->block[i].length,
>                              rdma->dest_blocks[i].length);
>                  rdma->error_state = -EINVAL;
> +                rcu_read_unlock();
>                  return -EINVAL;
>              }
>              local->block[i].remote_host_addr =
> @@ -3642,9 +3776,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          goto err;
>      }
>  
> +    rcu_read_unlock();
>      return 0;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3662,10 +3798,15 @@ static const QEMUFileHooks rdma_write_hooks = {
>  static void qio_channel_rdma_finalize(Object *obj)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
> -    if (rioc->rdma) {
> -        qemu_rdma_cleanup(rioc->rdma);
> -        g_free(rioc->rdma);
> -        rioc->rdma = NULL;
> +    if (rioc->rdmain) {
> +        qemu_rdma_cleanup(rioc->rdmain);
> +        g_free(rioc->rdmain);
> +        rioc->rdmain = NULL;
> +    }
> +    if (rioc->rdmaout) {
> +        qemu_rdma_cleanup(rioc->rdmaout);
> +        g_free(rioc->rdmaout);
> +        rioc->rdmaout = NULL;
>      }
>  }
>  
> @@ -3705,13 +3846,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>      }
>  
>      rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
> -    rioc->rdma = rdma;
>  
>      if (mode[0] == 'w') {
>          rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
> +        rioc->rdmaout = rdma;
> +        rioc->rdmain = rdma->return_path;
>          qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
>      } else {
>          rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
> +        rioc->rdmain = rdma;
> +        rioc->rdmaout = rdma->return_path;
>          qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
>      }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c2f34ff..21c07d4 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1622,6 +1622,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      qemu_sem_post(&mis->listen_thread_sem);
>      trace_postcopy_ram_listen_thread_start();
>  
> +    rcu_register_thread();
>      /*
>       * Because we're a thread and not a coroutine we can't yield
>       * in qemu_file, and thus we must be blocking now.
> @@ -1662,6 +1663,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           * to leave the guest running and fire MCEs for pages that never
>           * arrived as a desperate recovery step.
>           */
> +        rcu_unregister_thread();
>          exit(EXIT_FAILURE);
>      }
>  
> @@ -1676,6 +1678,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      migration_incoming_state_destroy();
>      qemu_loadvm_state_cleanup();
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-06-13 14:24   ` Dr. David Alan Gilbert
  2018-06-14  2:42     ` 858585 jemmy
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-13 14:24 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, galsha, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
> may loop forever. So we should also poll the cm event fd, and when we
> receive RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL,
> we consider that an error has happened.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Was there a reply which explained/pointed to docs for cm_event?
Or a Reviewed-by from one of the Infiniband people would be fine.

Dave

> ---
>  migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f12e8d5..bb6989e 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>   */
>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>  {
> +    struct rdma_cm_event *cm_event;
> +    int ret = -1;
> +
>      /*
>       * Coroutine doesn't start until migration_fd_process_incoming()
>       * so don't yield unless we know we're running inside of a coroutine.
> @@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>           * without hanging forever.
>           */
>          while (!rdma->error_state  && !rdma->received_error) {
> -            GPollFD pfds[1];
> +            GPollFD pfds[2];
>              pfds[0].fd = rdma->comp_channel->fd;
>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[0].revents = 0;
> +
> +            pfds[1].fd = rdma->channel->fd;
> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[1].revents = 0;
> +
>              /* 0.1s timeout, should be fine for a 'cancel' */
> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
> +            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
> +            case 2:
>              case 1: /* fd active */
> -                return 0;
> +                if (pfds[0].revents) {
> +                    return 0;
> +                }
> +
> +                if (pfds[1].revents) {
> +                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +                    if (!ret) {
> +                        rdma_ack_cm_event(cm_event);
> +                    }
> +
> +                    error_report("receive cm event while wait comp channel,"
> +                                 "cm event is %d", cm_event->event);
> +                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
> +                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
> +                        return -EPIPE;
> +                    }
> +                }
> +                break;
>  
>              case 0: /* Timeout, go around again */
>                  break;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
@ 2018-06-13 17:02   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2018-06-13 17:02 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, dgilbert, aviadye, pbonzini,
	qemu-devel, adido, galsha, Lidong Chen

On Tue, Jun 05, 2018 at 11:28:02PM +0800, Lidong Chen wrote:
> The channel_close may be invoked by different threads. For example, the source
> qemu invokes qemu_fclose in the main thread, migration thread and return path
> thread. The destination qemu invokes qemu_fclose in the main thread, listen thread
> and COLO incoming thread.

The only alternative approach would be to try to get all close calls
happening in the same thread, by having the other threads just set
some flag the primary thread watches.  This is likely quite a lot of
work to achieve, though, so I agree that just doing mutex locking is a
prudent solution.
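
Roughly the shape that locking takes, as a minimal sketch (the field and
close-hook names here are illustrative, not the exact ones in the series):

    /* a per-QEMUFile mutex taken around the close hook, so concurrent
     * qemu_fclose() callers serialise instead of racing on the channel */
    qemu_mutex_lock(&f->lock);
    ret = f->ops->close(f->opaque);   /* channel_close() runs under the lock */
    qemu_mutex_unlock(&f->lock);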

> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/migration.c | 2 ++
>  migration/migration.h | 7 +++++++
>  migration/qemu-file.c | 6 ++++--
>  3 files changed, 13 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion
  2018-06-13 14:24   ` Dr. David Alan Gilbert
@ 2018-06-14  2:42     ` 858585 jemmy
  0 siblings, 0 replies; 22+ messages in thread
From: 858585 jemmy @ 2018-06-14  2:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Gal Shachaf, Lidong Chen

On Wed, Jun 13, 2018 at 10:24 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
>> may loop forever. So we should also poll the cm event fd, and when we
>> receive RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL,
>> we consider that an error has happened.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> Was there a reply which explained/pointed to docs for cm_event?

https://linux.die.net/man/3/rdma_get_cm_event
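
For reference, the usage pattern that man page describes, as a minimal
self-contained sketch (note the event data has to be copied out before
rdma_ack_cm_event(), which releases it):

    #include <errno.h>
    #include <rdma/rdma_cma.h>

    static int check_cm_event(struct rdma_event_channel *channel)
    {
        struct rdma_cm_event *cm_event;
        enum rdma_cm_event_type type;

        if (rdma_get_cm_event(channel, &cm_event) < 0) {
            return -errno;
        }
        type = cm_event->event;        /* copy what we need first */
        rdma_ack_cm_event(cm_event);   /* then release the event */

        if (type == RDMA_CM_EVENT_DISCONNECTED ||
            type == RDMA_CM_EVENT_DEVICE_REMOVAL) {
            return -EPIPE;             /* peer went away */
        }
        return 0;
    }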

> Or a Reviewed-by from one of the Infiniband people would be fine.

Yes, I should add Gal Shachaf <galsha@mellanox.com> and Aviad Yehezkel
<aviadye@mellanox.com>; we are working together on RDMA live migration.

Thanks.

>
> Dave
>
>> ---
>>  migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
>>  1 file changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index f12e8d5..bb6989e 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>>   */
>>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>  {
>> +    struct rdma_cm_event *cm_event;
>> +    int ret = -1;
>> +
>>      /*
>>       * Coroutine doesn't start until migration_fd_process_incoming()
>>       * so don't yield unless we know we're running inside of a coroutine.
>> @@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>           * without hanging forever.
>>           */
>>          while (!rdma->error_state  && !rdma->received_error) {
>> -            GPollFD pfds[1];
>> +            GPollFD pfds[2];
>>              pfds[0].fd = rdma->comp_channel->fd;
>>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> +            pfds[0].revents = 0;
>> +
>> +            pfds[1].fd = rdma->channel->fd;
>> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>> +            pfds[1].revents = 0;
>> +
>>              /* 0.1s timeout, should be fine for a 'cancel' */
>> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>> +            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
>> +            case 2:
>>              case 1: /* fd active */
>> -                return 0;
>> +                if (pfds[0].revents) {
>> +                    return 0;
>> +                }
>> +
>> +                if (pfds[1].revents) {
>> +                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> +                    if (!ret) {
>> +                        rdma_ack_cm_event(cm_event);
>> +                    }
>> +
>> +                    error_report("receive cm event while wait comp channel,"
>> +                                 "cm event is %d", cm_event->event);
>> +                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
>> +                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
>> +                        return -EPIPE;
>> +                    }
>> +                }
>> +                break;
>>
>>              case 0: /* Timeout, go around again */
>>                  break;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel
  2018-06-13 14:21   ` Dr. David Alan Gilbert
@ 2018-06-27 15:31     ` 858585 jemmy
  0 siblings, 0 replies; 22+ messages in thread
From: 858585 jemmy @ 2018-06-27 15:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Paolo Bonzini
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, qemu-devel, Adi Dotan, Gal Shachaf, Lidong Chen

On Wed, Jun 13, 2018 at 10:21 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> From: Lidong Chen <jemmy858585@gmail.com>
>>
>> This patch implements a bi-directional RDMA QIOChannel. Because different
>> threads may access the QIOChannelRDMA concurrently, this patch uses RCU to protect it.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> Paolo: Does it make sense the way RCU is used here? Holding the
> read-lock for so long in multifd_rdma_[read|write]v is what worries me
> most.
>
> Dave
>

Hi Paolo:
     Could you review this patch?
     Thanks.

>> ---
>>  migration/colo.c         |   2 +
>>  migration/migration.c    |   2 +
>>  migration/postcopy-ram.c |   2 +
>>  migration/ram.c          |   4 +
>>  migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
>>  migration/savevm.c       |   3 +
>>  6 files changed, 183 insertions(+), 26 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 4381067..88936f5 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
>>      uint64_t value;
>>      Error *local_err = NULL;
>>
>> +    rcu_register_thread();
>>      qemu_sem_init(&mis->colo_incoming_sem, 0);
>>
>>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> @@ -666,5 +667,6 @@ out:
>>      }
>>      migration_incoming_exit_colo();
>>
>> +    rcu_unregister_thread();
>>      return NULL;
>>  }
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 1d0aaec..4253d9f 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2028,6 +2028,7 @@ static void *source_return_path_thread(void *opaque)
>>      int res;
>>
>>      trace_source_return_path_thread_entry();
>> +    rcu_register_thread();
>>
>>  retry:
>>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
>> @@ -2167,6 +2168,7 @@ out:
>>      trace_source_return_path_thread_end();
>>      ms->rp_state.from_dst_file = NULL;
>>      qemu_fclose(rp);
>> +    rcu_unregister_thread();
>>      return NULL;
>>  }
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 48e5155..98613eb 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>>      RAMBlock *rb = NULL;
>>
>>      trace_postcopy_ram_fault_thread_entry();
>> +    rcu_register_thread();
>>      mis->last_rb = NULL; /* last RAMBlock we sent part of */
>>      qemu_sem_post(&mis->fault_thread_sem);
>>
>> @@ -1059,6 +1060,7 @@ retry:
>>              }
>>          }
>>      }
>> +    rcu_unregister_thread();
>>      trace_postcopy_ram_fault_thread_exit();
>>      g_free(pfd);
>>      return NULL;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index a500015..a674fb5 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -683,6 +683,7 @@ static void *multifd_send_thread(void *opaque)
>>      MultiFDSendParams *p = opaque;
>>      Error *local_err = NULL;
>>
>> +    rcu_register_thread();
>>      if (multifd_send_initial_packet(p, &local_err) < 0) {
>>          goto out;
>>      }
>> @@ -706,6 +707,7 @@ out:
>>      p->running = false;
>>      qemu_mutex_unlock(&p->mutex);
>>
>> +    rcu_unregister_thread();
>>      return NULL;
>>  }
>>
>> @@ -819,6 +821,7 @@ static void *multifd_recv_thread(void *opaque)
>>  {
>>      MultiFDRecvParams *p = opaque;
>>
>> +    rcu_register_thread();
>>      while (true) {
>>          qemu_mutex_lock(&p->mutex);
>>          if (p->quit) {
>> @@ -833,6 +836,7 @@ static void *multifd_recv_thread(void *opaque)
>>      p->running = false;
>>      qemu_mutex_unlock(&p->mutex);
>>
>> +    rcu_unregister_thread();
>>      return NULL;
>>  }
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index f6705a3..769f443 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>>                                  " to abort!"); \
>>                  rdma->error_reported = 1; \
>>              } \
>> +            rcu_read_unlock(); \
>>              return rdma->error_state; \
>>          } \
>>      } while (0)
>> @@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
>>
>>  struct QIOChannelRDMA {
>>      QIOChannel parent;
>> -    RDMAContext *rdma;
>> +    RDMAContext *rdmain;
>> +    RDMAContext *rdmaout;
>>      QEMUFile *file;
>>      bool blocking; /* XXX we don't actually honour this yet */
>>  };
>> @@ -2630,12 +2632,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>      QEMUFile *f = rioc->file;
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int ret;
>>      ssize_t done = 0;
>>      size_t i;
>>      size_t len = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      /*
>> @@ -2645,6 +2655,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>      ret = qemu_rdma_write_flush(f, rdma);
>>      if (ret < 0) {
>>          rdma->error_state = ret;
>> +        rcu_read_unlock();
>>          return ret;
>>      }
>>
>> @@ -2664,6 +2675,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>
>>              if (ret < 0) {
>>                  rdma->error_state = ret;
>> +                rcu_read_unlock();
>>                  return ret;
>>              }
>>
>> @@ -2672,6 +2684,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>          }
>>      }
>>
>> +    rcu_read_unlock();
>>      return done;
>>  }
>>
>> @@ -2705,12 +2718,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>                                        Error **errp)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      RDMAControlHeader head;
>>      int ret = 0;
>>      ssize_t i;
>>      size_t done = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmain);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      for (i = 0; i < niov; i++) {
>> @@ -2722,7 +2743,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>           * were given and dish out the bytes until we run
>>           * out of bytes.
>>           */
>> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
>> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>>          done += ret;
>>          want -= ret;
>>          /* Got what we needed, so go to next iovec */
>> @@ -2744,25 +2765,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>
>>          if (ret < 0) {
>>              rdma->error_state = ret;
>> +            rcu_read_unlock();
>>              return ret;
>>          }
>>
>>          /*
>>           * SEND was received with new bytes, now try again.
>>           */
>> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
>> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>>          done += ret;
>>          want -= ret;
>>
>>          /* Still didn't get enough, so lets just return */
>>          if (want) {
>>              if (done == 0) {
>> +                rcu_read_unlock();
>>                  return QIO_CHANNEL_ERR_BLOCK;
>>              } else {
>>                  break;
>>              }
>>          }
>>      }
>> +    rcu_read_unlock();
>>      return done;
>>  }
>>
>> @@ -2814,15 +2838,29 @@ qio_channel_rdma_source_prepare(GSource *source,
>>                                  gint *timeout)
>>  {
>>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
>> -    RDMAContext *rdma = rsource->rioc->rdma;
>> +    RDMAContext *rdma;
>>      GIOCondition cond = 0;
>>      *timeout = -1;
>>
>> +    rcu_read_lock();
>> +    if (rsource->condition == G_IO_IN) {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
>> +    } else {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when prepare Gsource");
>> +        rcu_read_unlock();
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>>      cond |= G_IO_OUT;
>>
>> +    rcu_read_unlock();
>>      return cond & rsource->condition;
>>  }
>>
>> @@ -2830,14 +2868,28 @@ static gboolean
>>  qio_channel_rdma_source_check(GSource *source)
>>  {
>>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
>> -    RDMAContext *rdma = rsource->rioc->rdma;
>> +    RDMAContext *rdma;
>>      GIOCondition cond = 0;
>>
>> +    rcu_read_lock();
>> +    if (rsource->condition == G_IO_IN) {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
>> +    } else {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when check Gsource");
>> +        rcu_read_unlock();
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>>      cond |= G_IO_OUT;
>>
>> +    rcu_read_unlock();
>>      return cond & rsource->condition;
>>  }
>>
>> @@ -2848,14 +2900,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
>>  {
>>      QIOChannelFunc func = (QIOChannelFunc)callback;
>>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
>> -    RDMAContext *rdma = rsource->rioc->rdma;
>> +    RDMAContext *rdma;
>>      GIOCondition cond = 0;
>>
>> +    rcu_read_lock();
>> +    if (rsource->condition == G_IO_IN) {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
>> +    } else {
>> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when dispatch Gsource");
>> +        rcu_read_unlock();
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>>      cond |= G_IO_OUT;
>>
>> +    rcu_read_unlock();
>>      return (*func)(QIO_CHANNEL(rsource->rioc),
>>                     (cond & rsource->condition),
>>                     user_data);
>> @@ -2900,15 +2966,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>>                                    Error **errp)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    RDMAContext *rdmain, *rdmaout;
>>      trace_qemu_rdma_close();
>> -    if (rioc->rdma) {
>> -        if (!rioc->rdma->error_state) {
>> -            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
>> -        }
>> -        qemu_rdma_cleanup(rioc->rdma);
>> -        g_free(rioc->rdma);
>> -        rioc->rdma = NULL;
>> +
>> +    rdmain = rioc->rdmain;
>> +    if (rdmain) {
>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>> +    }
>> +
>> +    rdmaout = rioc->rdmaout;
>> +    if (rdmaout) {
>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>>      }
>> +
>> +    synchronize_rcu();
>> +
>> +    if (rdmain) {
>> +        qemu_rdma_cleanup(rdmain);
>> +    }
>> +
>> +    if (rdmaout) {
>> +        qemu_rdma_cleanup(rdmaout);
>> +    }
>> +
>> +    g_free(rdmain);
>> +    g_free(rdmaout);
>> +
>>      return 0;
>>  }
>>
>> @@ -2951,12 +3034,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>>                                    size_t size, uint64_t *bytes_sent)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int ret;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return RAM_SAVE_CONTROL_NOT_SUPP;
>>      }
>>
>> @@ -3041,9 +3133,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>>          }
>>      }
>>
>> +    rcu_read_unlock();
>>      return RAM_SAVE_CONTROL_DELAYED;
>>  err:
>>      rdma->error_state = ret;
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3219,8 +3313,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>>      RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>>                                   .repeat = 1 };
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
>> +    RDMAContext *rdma;
>> +    RDMALocalBlocks *local;
>>      RDMAControlHeader head;
>>      RDMARegister *reg, *registers;
>>      RDMACompress *comp;
>> @@ -3233,8 +3327,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>>      int count = 0;
>>      int i = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmain);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>> +    local = &rdma->local_ram_blocks;
>>      do {
>>          trace_qemu_rdma_registration_handle_wait();
>>
>> @@ -3468,6 +3571,7 @@ out:
>>      if (ret < 0) {
>>          rdma->error_state = ret;
>>      }
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3481,10 +3585,18 @@ out:
>>  static int
>>  rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>>  {
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int curr;
>>      int found = -1;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmain);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      /* Find the matching RAMBlock in our local list */
>>      for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
>>          if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
>> @@ -3495,6 +3607,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>>
>>      if (found == -1) {
>>          error_report("RAMBlock '%s' not found on destination", name);
>> +        rcu_read_unlock();
>>          return -ENOENT;
>>      }
>>
>> @@ -3502,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>>      trace_rdma_block_notification_handle(name, rdma->next_src_index);
>>      rdma->next_src_index++;
>>
>> +    rcu_read_unlock();
>>      return 0;
>>  }
>>
>> @@ -3524,11 +3638,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>>                                          uint64_t flags, void *data)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>> +
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>>
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return 0;
>>      }
>>
>> @@ -3536,6 +3658,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>>      qemu_fflush(f);
>>
>> +    rcu_read_unlock();
>>      return 0;
>>  }
>>
>> @@ -3548,13 +3671,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>  {
>>      Error *local_err = NULL, **errp = &local_err;
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>>      int ret = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return 0;
>>      }
>>
>> @@ -3586,6 +3717,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                      qemu_rdma_reg_whole_ram_blocks : NULL);
>>          if (ret < 0) {
>>              ERROR(errp, "receiving remote info!");
>> +            rcu_read_unlock();
>>              return ret;
>>          }
>>
>> @@ -3609,6 +3741,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                          "not identical on both the source and destination.",
>>                          local->nb_blocks, nb_dest_blocks);
>>              rdma->error_state = -EINVAL;
>> +            rcu_read_unlock();
>>              return -EINVAL;
>>          }
>>
>> @@ -3625,6 +3758,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                              local->block[i].length,
>>                              rdma->dest_blocks[i].length);
>>                  rdma->error_state = -EINVAL;
>> +                rcu_read_unlock();
>>                  return -EINVAL;
>>              }
>>              local->block[i].remote_host_addr =
>> @@ -3642,9 +3776,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>          goto err;
>>      }
>>
>> +    rcu_read_unlock();
>>      return 0;
>>  err:
>>      rdma->error_state = ret;
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3662,10 +3798,15 @@ static const QEMUFileHooks rdma_write_hooks = {
>>  static void qio_channel_rdma_finalize(Object *obj)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
>> -    if (rioc->rdma) {
>> -        qemu_rdma_cleanup(rioc->rdma);
>> -        g_free(rioc->rdma);
>> -        rioc->rdma = NULL;
>> +    if (rioc->rdmain) {
>> +        qemu_rdma_cleanup(rioc->rdmain);
>> +        g_free(rioc->rdmain);
>> +        rioc->rdmain = NULL;
>> +    }
>> +    if (rioc->rdmaout) {
>> +        qemu_rdma_cleanup(rioc->rdmaout);
>> +        g_free(rioc->rdmaout);
>> +        rioc->rdmaout = NULL;
>>      }
>>  }
>>
>> @@ -3705,13 +3846,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>>      }
>>
>>      rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
>> -    rioc->rdma = rdma;
>>
>>      if (mode[0] == 'w') {
>>          rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
>> +        rioc->rdmaout = rdma;
>> +        rioc->rdmain = rdma->return_path;
>>          qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
>>      } else {
>>          rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
>> +        rioc->rdmain = rdma;
>> +        rioc->rdmaout = rdma->return_path;
>>          qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
>>      }
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index c2f34ff..21c07d4 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1622,6 +1622,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>      qemu_sem_post(&mis->listen_thread_sem);
>>      trace_postcopy_ram_listen_thread_start();
>>
>> +    rcu_register_thread();
>>      /*
>>       * Because we're a thread and not a coroutine we can't yield
>>       * in qemu_file, and thus we must be blocking now.
>> @@ -1662,6 +1663,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>           * to leave the guest running and fire MCEs for pages that never
>>           * arrived as a desperate recovery step.
>>           */
>> +        rcu_unregister_thread();
>>          exit(EXIT_FAILURE);
>>      }
>>
>> @@ -1676,6 +1678,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>      migration_incoming_state_destroy();
>>      qemu_loadvm_state_cleanup();
>>
>> +    rcu_unregister_thread();
>>      return NULL;
>>  }
>>
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel Lidong Chen
  2018-06-13 14:21   ` Dr. David Alan Gilbert
@ 2018-06-27 15:32   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-27 15:32 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, galsha, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> This patch implements a bi-directional RDMA QIOChannel. Because different
> threads may access the QIOChannelRDMA concurrently, this patch uses RCU to protect it.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c         |   2 +
>  migration/migration.c    |   2 +
>  migration/postcopy-ram.c |   2 +
>  migration/ram.c          |   4 +
>  migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
>  migration/savevm.c       |   3 +
>  6 files changed, 183 insertions(+), 26 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 4381067..88936f5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
>      uint64_t value;
>      Error *local_err = NULL;
>  
> +    rcu_register_thread();
>      qemu_sem_init(&mis->colo_incoming_sem, 0);
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> @@ -666,5 +667,6 @@ out:
>      }
>      migration_incoming_exit_colo();
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
> diff --git a/migration/migration.c b/migration/migration.c
> index 1d0aaec..4253d9f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2028,6 +2028,7 @@ static void *source_return_path_thread(void *opaque)
>      int res;
>  
>      trace_source_return_path_thread_entry();
> +    rcu_register_thread();
>  
>  retry:
>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
> @@ -2167,6 +2168,7 @@ out:
>      trace_source_return_path_thread_end();
>      ms->rp_state.from_dst_file = NULL;
>      qemu_fclose(rp);
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 48e5155..98613eb 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>      RAMBlock *rb = NULL;
>  
>      trace_postcopy_ram_fault_thread_entry();
> +    rcu_register_thread();
>      mis->last_rb = NULL; /* last RAMBlock we sent part of */
>      qemu_sem_post(&mis->fault_thread_sem);
>  
> @@ -1059,6 +1060,7 @@ retry:
>              }
>          }
>      }
> +    rcu_unregister_thread();
>      trace_postcopy_ram_fault_thread_exit();
>      g_free(pfd);
>      return NULL;
> diff --git a/migration/ram.c b/migration/ram.c
> index a500015..a674fb5 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -683,6 +683,7 @@ static void *multifd_send_thread(void *opaque)
>      MultiFDSendParams *p = opaque;
>      Error *local_err = NULL;
>  
> +    rcu_register_thread();
>      if (multifd_send_initial_packet(p, &local_err) < 0) {
>          goto out;
>      }
> @@ -706,6 +707,7 @@ out:
>      p->running = false;
>      qemu_mutex_unlock(&p->mutex);
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> @@ -819,6 +821,7 @@ static void *multifd_recv_thread(void *opaque)
>  {
>      MultiFDRecvParams *p = opaque;
>  
> +    rcu_register_thread();
>      while (true) {
>          qemu_mutex_lock(&p->mutex);
>          if (p->quit) {
> @@ -833,6 +836,7 @@ static void *multifd_recv_thread(void *opaque)
>      p->running = false;
>      qemu_mutex_unlock(&p->mutex);
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f6705a3..769f443 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>                                  " to abort!"); \
>                  rdma->error_reported = 1; \
>              } \
> +            rcu_read_unlock(); \
>              return rdma->error_state; \
>          } \
>      } while (0)
> @@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
>  
>  struct QIOChannelRDMA {
>      QIOChannel parent;
> -    RDMAContext *rdma;
> +    RDMAContext *rdmain;
> +    RDMAContext *rdmaout;
>      QEMUFile *file;
>      bool blocking; /* XXX we don't actually honour this yet */
>  };
> @@ -2630,12 +2632,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      QEMUFile *f = rioc->file;
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>      ssize_t done = 0;
>      size_t i;
>      size_t len = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      /*
> @@ -2645,6 +2655,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>      ret = qemu_rdma_write_flush(f, rdma);
>      if (ret < 0) {
>          rdma->error_state = ret;
> +        rcu_read_unlock();
>          return ret;
>      }
>  
> @@ -2664,6 +2675,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  
>              if (ret < 0) {
>                  rdma->error_state = ret;
> +                rcu_read_unlock();
>                  return ret;
>              }
>  
> @@ -2672,6 +2684,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2705,12 +2718,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>                                        Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head;
>      int ret = 0;
>      ssize_t i;
>      size_t done = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      for (i = 0; i < niov; i++) {
> @@ -2722,7 +2743,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           * were given and dish out the bytes until we run
>           * out of bytes.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>          /* Got what we needed, so go to next iovec */
> @@ -2744,25 +2765,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>  
>          if (ret < 0) {
>              rdma->error_state = ret;
> +            rcu_read_unlock();
>              return ret;
>          }
>  
>          /*
>           * SEND was received with new bytes, now try again.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>  
>          /* Still didn't get enough, so lets just return */
>          if (want) {
>              if (done == 0) {
> +                rcu_read_unlock();
>                  return QIO_CHANNEL_ERR_BLOCK;
>              } else {
>                  break;
>              }
>          }
>      }
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2814,15 +2838,29 @@ qio_channel_rdma_source_prepare(GSource *source,
>                                  gint *timeout)
>  {
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>      *timeout = -1;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when prepare Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return cond & rsource->condition;
>  }
>  
> @@ -2830,14 +2868,28 @@ static gboolean
>  qio_channel_rdma_source_check(GSource *source)
>  {
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when check Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return cond & rsource->condition;
>  }
>  
> @@ -2848,14 +2900,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
>  {
>      QIOChannelFunc func = (QIOChannelFunc)callback;
>      QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> -    RDMAContext *rdma = rsource->rioc->rdma;
> +    RDMAContext *rdma;
>      GIOCondition cond = 0;
>  
> +    rcu_read_lock();
> +    if (rsource->condition == G_IO_IN) {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
> +    } else {
> +        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when dispatch Gsource");
> +        rcu_read_unlock();
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
>      cond |= G_IO_OUT;
>  
> +    rcu_read_unlock();
>      return (*func)(QIO_CHANNEL(rsource->rioc),
>                     (cond & rsource->condition),
>                     user_data);
> @@ -2900,15 +2966,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>                                    Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext *rdmain, *rdmaout;
>      trace_qemu_rdma_close();
> -    if (rioc->rdma) {
> -        if (!rioc->rdma->error_state) {
> -            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
> -        }
> -        qemu_rdma_cleanup(rioc->rdma);
> -        g_free(rioc->rdma);
> -        rioc->rdma = NULL;
> +
> +    rdmain = rioc->rdmain;
> +    if (rdmain) {
> +        atomic_rcu_set(&rioc->rdmain, NULL);
> +    }
> +
> +    rdmaout = rioc->rdmaout;
> +    if (rdmaout) {
> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>      }
> +
> +    synchronize_rcu();
> +
> +    if (rdmain) {
> +        qemu_rdma_cleanup(rdmain);
> +    }
> +
> +    if (rdmaout) {
> +        qemu_rdma_cleanup(rdmaout);
> +    }
> +
> +    g_free(rdmain);
> +    g_free(rdmaout);
> +
>      return 0;
>  }
>  
> @@ -2951,12 +3034,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>                                    size_t size, uint64_t *bytes_sent)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return RAM_SAVE_CONTROL_NOT_SUPP;
>      }
>  
> @@ -3041,9 +3133,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return RAM_SAVE_CONTROL_DELAYED;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3219,8 +3313,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>                                   .repeat = 1 };
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
> +    RDMAContext *rdma;
> +    RDMALocalBlocks *local;
>      RDMAControlHeader head;
>      RDMARegister *reg, *registers;
>      RDMACompress *comp;
> @@ -3233,8 +3327,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      int count = 0;
>      int i = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
> +    local = &rdma->local_ram_blocks;
>      do {
>          trace_qemu_rdma_registration_handle_wait();
>  
> @@ -3468,6 +3571,7 @@ out:
>      if (ret < 0) {
>          rdma->error_state = ret;
>      }
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3481,10 +3585,18 @@ out:
>  static int
>  rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>  {
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int curr;
>      int found = -1;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmain);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      /* Find the matching RAMBlock in our local list */
>      for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
>          if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
> @@ -3495,6 +3607,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>  
>      if (found == -1) {
>          error_report("RAMBlock '%s' not found on destination", name);
> +        rcu_read_unlock();
>          return -ENOENT;
>      }
>  
> @@ -3502,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
>      trace_rdma_block_notification_handle(name, rdma->next_src_index);
>      rdma->next_src_index++;
>  
> +    rcu_read_unlock();
>      return 0;
>  }
>  
> @@ -3524,11 +3638,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>                                          uint64_t flags, void *data)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
> +
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
>  
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3536,6 +3658,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>      qemu_fflush(f);
>  
> +    rcu_read_unlock();
>      return 0;
>  }
>  
> @@ -3548,13 +3671,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>  {
>      Error *local_err = NULL, **errp = &local_err;
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>      int ret = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3586,6 +3717,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                      qemu_rdma_reg_whole_ram_blocks : NULL);
>          if (ret < 0) {
>              ERROR(errp, "receiving remote info!");
> +            rcu_read_unlock();
>              return ret;
>          }
>  
> @@ -3609,6 +3741,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                          "not identical on both the source and destination.",
>                          local->nb_blocks, nb_dest_blocks);
>              rdma->error_state = -EINVAL;
> +            rcu_read_unlock();
>              return -EINVAL;
>          }
>  
> @@ -3625,6 +3758,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                              local->block[i].length,
>                              rdma->dest_blocks[i].length);
>                  rdma->error_state = -EINVAL;
> +                rcu_read_unlock();
>                  return -EINVAL;
>              }
>              local->block[i].remote_host_addr =
> @@ -3642,9 +3776,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          goto err;
>      }
>  
> +    rcu_read_unlock();
>      return 0;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3662,10 +3798,15 @@ static const QEMUFileHooks rdma_write_hooks = {
>  static void qio_channel_rdma_finalize(Object *obj)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
> -    if (rioc->rdma) {
> -        qemu_rdma_cleanup(rioc->rdma);
> -        g_free(rioc->rdma);
> -        rioc->rdma = NULL;
> +    if (rioc->rdmain) {
> +        qemu_rdma_cleanup(rioc->rdmain);
> +        g_free(rioc->rdmain);
> +        rioc->rdmain = NULL;
> +    }
> +    if (rioc->rdmaout) {
> +        qemu_rdma_cleanup(rioc->rdmaout);
> +        g_free(rioc->rdmaout);
> +        rioc->rdmaout = NULL;
>      }
>  }
>  
> @@ -3705,13 +3846,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>      }
>  
>      rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
> -    rioc->rdma = rdma;
>  
>      if (mode[0] == 'w') {
>          rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
> +        rioc->rdmaout = rdma;
> +        rioc->rdmain = rdma->return_path;
>          qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
>      } else {
>          rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
> +        rioc->rdmain = rdma;
> +        rioc->rdmaout = rdma->return_path;
>          qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
>      }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c2f34ff..21c07d4 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1622,6 +1622,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      qemu_sem_post(&mis->listen_thread_sem);
>      trace_postcopy_ram_listen_thread_start();
>  
> +    rcu_register_thread();
>      /*
>       * Because we're a thread and not a coroutine we can't yield
>       * in qemu_file, and thus we must be blocking now.
> @@ -1662,6 +1663,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           * to leave the guest running and fire MCEs for pages that never
>           * arrived as a desperate recovery step.
>           */
> +        rcu_unregister_thread();
>          exit(EXIT_FAILURE);
>      }
>  
> @@ -1676,6 +1678,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      migration_incoming_state_destroy();
>      qemu_loadvm_state_cleanup();
>  
> +    rcu_unregister_thread();
>      return NULL;
>  }
>  
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource Lidong Chen
@ 2018-06-27 18:59   ` Dr. David Alan Gilbert
  2018-07-05 14:26     ` 858585 jemmy
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-27 18:59 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, berrange, aviadye, pbonzini,
	qemu-devel, adido, galsha, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> ibv_dereg_mr wait for a long time for big memory size virtual server.
> 
> The test result is:
>   10GB  326ms
>   20GB  699ms
>   30GB  1021ms
>   40GB  1387ms
>   50GB  1712ms
>   60GB  2034ms
>   70GB  2457ms
>   80GB  2807ms
>   90GB  3107ms
>   100GB 3474ms
>   110GB 3735ms
>   120GB 4064ms
>   130GB 4567ms
>   140GB 4886ms
> 
> this will cause the guest os hang for a while when migration finished.
> So create a dedicated thread to release rdma resource.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index dfa4f77..f12e8d5 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>      }
>  }
>  
> -static int qio_channel_rdma_close(QIOChannel *ioc,
> -                                  Error **errp)
> +static void *qio_channel_rdma_close_thread(void *arg)
>  {
> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> -    RDMAContext *rdmain, *rdmaout;
> -    trace_qemu_rdma_close();
> +    RDMAContext **rdma = arg;
> +    RDMAContext *rdmain = rdma[0];
> +    RDMAContext *rdmaout = rdma[1];
>  
> -    rdmain = rioc->rdmain;
> -    if (rdmain) {
> -        atomic_rcu_set(&rioc->rdmain, NULL);
> -    }
> -
> -    rdmaout = rioc->rdmaout;
> -    if (rdmaout) {
> -        atomic_rcu_set(&rioc->rdmaout, NULL);
> -    }
> +    rcu_register_thread();
>  
>      synchronize_rcu();

* see below

> -
>      if (rdmain) {
>          qemu_rdma_cleanup(rdmain);
>      }
> -
>      if (rdmaout) {
>          qemu_rdma_cleanup(rdmaout);
>      }
>  
>      g_free(rdmain);
>      g_free(rdmaout);
> +    g_free(rdma);
> +
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
> +static int qio_channel_rdma_close(QIOChannel *ioc,
> +                                  Error **errp)
> +{
> +    QemuThread t;
> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
> +
> +    trace_qemu_rdma_close();
> +    if (rioc->rdmain || rioc->rdmaout) {
> +        rdma[0] =  rioc->rdmain;
> +        rdma[1] =  rioc->rdmaout;
> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
> +                           rdma, QEMU_THREAD_DETACHED);
> +        atomic_rcu_set(&rioc->rdmain, NULL);
> +        atomic_rcu_set(&rioc->rdmaout, NULL);

I'm not sure this pair is ordered with the synchronize_rcu above;
Doesn't that mean, on a bad day, that you could get:


    main-thread          rdma_cleanup     another-thread
    qemu_thread_create
                      synchronize_rcu
                                        reads rioc->rdmain
                                        starts doing something with rdmain
    atomic_rcu_set
                      rdma_cleanup


so the another-thread is using it during the cleanup?
Would just moving the atomic_rcu_sets before the qemu_thread_create
fix that?
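
i.e. something like this (completely untested sketch, just to show the
ordering I mean: NULL the pointers before creating the thread that does
the synchronize_rcu()):

    static int qio_channel_rdma_close(QIOChannel *ioc,
                                      Error **errp)
    {
        QemuThread t;
        QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

        trace_qemu_rdma_close();
        if (rioc->rdmain || rioc->rdmaout) {
            RDMAContext **rdma = g_new0(RDMAContext *, 2);

            rdma[0] = rioc->rdmain;
            rdma[1] = rioc->rdmaout;
            /* Hide the contexts from new RCU readers first ... */
            atomic_rcu_set(&rioc->rdmain, NULL);
            atomic_rcu_set(&rioc->rdmaout, NULL);
            /* ... so the synchronize_rcu() in the cleanup thread can
             * only start after the pointers are unreachable. */
            qemu_thread_create(&t, "rdma cleanup",
                               qio_channel_rdma_close_thread,
                               rdma, QEMU_THREAD_DETACHED);
        }

        return 0;
    }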

However, I've got other worries as well:
   a) qemu_rdma_cleanup does:
       migrate_get_current()->state == MIGRATION_STATUS_CANCELLING

      which worries me a little if someone immediately tries to restart
      the migration.

   b) I don't understand what happens if someone does try and restart
      the migration after that, but in the ~5s it takes the ibv cleanup
      to happen.

Dave

    
> +    }
>  
>      return 0;
>  }
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-06-27 18:59   ` Dr. David Alan Gilbert
@ 2018-07-05 14:26     ` 858585 jemmy
  2018-07-19  5:49       ` 858585 jemmy
  0 siblings, 1 reply; 22+ messages in thread
From: 858585 jemmy @ 2018-07-05 14:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Aviad Yehezkel, Paolo Bonzini, qemu-devel, Adi Dotan,
	Gal Shachaf, Lidong Chen

On Thu, Jun 28, 2018 at 2:59 AM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>
>> The test result is:
>>   10GB  326ms
>>   20GB  699ms
>>   30GB  1021ms
>>   40GB  1387ms
>>   50GB  1712ms
>>   60GB  2034ms
>>   70GB  2457ms
>>   80GB  2807ms
>>   90GB  3107ms
>>   100GB 3474ms
>>   110GB 3735ms
>>   120GB 4064ms
>>   130GB 4567ms
>>   140GB 4886ms
>>
>> this will cause the guest os hang for a while when migration finished.
>> So create a dedicated thread to release rdma resource.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>>  1 file changed, 27 insertions(+), 16 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index dfa4f77..f12e8d5 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>      }
>>  }
>>
>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>> -                                  Error **errp)
>> +static void *qio_channel_rdma_close_thread(void *arg)
>>  {
>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> -    RDMAContext *rdmain, *rdmaout;
>> -    trace_qemu_rdma_close();
>> +    RDMAContext **rdma = arg;
>> +    RDMAContext *rdmain = rdma[0];
>> +    RDMAContext *rdmaout = rdma[1];
>>
>> -    rdmain = rioc->rdmain;
>> -    if (rdmain) {
>> -        atomic_rcu_set(&rioc->rdmain, NULL);
>> -    }
>> -
>> -    rdmaout = rioc->rdmaout;
>> -    if (rdmaout) {
>> -        atomic_rcu_set(&rioc->rdmaout, NULL);
>> -    }
>> +    rcu_register_thread();
>>
>>      synchronize_rcu();
>
> * see below
>
>> -
>>      if (rdmain) {
>>          qemu_rdma_cleanup(rdmain);
>>      }
>> -
>>      if (rdmaout) {
>>          qemu_rdma_cleanup(rdmaout);
>>      }
>>
>>      g_free(rdmain);
>>      g_free(rdmaout);
>> +    g_free(rdma);
>> +
>> +    rcu_unregister_thread();
>> +    return NULL;
>> +}
>> +
>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>> +                                  Error **errp)
>> +{
>> +    QemuThread t;
>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
>> +
>> +    trace_qemu_rdma_close();
>> +    if (rioc->rdmain || rioc->rdmaout) {
>> +        rdma[0] =  rioc->rdmain;
>> +        rdma[1] =  rioc->rdmaout;
>> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>> +                           rdma, QEMU_THREAD_DETACHED);
>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>
> I'm not sure this pair is ordered with the synchronise_rcu above;
> Doesn't that mean, on a bad day, that you could get:
>
>
>     main-thread          rdma_cleanup     another-thread
>     qmu_thread_create
>                       synchronise_rcu
>                                         reads rioc->rdmain
>                                         starts doing something with rdmain
>     atomic_rcu_set
>                       rdma_cleanup
>
>
> so the another-thread is using it during the cleanup?
> Would just moving the atomic_rcu_sets before the qemu_thread_create
> fix that?
yes, I will fix it.

>
> However, I've got other worries as well:
>    a) qemu_rdma_cleanup does:
>        migrate_get_current()->state == MIGRATION_STATUS_CANCELLING
>
>       which worries me a little if someone immediately tries to restart
>       the migration.
>
>    b) I don't understand what happens if someone does try and restart
>       the migration after that, but in the ~5s it takes the ibv cleanup
>       to happen.

yes, I will try to fix it.

>
> Dave
>
>
>> +    }
>>
>>      return 0;
>>  }
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-07-05 14:26     ` 858585 jemmy
@ 2018-07-19  5:49       ` 858585 jemmy
  2018-07-23 14:54         ` Gal Shachaf
  0 siblings, 1 reply; 22+ messages in thread
From: 858585 jemmy @ 2018-07-19  5:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Aviad Yehezkel, Gal Shachaf
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Paolo Bonzini, qemu-devel, Adi Dotan, Lidong Chen

On Thu, Jul 5, 2018 at 10:26 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, Jun 28, 2018 at 2:59 AM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>>
>>> The test result is:
>>>   10GB  326ms
>>>   20GB  699ms
>>>   30GB  1021ms
>>>   40GB  1387ms
>>>   50GB  1712ms
>>>   60GB  2034ms
>>>   70GB  2457ms
>>>   80GB  2807ms
>>>   90GB  3107ms
>>>   100GB 3474ms
>>>   110GB 3735ms
>>>   120GB 4064ms
>>>   130GB 4567ms
>>>   140GB 4886ms
>>>
>>> this will cause the guest os hang for a while when migration finished.
>>> So create a dedicated thread to release rdma resource.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> ---
>>>  migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>>>  1 file changed, 27 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index dfa4f77..f12e8d5 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>>      }
>>>  }
>>>
>>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>>> -                                  Error **errp)
>>> +static void *qio_channel_rdma_close_thread(void *arg)
>>>  {
>>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> -    RDMAContext *rdmain, *rdmaout;
>>> -    trace_qemu_rdma_close();
>>> +    RDMAContext **rdma = arg;
>>> +    RDMAContext *rdmain = rdma[0];
>>> +    RDMAContext *rdmaout = rdma[1];
>>>
>>> -    rdmain = rioc->rdmain;
>>> -    if (rdmain) {
>>> -        atomic_rcu_set(&rioc->rdmain, NULL);
>>> -    }
>>> -
>>> -    rdmaout = rioc->rdmaout;
>>> -    if (rdmaout) {
>>> -        atomic_rcu_set(&rioc->rdmaout, NULL);
>>> -    }
>>> +    rcu_register_thread();
>>>
>>>      synchronize_rcu();
>>
>> * see below
>>
>>> -
>>>      if (rdmain) {
>>>          qemu_rdma_cleanup(rdmain);
>>>      }
>>> -
>>>      if (rdmaout) {
>>>          qemu_rdma_cleanup(rdmaout);
>>>      }
>>>
>>>      g_free(rdmain);
>>>      g_free(rdmaout);
>>> +    g_free(rdma);
>>> +
>>> +    rcu_unregister_thread();
>>> +    return NULL;
>>> +}
>>> +
>>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>>> +                                  Error **errp)
>>> +{
>>> +    QemuThread t;
>>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
>>> +
>>> +    trace_qemu_rdma_close();
>>> +    if (rioc->rdmain || rioc->rdmaout) {
>>> +        rdma[0] =  rioc->rdmain;
>>> +        rdma[1] =  rioc->rdmaout;
>>> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>>> +                           rdma, QEMU_THREAD_DETACHED);
>>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>>
>> I'm not sure this pair is ordered with the synchronise_rcu above;
>> Doesn't that mean, on a bad day, that you could get:
>>
>>
>>     main-thread          rdma_cleanup     another-thread
>>     qmu_thread_create
>>                       synchronise_rcu
>>                                         reads rioc->rdmain
>>                                         starts doing something with rdmain
>>     atomic_rcu_set
>>                       rdma_cleanup
>>
>>
>> so the another-thread is using it during the cleanup?
>> Would just moving the atomic_rcu_sets before the qemu_thread_create
>> fix that?
> yes, I will fix it.
>
>>
>> However, I've got other worries as well:
>>    a) qemu_rdma_cleanup does:
>>        migrate_get_current()->state == MIGRATION_STATUS_CANCELLING
>>
>>       which worries me a little if someone immediately tries to restart
>>       the migration.

Because the current qemu version does not wait for the
RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect,
I think it is not necessary to send RDMA_CONTROL_ERROR.

Compared with sending RDMA_CONTROL_ERROR, I think using the cm event to
notify the peer qemu is better: the rdma may already be in error_state,
so RDMA_CONTROL_ERROR may not be sent successfully.

For the peer qemu, the qemu_rdma_wait_comp_channel function should not
only poll rdma->comp_channel->fd,
it should also poll rdma->channel->fd.
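
A simplified sketch of the idea (the real patch also has to handle the
coroutine case and actually read and check the cm event, so take this
only as an illustration of polling both fds):

    static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
    {
        while (!rdma->error_state) {
            GPollFD pfds[2];

            pfds[0].fd = rdma->comp_channel->fd;     /* CQ events */
            pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
            pfds[0].revents = 0;

            pfds[1].fd = rdma->channel->fd;          /* cm events */
            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
            pfds[1].revents = 0;

            /* 100ms timeout, so error_state is re-checked regularly */
            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
            case 2:
            case 1:
                if (pfds[1].revents) {
                    /* a cm event (e.g. RDMA_CM_EVENT_DISCONNECTED) is
                     * pending: treat the connection as broken */
                    return -EPIPE;
                }
                if (pfds[0].revents) {
                    return 0;    /* completion event can be read now */
                }
                break;
            case 0:              /* timeout */
                break;
            default:             /* poll error */
                return -EPIPE;
            }
        }
        return rdma->error_state;
    }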

For the source qemu, this is fixed by the patch
"migration: poll the cm event while wait RDMA work request completion";

for the destination qemu, a new patch is needed to fix it.

So I prefer to remove the MIGRATION_STATUS_CANCELLING check from the qemu_rdma_cleanup function.

>>
>>    b) I don't understand what happens if someone does try and restart
>>       the migration after that, but in the ~5s it takes the ibv cleanup
>>       to happen.

I prefer to add a new variable to current_migration: if the rdma
cleanup thread has not exited, a new migration should not be started.
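
Something like this sketch (the field name and the places it is touched
are made up here just to show the idea, not actual code):

    /* in MigrationState (migration.h): set while the rdma cleanup
     * thread is still running */
    int rdma_cleanup_running;

    /* in qio_channel_rdma_close(), before qemu_thread_create(): */
    atomic_inc(&migrate_get_current()->rdma_cleanup_running);

    /* at the end of qio_channel_rdma_close_thread(): */
    atomic_dec(&migrate_get_current()->rdma_cleanup_running);

    /* in qmp_migrate(), before starting a new migration: */
    if (atomic_read(&migrate_get_current()->rdma_cleanup_running)) {
        error_setg(errp, "previous RDMA cleanup is still in progress");
        return;
    }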

>
> yes, I will try to fix it.
>
>>
>> Dave
>>
>>
>>> +    }
>>>
>>>      return 0;
>>>  }
>>> --
>>> 1.8.3.1
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-07-19  5:49       ` 858585 jemmy
@ 2018-07-23 14:54         ` Gal Shachaf
  2018-07-27  5:34           ` 858585 jemmy
  0 siblings, 1 reply; 22+ messages in thread
From: Gal Shachaf @ 2018-07-23 14:54 UTC (permalink / raw)
  To: 858585 jemmy, Dr. David Alan Gilbert, Aviad Yehezkel
  Cc: zhang.zhanghailiang, Juan Quintela, Daniel P. Berrange,
	Paolo Bonzini, qemu-devel, Adi Dotan, Lidong Chen

On Thu, Jul 5, 2018 at 10:26 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, Jun 28, 2018 at 2:59 AM, Dr. David Alan Gilbert 
> <dgilbert@redhat.com> wrote:
>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>>
>>> The test result is:
>>>   10GB  326ms
>>>   20GB  699ms
>>>   30GB  1021ms
>>>   40GB  1387ms
>>>   50GB  1712ms
>>>   60GB  2034ms
>>>   70GB  2457ms
>>>   80GB  2807ms
>>>   90GB  3107ms
>>>   100GB 3474ms
>>>   110GB 3735ms
>>>   120GB 4064ms
>>>   130GB 4567ms
>>>   140GB 4886ms
>>>
>>> this will cause the guest os hang for a while when migration finished.
>>> So create a dedicated thread to release rdma resource.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> ---
>>>  migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>>>  1 file changed, 27 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c index 
>>> dfa4f77..f12e8d5 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>>      }
>>>  }
>>>
>>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>>> -                                  Error **errp)
>>> +static void *qio_channel_rdma_close_thread(void *arg)
>>>  {
>>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> -    RDMAContext *rdmain, *rdmaout;
>>> -    trace_qemu_rdma_close();
>>> +    RDMAContext **rdma = arg;
>>> +    RDMAContext *rdmain = rdma[0];
>>> +    RDMAContext *rdmaout = rdma[1];
>>>
>>> -    rdmain = rioc->rdmain;
>>> -    if (rdmain) {
>>> -        atomic_rcu_set(&rioc->rdmain, NULL);
>>> -    }
>>> -
>>> -    rdmaout = rioc->rdmaout;
>>> -    if (rdmaout) {
>>> -        atomic_rcu_set(&rioc->rdmaout, NULL);
>>> -    }
>>> +    rcu_register_thread();
>>>
>>>      synchronize_rcu();
>>
>> * see below
>>
>>> -
>>>      if (rdmain) {
>>>          qemu_rdma_cleanup(rdmain);
>>>      }
>>> -
>>>      if (rdmaout) {
>>>          qemu_rdma_cleanup(rdmaout);
>>>      }
>>>
>>>      g_free(rdmain);
>>>      g_free(rdmaout);
>>> +    g_free(rdma);
>>> +
>>> +    rcu_unregister_thread();
>>> +    return NULL;
>>> +}
>>> +
>>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>>> +                                  Error **errp) {
>>> +    QemuThread t;
>>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
>>> +
>>> +    trace_qemu_rdma_close();
>>> +    if (rioc->rdmain || rioc->rdmaout) {
>>> +        rdma[0] =  rioc->rdmain;
>>> +        rdma[1] =  rioc->rdmaout;
>>> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>>> +                           rdma, QEMU_THREAD_DETACHED);
>>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>>
>> I'm not sure this pair is ordered with the synchronise_rcu above; 
>> Doesn't that mean, on a bad day, that you could get:
>>
>>
>>     main-thread          rdma_cleanup     another-thread
>>     qmu_thread_create
>>                       synchronise_rcu
>>                                         reads rioc->rdmain
>>                                         starts doing something with rdmain
>>     atomic_rcu_set
>>                       rdma_cleanup
>>
>>
>> so the another-thread is using it during the cleanup?
>> Would just moving the atomic_rcu_sets before the qemu_thread_create 
>> fix that?
> yes, I will fix it.
>
>>
>> However, I've got other worries as well:
>>    a) qemu_rdma_cleanup does:
>>        migrate_get_current()->state == MIGRATION_STATUS_CANCELLING
>>
>>       which worries me a little if someone immediately tries to restart
>>       the migration.

Because the current qemu version does not wait for the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect, I think it is not necessary to send RDMA_CONTROL_ERROR.

Compared with sending RDMA_CONTROL_ERROR, I think using the cm event to notify the peer qemu is better:
the rdma may already be in error_state, so RDMA_CONTROL_ERROR may not be sent successfully.

For the peer qemu, the qemu_rdma_wait_comp_channel function should not only poll rdma->comp_channel->fd, it should also poll rdma->channel->fd.
[GS] I agree that we should poll on CM events in addition to CQ events.

For the source qemu, this is fixed by the patch
"migration: poll the cm event while wait RDMA work request completion";

for the destination qemu, a new patch is needed to fix it.

So I prefer to remove the MIGRATION_STATUS_CANCELLING check from the qemu_rdma_cleanup function.

[GS] The intention of this patch is to move the costly ibv_dereg_mr to a separate thread, so we do not actually need to also move the RDMA_CONTROL_ERROR and rdma_disconnect, which come earlier in qemu_rdma_cleanup, to a thread. I suggest that we move them out of qemu_rdma_cleanup into a new function "qemu_rdma_disconnect" and call it on each RDMAContext before the qemu_thread_create.
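
Roughly (untested sketch of the suggestion; qemu_rdma_disconnect does
not exist yet, the name is just the proposal above):

    static void qemu_rdma_disconnect(RDMAContext *rdma)
    {
        if (rdma->cm_id && rdma->connected) {
            /* (whatever error notification we keep would go here) */
            rdma_disconnect(rdma->cm_id);
            rdma->connected = false;
        }
    }

    /* in qio_channel_rdma_close(), before qemu_thread_create(): */
    if (rdma[0]) {
        qemu_rdma_disconnect(rdma[0]);
    }
    if (rdma[1]) {
        qemu_rdma_disconnect(rdma[1]);
    }
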
>>
>>    b) I don't understand what happens if someone does try and restart
>>       the migration after that, but in the ~5s it takes the ibv cleanup
>>       to happen.

I prefer to add a new variable to current_migration: if the rdma cleanup thread has not exited, a new migration should not be started.
[GS] I support this suggestion. 

>
> yes, I will try to fix it.
>
>>
>> Dave
>>
>>
>>> +    }
>>>
>>>      return 0;
>>>  }
>>> --
>>> 1.8.3.1
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource
  2018-07-23 14:54         ` Gal Shachaf
@ 2018-07-27  5:34           ` 858585 jemmy
  0 siblings, 0 replies; 22+ messages in thread
From: 858585 jemmy @ 2018-07-27  5:34 UTC (permalink / raw)
  To: Gal Shachaf
  Cc: Dr. David Alan Gilbert, Aviad Yehezkel, zhang.zhanghailiang,
	Juan Quintela, Daniel P. Berrange, Paolo Bonzini, qemu-devel,
	Adi Dotan, Lidong Chen

On Mon, Jul 23, 2018 at 10:54 PM, Gal Shachaf <galsha@mellanox.com> wrote:
> On Thu, Jul 5, 2018 at 10:26 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> On Thu, Jun 28, 2018 at 2:59 AM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>>> ibv_dereg_mr wait for a long time for big memory size virtual server.
>>>>
>>>> The test result is:
>>>>   10GB  326ms
>>>>   20GB  699ms
>>>>   30GB  1021ms
>>>>   40GB  1387ms
>>>>   50GB  1712ms
>>>>   60GB  2034ms
>>>>   70GB  2457ms
>>>>   80GB  2807ms
>>>>   90GB  3107ms
>>>>   100GB 3474ms
>>>>   110GB 3735ms
>>>>   120GB 4064ms
>>>>   130GB 4567ms
>>>>   140GB 4886ms
>>>>
>>>> this will cause the guest os hang for a while when migration finished.
>>>> So create a dedicated thread to release rdma resource.
>>>>
>>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>>> ---
>>>>  migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>>>>  1 file changed, 27 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/migration/rdma.c b/migration/rdma.c index
>>>> dfa4f77..f12e8d5 100644
>>>> --- a/migration/rdma.c
>>>> +++ b/migration/rdma.c
>>>> @@ -2979,35 +2979,46 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>>>      }
>>>>  }
>>>>
>>>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>>>> -                                  Error **errp)
>>>> +static void *qio_channel_rdma_close_thread(void *arg)
>>>>  {
>>>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>>> -    RDMAContext *rdmain, *rdmaout;
>>>> -    trace_qemu_rdma_close();
>>>> +    RDMAContext **rdma = arg;
>>>> +    RDMAContext *rdmain = rdma[0];
>>>> +    RDMAContext *rdmaout = rdma[1];
>>>>
>>>> -    rdmain = rioc->rdmain;
>>>> -    if (rdmain) {
>>>> -        atomic_rcu_set(&rioc->rdmain, NULL);
>>>> -    }
>>>> -
>>>> -    rdmaout = rioc->rdmaout;
>>>> -    if (rdmaout) {
>>>> -        atomic_rcu_set(&rioc->rdmaout, NULL);
>>>> -    }
>>>> +    rcu_register_thread();
>>>>
>>>>      synchronize_rcu();
>>>
>>> * see below
>>>
>>>> -
>>>>      if (rdmain) {
>>>>          qemu_rdma_cleanup(rdmain);
>>>>      }
>>>> -
>>>>      if (rdmaout) {
>>>>          qemu_rdma_cleanup(rdmaout);
>>>>      }
>>>>
>>>>      g_free(rdmain);
>>>>      g_free(rdmaout);
>>>> +    g_free(rdma);
>>>> +
>>>> +    rcu_unregister_thread();
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>>>> +                                  Error **errp) {
>>>> +    QemuThread t;
>>>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>>> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
>>>> +
>>>> +    trace_qemu_rdma_close();
>>>> +    if (rioc->rdmain || rioc->rdmaout) {
>>>> +        rdma[0] =  rioc->rdmain;
>>>> +        rdma[1] =  rioc->rdmaout;
>>>> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>>>> +                           rdma, QEMU_THREAD_DETACHED);
>>>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>>>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>>>
>>> I'm not sure this pair is ordered with the synchronise_rcu above;
>>> Doesn't that mean, on a bad day, that you could get:
>>>
>>>
>>>     main-thread          rdma_cleanup     another-thread
>>>     qmu_thread_create
>>>                       synchronise_rcu
>>>                                         reads rioc->rdmain
>>>                                         starts doing something with rdmain
>>>     atomic_rcu_set
>>>                       rdma_cleanup
>>>
>>>
>>> so the another-thread is using it during the cleanup?
>>> Would just moving the atomic_rcu_sets before the qemu_thread_create
>>> fix that?
>> yes, I will fix it.
>>
>>>
>>> However, I've got other worries as well:
>>>    a) qemu_rdma_cleanup does:
>>>        migrate_get_current()->state == MIGRATION_STATUS_CANCELLING
>>>
>>>       which worries me a little if someone immediately tries to restart
>>>       the migration.
>
> Because the current qemu version don't wait for RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect, so I think it's not necessary to send RDMA_CONTROL_ERROR.
>
> compare to send RDMA_CONTROL_ERROR, I think use cm event to notify peer qemu is better.
> maybe the rdma is already in error_state, and RDMA_CONTROL_ERROR cannot send successfully.
>
> For peer qemu, in qemu_rdma_wait_comp_channel function, it's should not only poll rdma->comp_channel->fd, it's should also poll  rdma->channel->fd.
> [GS] I agree that we should poll on CM events in addition to CQ events.
>
> for the source qemu, it's fixed by this patch.
> migration: poll the cm event while wait RDMA work request completion
>
> and for destination qemu, it's need a new patch to fix it.
>
> so I prefer to remove MIGRATION_STATUS_CANCELLING in qemu_rdma_cleanup function.
>
> [GS] The intention of this patch is to move the costly ibv_dereg_mr to a separate thread, so we do not actually need to also move the RDMA_CONTROL_ERROR  and rdma_disconnect, that come beforehand in qemu_rdma_cleanup,  to a thread. I suggest that we move them out of qemu_rdma_cleanup to a new function "qemu_rdma_disconnect" and call it on each RDMAcontext before the qemu_thread_create.

Hi Dave, Gal:
I found that the reason we send RDMA_CONTROL_ERROR while in
MIGRATION_STATUS_CANCELLING is to fix this issue:
https://git.qemu.org/?p=qemu.git;a=commit;h=32bce196344772df8d68ab40e2dd1dd57be1aa7c

The current qemu version does not wait for the RDMA_CM_EVENT_DISCONNECTED
event after rdma_disconnect, so I think it is not necessary to send
RDMA_CONTROL_ERROR while in MIGRATION_STATUS_CANCELLING.

I was also confused about the purpose of RDMA_CONTROL_ERROR.
It seems to notify the other side that something went wrong, but it is
only sent from qemu_rdma_cleanup, and rdma_disconnect should be enough
to notify the peer qemu.
Gal told me that a DREQ can be dropped just like a DREP, so we cannot
guarantee that the destination will receive the
RDMA_CM_EVENT_DISCONNECTED event; but we also cannot guarantee that the
destination will receive RDMA_CONTROL_ERROR when the rdma is in error
state.

So I prefer to also remove the RDMA_CONTROL_ERROR send from qemu_rdma_cleanup.
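
For reference, the branch in question in qemu_rdma_cleanup looks
roughly like this (reconstructed from memory, details may differ); the
proposal is to drop the marked line and keep only rdma_disconnect:

    if (rdma->cm_id && rdma->connected) {
        if (rdma->error_state ||
            migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
            RDMAControlHeader head = { .len = 0,
                                       .type = RDMA_CONTROL_ERROR,
                                       .repeat = 1 };
            error_report("Early error. Sending error.");
            qemu_rdma_post_send_control(rdma, NULL, &head);   /* <-- drop */
        }

        rdma_disconnect(rdma->cm_id);
        rdma->connected = false;
    }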

Any suggestions?

Thanks.

>>>
>>>    b) I don't understand what happens if someone does try and restart
>>>       the migration after that, but in the ~5s it takes the ibv cleanup
>>>       to happen.
>
> I prefer to add a new variable in current_migration.  if the rdma cleanup thread has not exited. it's should not start a new migration.
> [GS] I support this suggestion.
>
>>
>> yes, I will try to fix it.
>>
>>>
>>> Dave
>>>
>>>
>>>> +    }
>>>>
>>>>      return 0;
>>>>  }
>>>> --
>>>> 1.8.3.1
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-07-27  5:34 UTC | newest]

Thread overview: 22+ messages
2018-06-05 15:27 [Qemu-devel] [PATCH v5 00/10] Enable postcopy RDMA live migration Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 01/10] migration: disable RDMA WRITE after postcopy started Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 02/10] migration: create a dedicated connection for rdma return path Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 03/10] migration: avoid concurrent invoke channel_close by different threads Lidong Chen
2018-06-13 17:02   ` Daniel P. Berrangé
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 04/10] migration: implement bi-directional RDMA QIOChannel Lidong Chen
2018-06-13 14:21   ` Dr. David Alan Gilbert
2018-06-27 15:31     ` 858585 jemmy
2018-06-27 15:32   ` Dr. David Alan Gilbert
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 05/10] migration: Stop rdma yielding during incoming postcopy Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 06/10] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 07/10] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 08/10] migration: create a dedicated thread to release rdma resource Lidong Chen
2018-06-27 18:59   ` Dr. David Alan Gilbert
2018-07-05 14:26     ` 858585 jemmy
2018-07-19  5:49       ` 858585 jemmy
2018-07-23 14:54         ` Gal Shachaf
2018-07-27  5:34           ` 858585 jemmy
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 09/10] migration: poll the cm event while wait RDMA work request completion Lidong Chen
2018-06-13 14:24   ` Dr. David Alan Gilbert
2018-06-14  2:42     ` 858585 jemmy
2018-06-05 15:28 ` [Qemu-devel] [PATCH v5 10/10] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
