* [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration
@ 2018-04-25 14:35 Lidong Chen
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started Lidong Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

The RDMA QIOChannel does not support bi-directional communication, so when an
RDMA live migration runs with postcopy enabled, the source qemu return path
gets a qemu file error.

These patches implement bi-directional communication for RDMA QIOChannel and 
disable the RDMA WRITE during the postcopy phase.

This series just makes postcopy work; performance will be improved later.

[v2]
 - do not update bytes_xfer when RDMA WRITE is disabled (David)
 - implement bi-directional communication for RDMA QIOChannel (Daniel)

Please review.

Lidong Chen (5):
  migration: disable RDMA WRITE after postcopy started
  migration: create a dedicated connection for rdma return path
  migration: remove unnecessary variables len in QIOChannelRDMA
  migration: implement bi-directional RDMA QIOChannel
  migration: Stop rdma yielding during incoming postcopy

 migration/qemu-file.c |   8 +-
 migration/rdma.c      | 287 +++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 265 insertions(+), 30 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started
  2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
@ 2018-04-25 14:35 ` Lidong Chen
  2018-04-26 16:11   ` Dr. David Alan Gilbert
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path Lidong Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

RDMA WRITE operations are performed with no notification to the destination
qemu, so the destination qemu cannot wake up when data arrives. This patch
disables RDMA WRITE after postcopy has started.
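
A minimal compilable sketch of the contract this creates (the names and
sentinel values here are illustrative, not the QEMU code itself): the
save_page hook opts out once postcopy is active, and the generic code then
skips the bytes_xfer accounting and falls back to the ordinary stream:

    #include <stdio.h>
    #include <stdbool.h>

    #define RAM_SAVE_CONTROL_NOT_SUPP -1000   /* hook opted out */
    #define RAM_SAVE_CONTROL_DELAYED  -2000   /* hook will complete later */

    static int rdma_save_page(bool postcopy_active)
    {
        if (postcopy_active) {
            /* An RDMA WRITE lands in destination memory without any
             * event the destination's fault handling can wake up on,
             * so refuse and let the caller stream the page instead. */
            return RAM_SAVE_CONTROL_NOT_SUPP;
        }
        return RAM_SAVE_CONTROL_DELAYED;      /* queued as in precopy */
    }

    int main(void)
    {
        int ret = rdma_save_page(true);
        if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
            printf("fall back to the ordinary QEMUFile stream\n");
        }
        return 0;
    }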

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/qemu-file.c |  8 ++++++--
 migration/rdma.c      | 12 ++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index bb63c77..add8c3a 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
     if (f->hooks && f->hooks->save_page) {
         int ret = f->hooks->save_page(f, f->opaque, block_offset,
                                       offset, size, bytes_sent);
-        f->bytes_xfer += size;
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
+        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+            f->bytes_xfer += size;
+        }
+
+        if (ret != RAM_SAVE_CONTROL_DELAYED &&
+            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
             if (bytes_sent && *bytes_sent > 0) {
                 qemu_update_position(f, *bytes_sent);
             } else if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index da474fc..a22be43 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2927,6 +2927,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return RAM_SAVE_CONTROL_NOT_SUPP;
+    }
+
     qemu_fflush(f);
 
     if (size > 0) {
@@ -3482,6 +3486,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
@@ -3504,6 +3512,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path
  2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-04-25 14:35 ` Lidong Chen
  2018-04-26 16:19   ` Dr. David Alan Gilbert
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

If an RDMA migration is started with postcopy enabled, the source qemu
establishes a dedicated connection for the return path.
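
A rough librdmacm sketch of the destination-side idea (the port and all
error handling are illustrative; QP setup and rdma_accept() are omitted):
one shared event channel and listen id service two connect requests, the
second of which becomes the return path:

    #include <rdma/rdma_cma.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>

    int main(void)
    {
        struct rdma_event_channel *ch = rdma_create_event_channel();
        struct rdma_cm_id *listen_id, *conn[2];
        struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port   = htons(4444),       /* illustrative port */
        };
        int n = 0;

        rdma_create_id(ch, &listen_id, NULL, RDMA_PS_TCP);
        rdma_bind_addr(listen_id, (struct sockaddr *)&addr);
        rdma_listen(listen_id, 2);           /* room for both requests */

        while (n < 2) {
            struct rdma_cm_event *ev;
            if (rdma_get_cm_event(ch, &ev)) {
                break;
            }
            if (ev->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
                conn[n++] = ev->id;          /* 2nd request = return path */
            }
            rdma_ack_cm_event(ev);
        }
        (void)conn;
        printf("accepted %d connection request(s)\n", n);
        return 0;
    }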

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a22be43..c745427 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -387,6 +387,10 @@ typedef struct RDMAContext {
     uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
 
     GHashTable *blockmap;
+
+    /* the RDMAContext for return path */
+    struct RDMAContext *return_path;
+    bool is_return_path;
 } RDMAContext;
 
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
@@ -2329,10 +2333,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_id(rdma->cm_id);
         rdma->cm_id = NULL;
     }
+
+    /* on the destination side, listen_id and channel are shared */
     if (rdma->listen_id) {
-        rdma_destroy_id(rdma->listen_id);
+        if (!rdma->is_return_path) {
+            rdma_destroy_id(rdma->listen_id);
+        }
         rdma->listen_id = NULL;
+
+        if (rdma->channel) {
+            if (!rdma->is_return_path) {
+                rdma_destroy_event_channel(rdma->channel);
+            }
+            rdma->channel = NULL;
+        }
     }
+
     if (rdma->channel) {
         rdma_destroy_event_channel(rdma->channel);
         rdma->channel = NULL;
@@ -2561,6 +2577,25 @@ err_dest_init_create_listen_id:
 
 }
 
+static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
+                                            RDMAContext *rdma)
+{
+    int idx;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        rdma_return_path->wr_data[idx].control_len = 0;
+        rdma_return_path->wr_data[idx].control_curr = NULL;
+    }
+
+    /* the CM channel and CM id are shared */
+    rdma_return_path->channel = rdma->channel;
+    rdma_return_path->listen_id = rdma->listen_id;
+
+    rdma->return_path = rdma_return_path;
+    rdma_return_path->return_path = rdma;
+    rdma_return_path->is_return_path = true;
+}
+
 static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
@@ -3018,6 +3053,8 @@ err:
     return ret;
 }
 
+static void rdma_accept_incoming_migration(void *opaque);
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3112,7 +3149,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         }
     }
 
-    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    /* Accept the second connection request for return path */
+    if (migrate_postcopy() && !rdma->is_return_path) {
+        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
+                            NULL,
+                            (void *)(intptr_t)rdma->return_path);
+    } else {
+        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
@@ -3693,6 +3737,10 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     trace_qemu_rdma_accept_incoming_migration_accepted();
 
+    if (rdma->is_return_path) {
+        return;
+    }
+
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
         ERROR(errp, "could not qemu_fopen_rdma!");
@@ -3707,7 +3755,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
-    RDMAContext *rdma;
+    RDMAContext *rdma, *rdma_return_path;
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
@@ -3734,12 +3782,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     trace_rdma_start_incoming_migration_after_rdma_listen();
 
+    /* initialize the RDMAContext for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
 err:
     error_propagate(errp, local_err);
     g_free(rdma);
+    g_free(rdma_return_path);
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -3747,6 +3807,7 @@ void rdma_start_outgoing_migration(void *opaque,
 {
     MigrationState *s = opaque;
     RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
+    RDMAContext *rdma_return_path = NULL;
     int ret = 0;
 
     if (rdma == NULL) {
@@ -3767,6 +3828,32 @@ void rdma_start_outgoing_migration(void *opaque,
         goto err;
     }
 
+    /* RDMA postcopy needs a separate queue pair for the return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, errp);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(rdma_return_path,
+            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(rdma_return_path, errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        rdma->return_path = rdma_return_path;
+        rdma_return_path->return_path = rdma;
+        rdma_return_path->is_return_path = true;
+    }
+
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
@@ -3774,4 +3861,5 @@ void rdma_start_outgoing_migration(void *opaque,
     return;
 err:
     g_free(rdma);
+    g_free(rdma_return_path);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started Lidong Chen
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path Lidong Chen
@ 2018-04-25 14:35 ` Lidong Chen
  2018-04-26 16:40   ` Dr. David Alan Gilbert
  2018-04-27  9:04   ` Daniel P. Berrangé
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel Lidong Chen
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy Lidong Chen
  4 siblings, 2 replies; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
by different threads concurrently, this patch removes the unnecessary len
variable from QIOChannelRDMA and uses a local variable instead.
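
A small sketch of the race being removed (illustrative struct, not the
real QIOChannelRDMA): when two threads stage their transfer length in a
shared field, each can observe the other's store, so the byte accounting
can go wrong; a stack local per call, as this patch uses, cannot race:

    #include <pthread.h>
    #include <stdio.h>
    #include <stddef.h>

    struct chan { size_t len; };             /* pre-patch shared field */

    static void *writev_side(void *opaque)
    {
        struct chan *c = opaque;
        size_t done = 0;
        for (int i = 0; i < 100000; i++) {
            c->len = 4096;                   /* racy shared store */
            done += c->len;                  /* may pick up the peer's 512 */
        }
        printf("writev accounted %zu bytes\n", done);
        return NULL;
    }

    static void *readv_side(void *opaque)
    {
        struct chan *c = opaque;
        size_t done = 0;
        for (int i = 0; i < 100000; i++) {
            c->len = 512;                    /* racy shared store */
            done += c->len;                  /* may pick up the peer's 4096 */
        }
        printf("readv accounted %zu bytes\n", done);
        return NULL;
    }

    int main(void)
    {
        struct chan c = { 0 };
        pthread_t w, r;
        pthread_create(&w, NULL, writev_side, &c);
        pthread_create(&r, NULL, readv_side, &c);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }

With the patch applied, the staged length lives on each caller's stack,
so the two directions no longer share any mutable state here.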

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c745427..f5c1d02 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -404,7 +404,6 @@ struct QIOChannelRDMA {
     QIOChannel parent;
     RDMAContext *rdma;
     QEMUFile *file;
-    size_t len;
     bool blocking; /* XXX we don't actually honour this yet */
 };
 
@@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     int ret;
     ssize_t done = 0;
     size_t i;
+    size_t len = 0;
 
     CHECK_ERROR_STATE();
 
@@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         while (remaining) {
             RDMAControlHeader head;
 
-            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
-            remaining -= rioc->len;
+            len = MIN(remaining, RDMA_SEND_INCREMENT);
+            remaining -= len;
 
-            head.len = rioc->len;
+            head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
@@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                 return ret;
             }
 
-            data += rioc->len;
-            done += rioc->len;
+            data += len;
+            done += len;
         }
     }
 
@@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
             }
         }
     }
-    rioc->len = done;
-    return rioc->len;
+    return done;
 }
 
 /*
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
                   ` (2 preceding siblings ...)
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
@ 2018-04-25 14:35 ` Lidong Chen
  2018-04-26 17:36   ` Dr. David Alan Gilbert
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy Lidong Chen
  4 siblings, 1 reply; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

This patch implements a bi-directional RDMA QIOChannel. Because different
threads may access the RDMA QIOChannel concurrently, it uses RCU to protect it.
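
A compilable sketch of the close-vs-I/O pattern this relies on, with
liburcu standing in for QEMU's qemu/rcu.h (names are illustrative, and
the synchronize_rcu() before the free shows the textbook ordering assumed
here): readers dereference the pointer inside an RCU read section and
bail out on NULL, while close swaps the pointer out under a mutex and
frees only once in-flight readers have drained:

    #include <urcu.h>          /* liburcu; link with -lurcu */
    #include <pthread.h>
    #include <stdlib.h>
    #include <stdio.h>

    struct rdma_ctx { int error_state; };

    struct rioc {
        struct rdma_ctx *rdma;               /* RCU-protected pointer */
        pthread_mutex_t lock;                /* serializes close() */
    };

    static int channel_io(struct rioc *r)
    {
        rcu_read_lock();
        struct rdma_ctx *rdma = rcu_dereference(r->rdma);
        if (!rdma) {                         /* channel already closed */
            rcu_read_unlock();
            return -5;                       /* stands in for -EIO */
        }
        int ret = rdma->error_state;         /* safe inside read section */
        rcu_read_unlock();
        return ret;
    }

    static void channel_close(struct rioc *r)
    {
        pthread_mutex_lock(&r->lock);
        struct rdma_ctx *rdma = r->rdma;
        rcu_assign_pointer(r->rdma, NULL);   /* new readers see NULL */
        pthread_mutex_unlock(&r->lock);
        if (rdma) {
            synchronize_rcu();               /* wait out in-flight readers */
            free(rdma);
        }
    }

    int main(void)
    {
        rcu_register_thread();
        struct rioc r = { calloc(1, sizeof(struct rdma_ctx)) };
        pthread_mutex_init(&r.lock, NULL);
        printf("io before close: %d\n", channel_io(&r));
        channel_close(&r);
        printf("io after close:  %d\n", channel_io(&r));
        rcu_unregister_thread();
        return 0;
    }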

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 146 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f5c1d02..0652224 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
                                 " to abort!"); \
                 rdma->error_reported = 1; \
             } \
+            rcu_read_unlock(); \
             return rdma->error_state; \
         } \
     } while (0)
@@ -405,6 +406,7 @@ struct QIOChannelRDMA {
     RDMAContext *rdma;
     QEMUFile *file;
     bool blocking; /* XXX we don't actually honour this yet */
+    QemuMutex lock;
 };
 
 /*
@@ -2635,12 +2637,29 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     QEMUFile *f = rioc->file;
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
     size_t i;
     size_t len = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
+    if (rdma->listen_id) {
+        rdma = rdma->return_path;
+    }
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     /*
@@ -2650,6 +2669,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
+        rcu_read_unlock();
         return ret;
     }
 
@@ -2669,6 +2689,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
+                rcu_read_unlock();
                 return ret;
             }
 
@@ -2677,6 +2698,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         }
     }
 
+    rcu_read_unlock();
     return done;
 }
 
@@ -2710,12 +2732,29 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
     size_t done = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
+    if (!rdma->listen_id) {
+        rdma = rdma->return_path;
+    }
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     for (i = 0; i < niov; i++) {
@@ -2727,7 +2766,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
         /* Got what we needed, so go to next iovec */
@@ -2749,25 +2788,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
+            rcu_read_unlock();
             return ret;
         }
 
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
             if (done == 0) {
+                rcu_read_unlock();
                 return QIO_CHANNEL_ERR_BLOCK;
             } else {
                 break;
             }
         }
     }
+    rcu_read_unlock();
     return done;
 }
 
@@ -2823,6 +2865,16 @@ qio_channel_rdma_source_prepare(GSource *source,
     GIOCondition cond = 0;
     *timeout = -1;
 
+    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
+       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
+        rdma = rdma->return_path;
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when prepare Gsource");
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
@@ -2838,6 +2890,16 @@ qio_channel_rdma_source_check(GSource *source)
     RDMAContext *rdma = rsource->rioc->rdma;
     GIOCondition cond = 0;
 
+    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
+       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
+        rdma = rdma->return_path;
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when check Gsource");
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
@@ -2856,6 +2918,16 @@ qio_channel_rdma_source_dispatch(GSource *source,
     RDMAContext *rdma = rsource->rioc->rdma;
     GIOCondition cond = 0;
 
+    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
+       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
+        rdma = rdma->return_path;
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when dispatch Gsource");
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
@@ -2905,15 +2977,29 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdma;
     trace_qemu_rdma_close();
-    if (rioc->rdma) {
-        if (!rioc->rdma->error_state) {
-            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
-        }
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+
+    qemu_mutex_lock(&rioc->lock);
+    rdma = rioc->rdma;
+    if (!rdma) {
+        qemu_mutex_unlock(&rioc->lock);
+        return 0;
+    }
+    atomic_rcu_set(&rioc->rdma, NULL);
+    qemu_mutex_unlock(&rioc->lock);
+
+    if (!rdma->error_state) {
+        rdma->error_state = qemu_file_get_error(rioc->file);
+    }
+    qemu_rdma_cleanup(rdma);
+
+    if (rdma->return_path) {
+        qemu_rdma_cleanup(rdma->return_path);
+        g_free(rdma->return_path);
     }
+
+    g_free(rdma);
     return 0;
 }
 
@@ -2956,12 +3042,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3046,9 +3141,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
         }
     }
 
+    rcu_read_unlock();
     return RAM_SAVE_CONTROL_DELAYED;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3224,8 +3321,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAContext *rdma;
+    RDMALocalBlocks *local;
     RDMAControlHeader head;
     RDMARegister *reg, *registers;
     RDMACompress *comp;
@@ -3238,8 +3335,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     int count = 0;
     int i = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
+    local = &rdma->local_ram_blocks;
     do {
         trace_qemu_rdma_registration_handle_wait();
 
@@ -3469,6 +3575,7 @@ out:
     if (ret < 0) {
         rdma->error_state = ret;
     }
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3525,11 +3632,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
                                         uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
+
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
 
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3537,6 +3652,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3549,13 +3665,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 {
     Error *local_err = NULL, **errp = &local_err;
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdma);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3587,6 +3711,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
+            rcu_read_unlock();
             return ret;
         }
 
@@ -3610,6 +3735,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
+            rcu_read_unlock();
             return -EINVAL;
         }
 
@@ -3626,6 +3752,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
+                rcu_read_unlock();
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
@@ -3643,9 +3770,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         goto err;
     }
 
+    rcu_read_unlock();
     return 0;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3707,6 +3836,7 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
 
     rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
     rioc->rdma = rdma;
+    qemu_mutex_init(&rioc->lock);
 
     if (mode[0] == 'w') {
         rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy
  2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
                   ` (3 preceding siblings ...)
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-04-25 14:35 ` Lidong Chen
  2018-04-26 17:54   ` Dr. David Alan Gilbert
  4 siblings, 1 reply; 18+ messages in thread
From: Lidong Chen @ 2018-04-25 14:35 UTC (permalink / raw)
  To: quintela, dgilbert; +Cc: qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

During incoming postcopy, the destination qemu invokes
qemu_rdma_wait_comp_channel in a separate thread, so it must not use the
rdma yield path and should poll the completion channel fd instead.
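
A minimal sketch of the non-yielding wait this switches to (the helper
and timeout value are illustrative): poll the completion channel fd with
a short timeout so a cancel or an error can still break the loop:

    #include <poll.h>
    #include <stdbool.h>

    /* returns 0 once the fd is readable, -1 on cancel or error */
    static int wait_comp_channel_fd(int fd, volatile bool *cancelled)
    {
        while (!*cancelled) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int ret = poll(&pfd, 1, 100 /* ms */);
            if (ret > 0) {
                return 0;                    /* completion event ready */
            }
            if (ret < 0) {
                return -1;                   /* poll error */
            }
            /* timeout: loop again so a cancel can be noticed */
        }
        return -1;
    }

    int main(void)
    {
        volatile bool cancelled = false;
        return wait_comp_channel_fd(0 /* e.g. stdin */, &cancelled);
    }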

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0652224..4ba9fe2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1490,11 +1490,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
      */
-    if (rdma->migration_started_on_destination) {
+    if (rdma->migration_started_on_destination &&
+        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
         yield_until_fd_readable(rdma->comp_channel->fd);
     } else {
         /* This is the source side, we're in a separate thread
          * or destination prior to migration_fd_process_incoming()
+         * after postcopy, the destination is also in a separate thread.
          * we can't yield; so we have to poll the fd.
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-04-26 16:11   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-26 16:11 UTC (permalink / raw)
  To: Lidong Chen
  Cc: quintela, qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> RDMA WRITE operations are performed with no notification to the destination
> qemu, so the destination qemu cannot wake up when data arrives. This patch
> disables RDMA WRITE after postcopy has started.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/qemu-file.c |  8 ++++++--
>  migration/rdma.c      | 12 ++++++++++++
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index bb63c77..add8c3a 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>      if (f->hooks && f->hooks->save_page) {
>          int ret = f->hooks->save_page(f, f->opaque, block_offset,
>                                        offset, size, bytes_sent);
> -        f->bytes_xfer += size;
> -        if (ret != RAM_SAVE_CONTROL_DELAYED) {
> +        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
> +            f->bytes_xfer += size;
> +        }
> +
> +        if (ret != RAM_SAVE_CONTROL_DELAYED &&
> +            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
>              if (bytes_sent && *bytes_sent > 0) {
>                  qemu_update_position(f, *bytes_sent);
>              } else if (ret < 0) {
> diff --git a/migration/rdma.c b/migration/rdma.c
> index da474fc..a22be43 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2927,6 +2927,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>  
>      CHECK_ERROR_STATE();
>  
> +    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        return RAM_SAVE_CONTROL_NOT_SUPP;
> +    }
> +
>      qemu_fflush(f);
>  
>      if (size > 0) {
> @@ -3482,6 +3486,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>  
>      CHECK_ERROR_STATE();
>  
> +    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        return 0;
> +    }
> +
>      trace_qemu_rdma_registration_start(flags);
>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>      qemu_fflush(f);
> @@ -3504,6 +3512,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>  
>      CHECK_ERROR_STATE();
>  
> +    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        return 0;
> +    }
> +
>      qemu_fflush(f);
>      ret = qemu_rdma_drain_cq(f, rdma);
>  
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path Lidong Chen
@ 2018-04-26 16:19   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-26 16:19 UTC (permalink / raw)
  To: Lidong Chen
  Cc: quintela, qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> If an RDMA migration is started with postcopy enabled, the source qemu
> establishes a dedicated connection for the return path.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Oops, I should have Reviewed-by rather than Signed-off-by, so:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 91 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index a22be43..c745427 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -387,6 +387,10 @@ typedef struct RDMAContext {
>      uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
>  
>      GHashTable *blockmap;
> +
> +    /* the RDMAContext for return path */
> +    struct RDMAContext *return_path;
> +    bool is_return_path;
>  } RDMAContext;
>  
>  #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
> @@ -2329,10 +2333,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>          rdma_destroy_id(rdma->cm_id);
>          rdma->cm_id = NULL;
>      }
> +
> +    /* on the destination side, listen_id and channel are shared */
>      if (rdma->listen_id) {
> -        rdma_destroy_id(rdma->listen_id);
> +        if (!rdma->is_return_path) {
> +            rdma_destroy_id(rdma->listen_id);
> +        }
>          rdma->listen_id = NULL;
> +
> +        if (rdma->channel) {
> +            if (!rdma->is_return_path) {
> +                rdma_destroy_event_channel(rdma->channel);
> +            }
> +            rdma->channel = NULL;
> +        }
>      }
> +
>      if (rdma->channel) {
>          rdma_destroy_event_channel(rdma->channel);
>          rdma->channel = NULL;
> @@ -2561,6 +2577,25 @@ err_dest_init_create_listen_id:
>  
>  }
>  
> +static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
> +                                            RDMAContext *rdma)
> +{
> +    int idx;
> +
> +    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> +        rdma_return_path->wr_data[idx].control_len = 0;
> +        rdma_return_path->wr_data[idx].control_curr = NULL;
> +    }
> +
> +    /* the CM channel and CM id are shared */
> +    rdma_return_path->channel = rdma->channel;
> +    rdma_return_path->listen_id = rdma->listen_id;
> +
> +    rdma->return_path = rdma_return_path;
> +    rdma_return_path->return_path = rdma;
> +    rdma_return_path->is_return_path = true;
> +}
> +
>  static void *qemu_rdma_data_init(const char *host_port, Error **errp)
>  {
>      RDMAContext *rdma = NULL;
> @@ -3018,6 +3053,8 @@ err:
>      return ret;
>  }
>  
> +static void rdma_accept_incoming_migration(void *opaque);
> +
>  static int qemu_rdma_accept(RDMAContext *rdma)
>  {
>      RDMACapabilities cap;
> @@ -3112,7 +3149,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>          }
>      }
>  
> -    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
> +    /* Accept the second connection request for return path */
> +    if (migrate_postcopy() && !rdma->is_return_path) {
> +        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
> +                            NULL,
> +                            (void *)(intptr_t)rdma->return_path);
> +    } else {
> +        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
> +    }
>  
>      ret = rdma_accept(rdma->cm_id, &conn_param);
>      if (ret) {
> @@ -3693,6 +3737,10 @@ static void rdma_accept_incoming_migration(void *opaque)
>  
>      trace_qemu_rdma_accept_incoming_migration_accepted();
>  
> +    if (rdma->is_return_path) {
> +        return;
> +    }
> +
>      f = qemu_fopen_rdma(rdma, "rb");
>      if (f == NULL) {
>          ERROR(errp, "could not qemu_fopen_rdma!");
> @@ -3707,7 +3755,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>  void rdma_start_incoming_migration(const char *host_port, Error **errp)
>  {
>      int ret;
> -    RDMAContext *rdma;
> +    RDMAContext *rdma, *rdma_return_path;
>      Error *local_err = NULL;
>  
>      trace_rdma_start_incoming_migration();
> @@ -3734,12 +3782,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>  
>      trace_rdma_start_incoming_migration_after_rdma_listen();
>  
> +    /* initialize the RDMAContext for return path */
> +    if (migrate_postcopy()) {
> +        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
> +
> +        if (rdma_return_path == NULL) {
> +            goto err;
> +        }
> +
> +        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
> +    }
> +
>      qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
>                          NULL, (void *)(intptr_t)rdma);
>      return;
>  err:
>      error_propagate(errp, local_err);
>      g_free(rdma);
> +    g_free(rdma_return_path);
>  }
>  
>  void rdma_start_outgoing_migration(void *opaque,
> @@ -3747,6 +3807,7 @@ void rdma_start_outgoing_migration(void *opaque,
>  {
>      MigrationState *s = opaque;
>      RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
> +    RDMAContext *rdma_return_path = NULL;
>      int ret = 0;
>  
>      if (rdma == NULL) {
> @@ -3767,6 +3828,32 @@ void rdma_start_outgoing_migration(void *opaque,
>          goto err;
>      }
>  
> +    /* RDMA postcopy needs a separate queue pair for the return path */
> +    if (migrate_postcopy()) {
> +        rdma_return_path = qemu_rdma_data_init(host_port, errp);
> +
> +        if (rdma_return_path == NULL) {
> +            goto err;
> +        }
> +
> +        ret = qemu_rdma_source_init(rdma_return_path,
> +            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
> +
> +        if (ret) {
> +            goto err;
> +        }
> +
> +        ret = qemu_rdma_connect(rdma_return_path, errp);
> +
> +        if (ret) {
> +            goto err;
> +        }
> +
> +        rdma->return_path = rdma_return_path;
> +        rdma_return_path->return_path = rdma;
> +        rdma_return_path->is_return_path = true;
> +    }
> +
>      trace_rdma_start_outgoing_migration_after_rdma_connect();
>  
>      s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
> @@ -3774,4 +3861,5 @@ void rdma_start_outgoing_migration(void *opaque,
>      return;
>  err:
>      g_free(rdma);
> +    g_free(rdma_return_path);
>  }
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
@ 2018-04-26 16:40   ` Dr. David Alan Gilbert
  2018-04-27  3:51     ` 858585 jemmy
  2018-04-27  9:01     ` Daniel P. Berrangé
  2018-04-27  9:04   ` Daniel P. Berrangé
  1 sibling, 2 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-26 16:40 UTC (permalink / raw)
  To: Lidong Chen, berrange
  Cc: quintela, qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
> by different threads concurrently, this patch removes the unnecessary len
> variable from QIOChannelRDMA and uses a local variable instead.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

I'm OK with this patch as is; but now you're making me worried that I
don't quite understand what threads are accessing it at the same time; we
need to document/comment what's accessed concurrently.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/rdma.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c745427..f5c1d02 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -404,7 +404,6 @@ struct QIOChannelRDMA {
>      QIOChannel parent;
>      RDMAContext *rdma;
>      QEMUFile *file;
> -    size_t len;
>      bool blocking; /* XXX we don't actually honour this yet */
>  };
>  
> @@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>      int ret;
>      ssize_t done = 0;
>      size_t i;
> +    size_t len = 0;
>  
>      CHECK_ERROR_STATE();
>  
> @@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          while (remaining) {
>              RDMAControlHeader head;
>  
> -            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
> -            remaining -= rioc->len;
> +            len = MIN(remaining, RDMA_SEND_INCREMENT);
> +            remaining -= len;
>  
> -            head.len = rioc->len;
> +            head.len = len;
>              head.type = RDMA_CONTROL_QEMU_FILE;
>  
>              ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> @@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>                  return ret;
>              }
>  
> -            data += rioc->len;
> -            done += rioc->len;
> +            data += len;
> +            done += len;
>          }
>      }
>  
> @@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>              }
>          }
>      }
> -    rioc->len = done;
> -    return rioc->len;
> +    return done;
>  }
>  
>  /*
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-04-26 17:36   ` Dr. David Alan Gilbert
  2018-04-27  7:56     ` 858585 jemmy
  0 siblings, 1 reply; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-26 17:36 UTC (permalink / raw)
  To: Lidong Chen, berrange
  Cc: quintela, qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> This patch implements a bi-directional RDMA QIOChannel. Because different
> threads may access the RDMA QIOChannel concurrently, it uses RCU to protect it.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

I'm a bit confused by this.

I can see it's adding RCU to protect the rdma structures against
deletion from multiple threads; that I'm OK with in principle; is that
the only locking we need? (I guess the two directions are actually
separate RDMAContexts, so maybe).

But is there nothing else to make the QIOChannel bidirectional?

Also, a lot seems dependent on listen_id, can you explain how that's
being used.

Finally, I don't think you have anywhere that destroys the new mutex you
added.

Dave
P.S. Please cc Daniel Berrange on this series, since it's so much
IOChannel stuff.
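
For reference while reading the hunks below, the listen_id tests in the
patch reduce to the following dispatch (a sketch distilled from the diff;
ctx_for() itself is not a function in the patch): the destination, which
owns listen_id, sends via the return path, and the source receives via it.

    #include <stdbool.h>
    #include <stddef.h>

    struct rdma_ctx {
        void *listen_id;                 /* non-NULL only on destination */
        struct rdma_ctx *return_path;
    };

    struct rdma_ctx *ctx_for(struct rdma_ctx *rdma, bool writing)
    {
        bool is_dest = rdma->listen_id != NULL;
        /* dest writes via return path; source reads via return path */
        if (writing == is_dest) {
            return rdma->return_path;    /* may be NULL -> -EIO */
        }
        return rdma;
    }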

> ---
>  migration/rdma.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 146 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f5c1d02..0652224 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>                                  " to abort!"); \
>                  rdma->error_reported = 1; \
>              } \
> +            rcu_read_unlock(); \
>              return rdma->error_state; \
>          } \
>      } while (0)
> @@ -405,6 +406,7 @@ struct QIOChannelRDMA {
>      RDMAContext *rdma;
>      QEMUFile *file;
>      bool blocking; /* XXX we don't actually honour this yet */
> +    QemuMutex lock;
>  };
>  
>  /*
> @@ -2635,12 +2637,29 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      QEMUFile *f = rioc->file;
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>      ssize_t done = 0;
>      size_t i;
>      size_t len = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
> +    if (rdma->listen_id) {
> +        rdma = rdma->return_path;
> +    }
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      /*
> @@ -2650,6 +2669,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>      ret = qemu_rdma_write_flush(f, rdma);
>      if (ret < 0) {
>          rdma->error_state = ret;
> +        rcu_read_unlock();
>          return ret;
>      }
>  
> @@ -2669,6 +2689,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  
>              if (ret < 0) {
>                  rdma->error_state = ret;
> +                rcu_read_unlock();
>                  return ret;
>              }
>  
> @@ -2677,6 +2698,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2710,12 +2732,29 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>                                        Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head;
>      int ret = 0;
>      ssize_t i;
>      size_t done = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
> +    if (!rdma->listen_id) {
> +        rdma = rdma->return_path;
> +    }
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      for (i = 0; i < niov; i++) {
> @@ -2727,7 +2766,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           * were given and dish out the bytes until we run
>           * out of bytes.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>          /* Got what we needed, so go to next iovec */
> @@ -2749,25 +2788,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>  
>          if (ret < 0) {
>              rdma->error_state = ret;
> +            rcu_read_unlock();
>              return ret;
>          }
>  
>          /*
>           * SEND was received with new bytes, now try again.
>           */
> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>          done += ret;
>          want -= ret;
>  
>          /* Still didn't get enough, so lets just return */
>          if (want) {
>              if (done == 0) {
> +                rcu_read_unlock();
>                  return QIO_CHANNEL_ERR_BLOCK;
>              } else {
>                  break;
>              }
>          }
>      }
> +    rcu_read_unlock();
>      return done;
>  }
>  
> @@ -2823,6 +2865,16 @@ qio_channel_rdma_source_prepare(GSource *source,
>      GIOCondition cond = 0;
>      *timeout = -1;
>  
> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
> +        rdma = rdma->return_path;
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when prepare Gsource");
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
> @@ -2838,6 +2890,16 @@ qio_channel_rdma_source_check(GSource *source)
>      RDMAContext *rdma = rsource->rioc->rdma;
>      GIOCondition cond = 0;
>  
> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
> +        rdma = rdma->return_path;
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when check Gsource");
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
> @@ -2856,6 +2918,16 @@ qio_channel_rdma_source_dispatch(GSource *source,
>      RDMAContext *rdma = rsource->rioc->rdma;
>      GIOCondition cond = 0;
>  
> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
> +        rdma = rdma->return_path;
> +    }
> +
> +    if (!rdma) {
> +        error_report("RDMAContext is NULL when dispatch Gsource");
> +        return FALSE;
> +    }
> +
>      if (rdma->wr_data[0].control_len) {
>          cond |= G_IO_IN;
>      }
> @@ -2905,15 +2977,29 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>                                    Error **errp)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext *rdma;
>      trace_qemu_rdma_close();
> -    if (rioc->rdma) {
> -        if (!rioc->rdma->error_state) {
> -            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
> -        }
> -        qemu_rdma_cleanup(rioc->rdma);
> -        g_free(rioc->rdma);
> -        rioc->rdma = NULL;
> +
> +    qemu_mutex_lock(&rioc->lock);
> +    rdma = rioc->rdma;
> +    if (!rdma) {
> +        qemu_mutex_unlock(&rioc->lock);
> +        return 0;
> +    }
> +    atomic_rcu_set(&rioc->rdma, NULL);
> +    qemu_mutex_unlock(&rioc->lock);
> +
> +    if (!rdma->error_state) {
> +        rdma->error_state = qemu_file_get_error(rioc->file);
> +    }
> +    qemu_rdma_cleanup(rdma);
> +
> +    if (rdma->return_path) {
> +        qemu_rdma_cleanup(rdma->return_path);
> +        g_free(rdma->return_path);
>      }
> +
> +    g_free(rdma);
>      return 0;
>  }
>  
> @@ -2956,12 +3042,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>                                    size_t size, uint64_t *bytes_sent)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return RAM_SAVE_CONTROL_NOT_SUPP;
>      }
>  
> @@ -3046,9 +3141,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>          }
>      }
>  
> +    rcu_read_unlock();
>      return RAM_SAVE_CONTROL_DELAYED;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3224,8 +3321,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>                                   .repeat = 1 };
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
> +    RDMAContext *rdma;
> +    RDMALocalBlocks *local;
>      RDMAControlHeader head;
>      RDMARegister *reg, *registers;
>      RDMACompress *comp;
> @@ -3238,8 +3335,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>      int count = 0;
>      int i = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
> +    local = &rdma->local_ram_blocks;
>      do {
>          trace_qemu_rdma_registration_handle_wait();
>  
> @@ -3469,6 +3575,7 @@ out:
>      if (ret < 0) {
>          rdma->error_state = ret;
>      }
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3525,11 +3632,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>                                          uint64_t flags, void *data)
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
> +
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
>  
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3537,6 +3652,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>      qemu_fflush(f);
>  
> +    rcu_read_unlock();
>      return 0;
>  }
>  
> @@ -3549,13 +3665,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>  {
>      Error *local_err = NULL, **errp = &local_err;
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>      int ret = 0;
>  
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdma);
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>  
>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        rcu_read_unlock();
>          return 0;
>      }
>  
> @@ -3587,6 +3711,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                      qemu_rdma_reg_whole_ram_blocks : NULL);
>          if (ret < 0) {
>              ERROR(errp, "receiving remote info!");
> +            rcu_read_unlock();
>              return ret;
>          }
>  
> @@ -3610,6 +3735,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                          "not identical on both the source and destination.",
>                          local->nb_blocks, nb_dest_blocks);
>              rdma->error_state = -EINVAL;
> +            rcu_read_unlock();
>              return -EINVAL;
>          }
>  
> @@ -3626,6 +3752,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>                              local->block[i].length,
>                              rdma->dest_blocks[i].length);
>                  rdma->error_state = -EINVAL;
> +                rcu_read_unlock();
>                  return -EINVAL;
>              }
>              local->block[i].remote_host_addr =
> @@ -3643,9 +3770,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          goto err;
>      }
>  
> +    rcu_read_unlock();
>      return 0;
>  err:
>      rdma->error_state = ret;
> +    rcu_read_unlock();
>      return ret;
>  }
>  
> @@ -3707,6 +3836,7 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>  
>      rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
>      rioc->rdma = rdma;
> +    qemu_mutex_init(&rioc->lock);
>  
>      if (mode[0] == 'w') {
>          rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy Lidong Chen
@ 2018-04-26 17:54   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-26 17:54 UTC (permalink / raw)
  To: Lidong Chen
  Cc: quintela, qemu-devel, galsha, aviadye, licq, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> During incoming postcopy, the destination qemu invokes
> qemu_rdma_wait_comp_channel in a separate thread, so it must not use the
> rdma yield path and should poll the completion channel fd instead.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

OK, I think so; it's a bit delicate, but I can't currently see a better
way.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/rdma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0652224..4ba9fe2 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1490,11 +1490,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>       * Coroutine doesn't start until migration_fd_process_incoming()
>       * so don't yield unless we know we're running inside of a coroutine.
>       */
> -    if (rdma->migration_started_on_destination) {
> +    if (rdma->migration_started_on_destination &&
> +        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
>          yield_until_fd_readable(rdma->comp_channel->fd);
>      } else {
>          /* This is the source side, we're in a separate thread
>           * or destination prior to migration_fd_process_incoming()
> +         * after postcopy, the destination is also in a separate thread.
>           * we can't yield; so we have to poll the fd.
>           * But we need to be able to handle 'cancel' or an error
>           * without hanging forever.
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-04-26 16:40   ` Dr. David Alan Gilbert
@ 2018-04-27  3:51     ` 858585 jemmy
  2018-04-27  9:01     ` Daniel P. Berrangé
  1 sibling, 0 replies; 18+ messages in thread
From: 858585 jemmy @ 2018-04-27  3:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel P. Berrange, Juan Quintela, qemu-devel, Gal Shachaf,
	Aviad Yehezkel, licq, adido, Lidong Chen

On Fri, Apr 27, 2018 at 12:40 AM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
>> by different threads concurrently, this patch removes the unnecessary
>> variable len in QIOChannelRDMA and uses a local variable instead.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> I'm OK with this patch as is; but now you're making me worried that I
> don't quite understand what threads are accessing it at the same time; we
> need to document/comment what's accessed concurrently.

For the source qemu, the migration thread invokes qio_channel_rdma_writev,
and the return path thread invokes qio_channel_rdma_readv.

For the destination qemu, before postcopy the main thread invokes both
qio_channel_rdma_readv and qio_channel_rdma_writev; after postcopy, the
listen thread invokes qio_channel_rdma_readv, and the ram_fault_thread
invokes qio_channel_rdma_writev.
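
Maybe a comment like this on the struct would cover the documentation
part (just a sketch summarizing the above, thread names as in migration.c):

/*
 * Concurrent access to a QIOChannelRDMA:
 *
 *  source:      migration thread   -> qio_channel_rdma_writev()
 *               return path thread -> qio_channel_rdma_readv()
 *  destination: main thread        -> readv() and writev() (precopy)
 *               listen thread      -> qio_channel_rdma_readv()  (postcopy)
 *               ram_fault_thread   -> qio_channel_rdma_writev() (postcopy)
 */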

>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>> ---
>>  migration/rdma.c | 15 +++++++--------
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index c745427..f5c1d02 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -404,7 +404,6 @@ struct QIOChannelRDMA {
>>      QIOChannel parent;
>>      RDMAContext *rdma;
>>      QEMUFile *file;
>> -    size_t len;
>>      bool blocking; /* XXX we don't actually honour this yet */
>>  };
>>
>> @@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>      int ret;
>>      ssize_t done = 0;
>>      size_t i;
>> +    size_t len = 0;
>>
>>      CHECK_ERROR_STATE();
>>
>> @@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>          while (remaining) {
>>              RDMAControlHeader head;
>>
>> -            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
>> -            remaining -= rioc->len;
>> +            len = MIN(remaining, RDMA_SEND_INCREMENT);
>> +            remaining -= len;
>>
>> -            head.len = rioc->len;
>> +            head.len = len;
>>              head.type = RDMA_CONTROL_QEMU_FILE;
>>
>>              ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
>> @@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>                  return ret;
>>              }
>>
>> -            data += rioc->len;
>> -            done += rioc->len;
>> +            data += len;
>> +            done += len;
>>          }
>>      }
>>
>> @@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>              }
>>          }
>>      }
>> -    rioc->len = done;
>> -    return rioc->len;
>> +    return done;
>>  }
>>
>>  /*
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-26 17:36   ` Dr. David Alan Gilbert
@ 2018-04-27  7:56     ` 858585 jemmy
  2018-04-27  9:16       ` Daniel P. Berrangé
  0 siblings, 1 reply; 18+ messages in thread
From: 858585 jemmy @ 2018-04-27  7:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel P. Berrange, Juan Quintela, qemu-devel, Gal Shachaf,
	Aviad Yehezkel, licq, adido, Lidong Chen

On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> This patch implements a bi-directional RDMA QIOChannel. Because different
>> threads may access QIOChannelRDMA concurrently, this patch uses RCU to protect it.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> I'm a bit confused by this.
>
> I can see it's adding RCU to protect the rdma structures against
> deletion from multiple threads; that I'm OK with in principle; is that
> the only locking we need? (I guess the two directions are actually
> separate RDMAContext's so maybe).

qio_channel_rdma_close may be invoked by the migration thread and the
return path thread concurrently, so I use a mutex to protect it.

If one thread invokes qio_channel_rdma_writev while another thread
invokes qio_channel_rdma_readv, the two threads use separate
RDMAContexts, so no lock is needed.

If two threads invoked qio_channel_rdma_writev concurrently, a lock would
be needed to protect it; but I find the source qemu migration thread only
invokes qio_channel_rdma_writev, and the return path thread only invokes
qio_channel_rdma_readv.

In the destination qemu, qio_channel_rdma_readv is invoked only by the
main thread before postcopy, and only by the listen thread after postcopy.

The destination qemu already protects writes to the source qemu by taking
qemu_mutex_lock(&mis->rp_mutex).

But should we use qemu_mutex_lock to protect qio_channel_rdma_writev and
qio_channel_rdma_readv anyway, to guard against some future change
invoking them concurrently?

>
> But is there nothing else to make the QIOChannel bidirectional?
>
> Also, a lot seems dependent on listen_id, can you explain how that's
> being used.

The destination qemu is the server side, so its listen_id is non-zero;
the source qemu is the client side, so its listen_id is zero.
I use listen_id to determine whether qemu is the destination or the source.

For the destination qemu, writing data to the source needs to use the
return_path rdma, like this:
    if (rdma->listen_id) {
        rdma = rdma->return_path;
    }

For the source qemu, reading data from the destination also needs to use
the return_path rdma:
    if (!rdma->listen_id) {
        rdma = rdma->return_path;
    }
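
If it helps, the two checks could be wrapped in hypothetical helpers like
these (just a sketch, not part of the patch):

static RDMAContext *channel_rdma_for_write(QIOChannelRDMA *rioc)
{
    RDMAContext *rdma = atomic_rcu_read(&rioc->rdma);

    /* the destination (server) side has a non-zero listen_id and
     * must write via the return path */
    if (rdma && rdma->listen_id) {
        rdma = rdma->return_path;
    }
    return rdma;    /* may be NULL; callers then return -EIO */
}

static RDMAContext *channel_rdma_for_read(QIOChannelRDMA *rioc)
{
    RDMAContext *rdma = atomic_rcu_read(&rioc->rdma);

    /* the source (client) side reads via the return path */
    if (rdma && !rdma->listen_id) {
        rdma = rdma->return_path;
    }
    return rdma;
}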

>
> Finally, I don't think you have anywhere that destroys the new mutex you
> added.
I will fix this in the next version.

>
> Dave
> P.S. Please cc Daniel Berrange on this series, since it's so much
> IOChannel stuff.
>
>> ---
>>  migration/rdma.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 146 insertions(+), 16 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index f5c1d02..0652224 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>>                                  " to abort!"); \
>>                  rdma->error_reported = 1; \
>>              } \
>> +            rcu_read_unlock(); \
>>              return rdma->error_state; \
>>          } \
>>      } while (0)
>> @@ -405,6 +406,7 @@ struct QIOChannelRDMA {
>>      RDMAContext *rdma;
>>      QEMUFile *file;
>>      bool blocking; /* XXX we don't actually honour this yet */
>> +    QemuMutex lock;
>>  };
>>
>>  /*
>> @@ -2635,12 +2637,29 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>      QEMUFile *f = rioc->file;
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int ret;
>>      ssize_t done = 0;
>>      size_t i;
>>      size_t len = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>> +    if (rdma->listen_id) {
>> +        rdma = rdma->return_path;
>> +    }
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      /*
>> @@ -2650,6 +2669,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>      ret = qemu_rdma_write_flush(f, rdma);
>>      if (ret < 0) {
>>          rdma->error_state = ret;
>> +        rcu_read_unlock();
>>          return ret;
>>      }
>>
>> @@ -2669,6 +2689,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>
>>              if (ret < 0) {
>>                  rdma->error_state = ret;
>> +                rcu_read_unlock();
>>                  return ret;
>>              }
>>
>> @@ -2677,6 +2698,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>          }
>>      }
>>
>> +    rcu_read_unlock();
>>      return done;
>>  }
>>
>> @@ -2710,12 +2732,29 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>                                        Error **errp)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      RDMAControlHeader head;
>>      int ret = 0;
>>      ssize_t i;
>>      size_t done = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>> +    if (!rdma->listen_id) {
>> +        rdma = rdma->return_path;
>> +    }
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      for (i = 0; i < niov; i++) {
>> @@ -2727,7 +2766,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>           * were given and dish out the bytes until we run
>>           * out of bytes.
>>           */
>> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
>> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>>          done += ret;
>>          want -= ret;
>>          /* Got what we needed, so go to next iovec */
>> @@ -2749,25 +2788,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>
>>          if (ret < 0) {
>>              rdma->error_state = ret;
>> +            rcu_read_unlock();
>>              return ret;
>>          }
>>
>>          /*
>>           * SEND was received with new bytes, now try again.
>>           */
>> -        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
>> +        ret = qemu_rdma_fill(rdma, data, want, 0);
>>          done += ret;
>>          want -= ret;
>>
>>          /* Still didn't get enough, so lets just return */
>>          if (want) {
>>              if (done == 0) {
>> +                rcu_read_unlock();
>>                  return QIO_CHANNEL_ERR_BLOCK;
>>              } else {
>>                  break;
>>              }
>>          }
>>      }
>> +    rcu_read_unlock();
>>      return done;
>>  }
>>
>> @@ -2823,6 +2865,16 @@ qio_channel_rdma_source_prepare(GSource *source,
>>      GIOCondition cond = 0;
>>      *timeout = -1;
>>
>> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
>> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
>> +        rdma = rdma->return_path;
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when prepare Gsource");
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>> @@ -2838,6 +2890,16 @@ qio_channel_rdma_source_check(GSource *source)
>>      RDMAContext *rdma = rsource->rioc->rdma;
>>      GIOCondition cond = 0;
>>
>> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
>> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
>> +        rdma = rdma->return_path;
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when check Gsource");
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>> @@ -2856,6 +2918,16 @@ qio_channel_rdma_source_dispatch(GSource *source,
>>      RDMAContext *rdma = rsource->rioc->rdma;
>>      GIOCondition cond = 0;
>>
>> +    if ((rdma->listen_id && rsource->condition == G_IO_OUT) ||
>> +       (!rdma->listen_id && rsource->condition == G_IO_IN)) {
>> +        rdma = rdma->return_path;
>> +    }
>> +
>> +    if (!rdma) {
>> +        error_report("RDMAContext is NULL when dispatch Gsource");
>> +        return FALSE;
>> +    }
>> +
>>      if (rdma->wr_data[0].control_len) {
>>          cond |= G_IO_IN;
>>      }
>> @@ -2905,15 +2977,29 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
>>                                    Error **errp)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    RDMAContext *rdma;
>>      trace_qemu_rdma_close();
>> -    if (rioc->rdma) {
>> -        if (!rioc->rdma->error_state) {
>> -            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
>> -        }
>> -        qemu_rdma_cleanup(rioc->rdma);
>> -        g_free(rioc->rdma);
>> -        rioc->rdma = NULL;
>> +
>> +    qemu_mutex_lock(&rioc->lock);
>> +    rdma = rioc->rdma;
>> +    if (!rdma) {
>> +        qemu_mutex_unlock(&rioc->lock);
>> +        return 0;
>> +    }
>> +    atomic_rcu_set(&rioc->rdma, NULL);
>> +    qemu_mutex_unlock(&rioc->lock);
>> +
>> +    if (!rdma->error_state) {
>> +        rdma->error_state = qemu_file_get_error(rioc->file);
>> +    }
>> +    qemu_rdma_cleanup(rdma);
>> +
>> +    if (rdma->return_path) {
>> +        qemu_rdma_cleanup(rdma->return_path);
>> +        g_free(rdma->return_path);
>>      }
>> +
>> +    g_free(rdma);
>>      return 0;
>>  }
>>
>> @@ -2956,12 +3042,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>>                                    size_t size, uint64_t *bytes_sent)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int ret;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return RAM_SAVE_CONTROL_NOT_SUPP;
>>      }
>>
>> @@ -3046,9 +3141,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>>          }
>>      }
>>
>> +    rcu_read_unlock();
>>      return RAM_SAVE_CONTROL_DELAYED;
>>  err:
>>      rdma->error_state = ret;
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3224,8 +3321,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>>      RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>>                                   .repeat = 1 };
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
>> +    RDMAContext *rdma;
>> +    RDMALocalBlocks *local;
>>      RDMAControlHeader head;
>>      RDMARegister *reg, *registers;
>>      RDMACompress *comp;
>> @@ -3238,8 +3335,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>>      int count = 0;
>>      int i = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>> +    local = &rdma->local_ram_blocks;
>>      do {
>>          trace_qemu_rdma_registration_handle_wait();
>>
>> @@ -3469,6 +3575,7 @@ out:
>>      if (ret < 0) {
>>          rdma->error_state = ret;
>>      }
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3525,11 +3632,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>>                                          uint64_t flags, void *data)
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>> +
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>>
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return 0;
>>      }
>>
>> @@ -3537,6 +3652,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>>      qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
>>      qemu_fflush(f);
>>
>> +    rcu_read_unlock();
>>      return 0;
>>  }
>>
>> @@ -3549,13 +3665,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>  {
>>      Error *local_err = NULL, **errp = &local_err;
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>>      int ret = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdma);
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +        rcu_read_unlock();
>>          return 0;
>>      }
>>
>> @@ -3587,6 +3711,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                      qemu_rdma_reg_whole_ram_blocks : NULL);
>>          if (ret < 0) {
>>              ERROR(errp, "receiving remote info!");
>> +            rcu_read_unlock();
>>              return ret;
>>          }
>>
>> @@ -3610,6 +3735,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                          "not identical on both the source and destination.",
>>                          local->nb_blocks, nb_dest_blocks);
>>              rdma->error_state = -EINVAL;
>> +            rcu_read_unlock();
>>              return -EINVAL;
>>          }
>>
>> @@ -3626,6 +3752,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>                              local->block[i].length,
>>                              rdma->dest_blocks[i].length);
>>                  rdma->error_state = -EINVAL;
>> +                rcu_read_unlock();
>>                  return -EINVAL;
>>              }
>>              local->block[i].remote_host_addr =
>> @@ -3643,9 +3770,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>>          goto err;
>>      }
>>
>> +    rcu_read_unlock();
>>      return 0;
>>  err:
>>      rdma->error_state = ret;
>> +    rcu_read_unlock();
>>      return ret;
>>  }
>>
>> @@ -3707,6 +3836,7 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>>
>>      rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
>>      rioc->rdma = rdma;
>> +    qemu_mutex_init(&rioc->lock);
>>
>>      if (mode[0] == 'w') {
>>          rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-04-26 16:40   ` Dr. David Alan Gilbert
  2018-04-27  3:51     ` 858585 jemmy
@ 2018-04-27  9:01     ` Daniel P. Berrangé
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-04-27  9:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Lidong Chen, quintela, qemu-devel, galsha, aviadye, licq, adido,
	Lidong Chen

On Thu, Apr 26, 2018 at 05:40:11PM +0100, Dr. David Alan Gilbert wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
> > Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
> > by different threads concurrently, this patch removes the unnecessary
> > variable len in QIOChannelRDMA and uses a local variable instead.
> > 
> > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> 
> I'm OK with this patch as is; but now you're making me worried that I
> don't quite understand what threads are accessing it at the same time; we
> need to document/comment what's accessed concurrently.

FWIW, I have always intended that QIOChannel impls be able to support
bidirectional I/O from multiple threads, provided each of them is using
it in a different direction. So this is a mistake in my original RDMA
impl.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > ---
> >  migration/rdma.c | 15 +++++++--------
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index c745427..f5c1d02 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -404,7 +404,6 @@ struct QIOChannelRDMA {
> >      QIOChannel parent;
> >      RDMAContext *rdma;
> >      QEMUFile *file;
> > -    size_t len;
> >      bool blocking; /* XXX we don't actually honour this yet */
> >  };
> >  
> > @@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >      int ret;
> >      ssize_t done = 0;
> >      size_t i;
> > +    size_t len = 0;
> >  
> >      CHECK_ERROR_STATE();
> >  
> > @@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >          while (remaining) {
> >              RDMAControlHeader head;
> >  
> > -            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
> > -            remaining -= rioc->len;
> > +            len = MIN(remaining, RDMA_SEND_INCREMENT);
> > +            remaining -= len;
> >  
> > -            head.len = rioc->len;
> > +            head.len = len;
> >              head.type = RDMA_CONTROL_QEMU_FILE;
> >  
> >              ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> > @@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >                  return ret;
> >              }
> >  
> > -            data += rioc->len;
> > -            done += rioc->len;
> > +            data += len;
> > +            done += len;
> >          }
> >      }
> >  
> > @@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
> >              }
> >          }
> >      }
> > -    rioc->len = done;
> > -    return rioc->len;
> > +    return done;
> >  }
> >  
> >  /*
> > -- 
> > 1.8.3.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA
  2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
  2018-04-26 16:40   ` Dr. David Alan Gilbert
@ 2018-04-27  9:04   ` Daniel P. Berrangé
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-04-27  9:04 UTC (permalink / raw)
  To: Lidong Chen
  Cc: quintela, dgilbert, galsha, adido, aviadye, qemu-devel, licq,
	Lidong Chen

On Wed, Apr 25, 2018 at 10:35:32PM +0800, Lidong Chen wrote:
> Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
> by different threads concurrently, this patch removes the unnecessary
> variable len in QIOChannelRDMA and uses a local variable instead.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/rdma.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-27  7:56     ` 858585 jemmy
@ 2018-04-27  9:16       ` Daniel P. Berrangé
  2018-04-28  4:16         ` 858585 jemmy
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-04-27  9:16 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: Dr. David Alan Gilbert, Juan Quintela, qemu-devel, Gal Shachaf,
	Aviad Yehezkel, licq, adido, Lidong Chen

On Fri, Apr 27, 2018 at 03:56:38PM +0800, 858585 jemmy wrote:
> On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Lidong Chen (jemmy858585@gmail.com) wrote:
> >> This patch implements a bi-directional RDMA QIOChannel. Because different
> >> threads may access QIOChannelRDMA concurrently, this patch uses RCU to protect it.
> >>
> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >
> > I'm a bit confused by this.
> >
> > I can see it's adding RCU to protect the rdma structures against
> > deletion from multiple threads; that I'm OK with in principle; is that
> > the only locking we need? (I guess the two directions are actually
> > separate RDMAContext's so maybe).
> 
> qio_channel_rdma_close may be invoked by the migration thread and the
> return path thread concurrently, so I use a mutex to protect it.

Hmm, that is not good - concurrent threads calling close must not be
allowed to happen even with non-RDMA I/O channels.

For example, with the QIOChannelSocket, one thread can call close
which sets the fd = -1, another thread can race with this and either
end up calling close again on the same FD or calling close on -1.
Either way the second thread will get an error from close() when
it should have skipped the close() and returned success. Perhaps
migration gets lucky and this doesn't result in it being marked
as failed, but it is still not good.

So only one thread should be calling close().
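
To illustrate (a sketch only, not what QIOChannelSocket does today):
making close() idempotent would need an atomic swap, so that only one
thread ever actually closes the fd:

/* hypothetical guard; sioc->fd is real, this helper is not */
static int qio_channel_socket_close_once(QIOChannelSocket *sioc)
{
    int fd = atomic_xchg(&sioc->fd, -1);

    if (fd >= 0) {
        return closesocket(fd);
    }
    return 0;   /* another thread already closed it */
}

but the simpler rule is the one above: only one thread calls close().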

> If one thread invokes qio_channel_rdma_writev while another thread
> invokes qio_channel_rdma_readv, the two threads use separate
> RDMAContexts, so no lock is needed.
>
> If two threads invoked qio_channel_rdma_writev concurrently, a lock would
> be needed to protect it; but I find the source qemu migration thread only
> invokes qio_channel_rdma_writev, and the return path thread only invokes
> qio_channel_rdma_readv.

QIOChannel impls are only intended to cope with a single thread doing
I/O in each direction. If you have two threads needing to read, or
two threads needing to write, the layer above should provide locking
to ensure correct ordering of I/O operations.
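
The destination already has an example of that layering for return path
writes - from memory, migrate_send_rp_message() does roughly this:

qemu_mutex_lock(&mis->rp_mutex);
qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
qemu_put_be16(mis->to_src_file, len);
qemu_put_buffer(mis->to_src_file, data, len);
qemu_fflush(mis->to_src_file);
qemu_mutex_unlock(&mis->rp_mutex);

so the channel itself never sees two concurrent writers.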

> In the destination qemu, qio_channel_rdma_readv is invoked only by the
> main thread before postcopy, and only by the listen thread after postcopy.
>
> The destination qemu already protects writes to the source qemu by taking
> qemu_mutex_lock(&mis->rp_mutex).
> 
> But should we use qemu_mutex_lock to protect qio_channel_rdma_writev and
> qio_channel_rdma_readv anyway, to guard against some future change
> invoking them concurrently?

> 
> >
> > But is there nothing else to make the QIOChannel bidirectional?
> >
> > Also, a lot seems dependent on listen_id, can you explain how that's
> > being used.
> 
> The destination qemu is the server side, so its listen_id is non-zero;
> the source qemu is the client side, so its listen_id is zero.
> I use listen_id to determine whether qemu is the destination or the source.
>
> For the destination qemu, writing data to the source needs to use the
> return_path rdma, like this:
>     if (rdma->listen_id) {
>         rdma = rdma->return_path;
>     }
>
> For the source qemu, reading data from the destination also needs to use
> the return_path rdma:
>     if (!rdma->listen_id) {
>         rdma = rdma->return_path;
>     }

This feels unnecessarily complex to me. Why not just change the
QIOChannelRDMA struct, so that it has two RDMA context pointers, e.g.

struct QIOChannelRDMA {
    QIOChannel parent;
    RDMAContext *rdmain;
    RDMAContext *rdmaout;
    QEMUFile *file;
    bool blocking; /* XXX we don't actually honour this yet */
};
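
Then each channel method just picks the obvious context - sketch only,
keeping the RCU from your patch:

static ssize_t qio_channel_rdma_readv(QIOChannel *ioc, ...)
{
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
    RDMAContext *rdma;

    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdmain);  /* reads always use rdmain */
    if (!rdma) {
        rcu_read_unlock();
        return -EIO;
    }
    ...
}

and writev uses rioc->rdmaout, with no listen_id test anywhere.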



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-27  9:16       ` Daniel P. Berrangé
@ 2018-04-28  4:16         ` 858585 jemmy
  2018-04-30  9:18           ` Daniel P. Berrangé
  0 siblings, 1 reply; 18+ messages in thread
From: 858585 jemmy @ 2018-04-28  4:16 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Dr. David Alan Gilbert, Juan Quintela, qemu-devel, Gal Shachaf,
	Aviad Yehezkel, licq, adido, Lidong Chen

On Fri, Apr 27, 2018 at 5:16 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Fri, Apr 27, 2018 at 03:56:38PM +0800, 858585 jemmy wrote:
>> On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>> > * Lidong Chen (jemmy858585@gmail.com) wrote:
>> >> This patch implements a bi-directional RDMA QIOChannel. Because different
>> >> threads may access QIOChannelRDMA concurrently, this patch uses RCU to protect it.
>> >>
>> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >
>> > I'm a bit confused by this.
>> >
>> > I can see it's adding RCU to protect the rdma structures against
>> > deletion from multiple threads; that I'm OK with in principle; is that
>> > the only locking we need? (I guess the two directions are actually
>> > separate RDMAContext's so maybe).
>>
>> qio_channel_rdma_close may be invoked by the migration thread and the
>> return path thread concurrently, so I use a mutex to protect it.
>
> Hmm, that is not good - concurrent threads calling close must not be
> allowed to happen even with non-RDMA I/O channels.
>
> For example, with the QIOChannelSocket, one thread can call close
> which sets the fd = -1, another thread can race with this and either
> end up calling close again on the same FD or calling close on -1.
> Either way the second thread will get an error from close() when
> it should have skipped the close() and returned success. Perhaps
> migration gets lucky and this doesn't result in it being marked
> as failed, but it is still not good.
>
> So only one thread should be calling close().

For live migration, the source qemu invokes qemu_fclose from different
threads, including the main thread, the migration thread and the return
path thread.
The destination qemu invokes qemu_fclose from the main thread, the listen
thread and the COLO incoming thread.

So I prefer to add a lock to the QEMUFile struct, like this:

int qemu_fclose(QEMUFile *f)
{
    int ret;
    qemu_fflush(f);
    ret = qemu_file_get_error(f);

    if (f->ops->close) {
+        qemu_mutex_lock(&f->lock);
        int ret2 = f->ops->close(f->opaque);
+        qemu_mutex_unlock(&f->lock);
        if (ret >= 0) {
            ret = ret2;
        }
    }
    /* If any error was spotted before closing, we should report it
     * instead of the close() return value.
     */
    if (f->last_error) {
        ret = f->last_error;
    }
    g_free(f);
    trace_qemu_file_fclose();
    return ret;
}
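
The lock would need setup and teardown too; the missing pieces would be
something like this (sketch):

QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
{
    QEMUFile *f = g_new0(QEMUFile, 1);

    f->opaque = opaque;
    f->ops = ops;
+   qemu_mutex_init(&f->lock);
    return f;
}

plus a qemu_mutex_destroy(&f->lock) just before the g_free(f) in
qemu_fclose() above.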

Any suggestion?

>
>> If one thread invokes qio_channel_rdma_writev while another thread
>> invokes qio_channel_rdma_readv, the two threads use separate
>> RDMAContexts, so no lock is needed.
>>
>> If two threads invoked qio_channel_rdma_writev concurrently, a lock would
>> be needed to protect it; but I find the source qemu migration thread only
>> invokes qio_channel_rdma_writev, and the return path thread only invokes
>> qio_channel_rdma_readv.
>
> QIOChannel impls are only intended to cope with a single thread doing
> I/O in each direction. If you have two threads needing to read, or
> two threads needing to write, the layer above should provide locking
> to ensure correct ordering of I/O operations.

Yes, so I think RCU is enough; we do not need any more locking.

>
>> In the destination qemu, qio_channel_rdma_readv is invoked only by the
>> main thread before postcopy, and only by the listen thread after
>> postcopy.
>>
>> The destination qemu already protects writes to the source qemu by
>> taking qemu_mutex_lock(&mis->rp_mutex).
>>
>> But should we use qemu_mutex_lock to protect qio_channel_rdma_writev
>> and qio_channel_rdma_readv anyway, to guard against some future change
>> invoking them concurrently?
>
>>
>> >
>> > But is there nothing else to make the QIOChannel bidirectional?
>> >
>> > Also, a lot seems dependent on listen_id, can you explain how that's
>> > being used.
>>
>> The destination qemu is the server side, so its listen_id is non-zero;
>> the source qemu is the client side, so its listen_id is zero.
>> I use listen_id to determine whether qemu is the destination or the source.
>>
>> For the destination qemu, writing data to the source needs to use the
>> return_path rdma, like this:
>>     if (rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>>
>> For the source qemu, reading data from the destination also needs to use
>> the return_path rdma:
>>     if (!rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>
> This feels unnecessarily complex to me. Why not just change the
> QIOChannelRDMA struct, so that it has two RDMA context pointers, e.g.
>
> struct QIOChannelRDMA {
>     QIOChannel parent;
>     RDMAContext *rdmain;
>     RDMAContext *rdmaout;
>     QEMUFile *file;
>     bool blocking; /* XXX we don't actually honour this yet */
> };

The reason is that some functions, like qemu_rdma_accept and
rdma_start_outgoing_migration, take an RDMAContext parameter, so it was
easier to implement the return path this way.

So I should add some comments to the code.

>
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
  2018-04-28  4:16         ` 858585 jemmy
@ 2018-04-30  9:18           ` Daniel P. Berrangé
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-04-30  9:18 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: Dr. David Alan Gilbert, Juan Quintela, qemu-devel, Gal Shachaf,
	Aviad Yehezkel, licq, adido, Lidong Chen

On Sat, Apr 28, 2018 at 12:16:36PM +0800, 858585 jemmy wrote:
> On Fri, Apr 27, 2018 at 5:16 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> So I prefer to add a lock to the QEMUFile struct, like this:
> 
> int qemu_fclose(QEMUFile *f)
> {
>     int ret;
>     qemu_fflush(f);
>     ret = qemu_file_get_error(f);
> 
>     if (f->ops->close) {
> +        qemu_mutex_lock(&f->lock);
>         int ret2 = f->ops->close(f->opaque);
> +        qemu_mutex_unlock(&f->lock);
>         if (ret >= 0) {
>             ret = ret2;
>         }
>     }

That looks ok


> >> > But is there nothing else to make the QIOChannel bidirectional?
> >> >
> >> > Also, a lot seems dependent on listen_id, can you explain how that's
> >> > being used.
> >>
> >> The destination qemu is the server side, so its listen_id is non-zero;
> >> the source qemu is the client side, so its listen_id is zero.
> >> I use listen_id to determine whether qemu is the destination or the source.
> >>
> >> For the destination qemu, writing data to the source needs to use the
> >> return_path rdma, like this:
> >>     if (rdma->listen_id) {
> >>         rdma = rdma->return_path;
> >>     }
> >>
> >> For the source qemu, reading data from the destination also needs to use
> >> the return_path rdma:
> >>     if (!rdma->listen_id) {
> >>         rdma = rdma->return_path;
> >>     }
> >
> > This feels unnecessarily complex to me. Why not just change the
> > QIOChannelRDMA struct, so that it has two RDMA context pointers, e.g.
> >
> > struct QIOChannelRDMA {
> >     QIOChannel parent;
> >     RDMAContext *rdmain;
> >     RDMAContext *rdmaout;
> >     QEMUFile *file;
> >     bool blocking; /* XXX we don't actually honour this yet */
> > };
> 
> The reason is that some functions, like qemu_rdma_accept and
> rdma_start_outgoing_migration, take an RDMAContext parameter, so it was
> easier to implement the return path this way.

I don't see that as an issue - the qemu_fopen_rdma() function should be
able to set up a pair of RDMAContext pointers when it creates the
QIOChannelRDMA struct. It just has to request the return path from the
single RDMAContext it gets passed, and then set that on the QIOChannelRDMA
struct.
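
Something along these lines (a sketch of the idea, untested; assumes the
rdmain/rdmaout fields from my earlier mail):

static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
{
    QIOChannelRDMA *rioc;

    if (qemu_file_mode_is_not_valid(mode)) {
        return NULL;
    }

    rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));

    if (mode[0] == 'w') {
        rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
        rioc->rdmaout = rdma;
        rioc->rdmain = rdma->return_path;
    } else {
        rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
        rioc->rdmain = rdma;
        rioc->rdmaout = rdma->return_path;
    }

    return rioc->file;
}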

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread


Thread overview: 18+ messages
2018-04-25 14:35 [Qemu-devel] [PATCH v2 0/5] Enable postcopy RDMA live migration Lidong Chen
2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 1/5] migration: disable RDMA WRITE after postcopy started Lidong Chen
2018-04-26 16:11   ` Dr. David Alan Gilbert
2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 2/5] migration: create a dedicated connection for rdma return path Lidong Chen
2018-04-26 16:19   ` Dr. David Alan Gilbert
2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 3/5] migration: remove unnecessary variables len in QIOChannelRDMA Lidong Chen
2018-04-26 16:40   ` Dr. David Alan Gilbert
2018-04-27  3:51     ` 858585 jemmy
2018-04-27  9:01     ` Daniel P. Berrangé
2018-04-27  9:04   ` Daniel P. Berrangé
2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel Lidong Chen
2018-04-26 17:36   ` Dr. David Alan Gilbert
2018-04-27  7:56     ` 858585 jemmy
2018-04-27  9:16       ` Daniel P. Berrangé
2018-04-28  4:16         ` 858585 jemmy
2018-04-30  9:18           ` Daniel P. Berrangé
2018-04-25 14:35 ` [Qemu-devel] [PATCH v2 5/5] migration: Stop rdma yielding during incoming postcopy Lidong Chen
2018-04-26 17:54   ` Dr. David Alan Gilbert
