QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH RFC 00/12] *** mulitple RDMA channels for migration ***
@ 2020-01-09  4:59 Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
                   ` (13 more replies)
  0 siblings, 14 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Currently there is a single channel for RDMA migration, this causes
the problem that the network bandwidth is not fully utilized for
25Gigabit NIC. Inspired by the Multifd, we use two RDMA channels to
send RAM pages, which we call MultiRDMA.

We compare the migration performance of MultiRDMA with origin
RDMA migration. The VM specifications for migration are as follows:
- VM use 4k page;
- the number of VCPU is 4;
- the total memory is 16Gigabit;
- use 'mempress' tool to pressurize VM(mempress 8000 500);
- use 25Gigabit network card to migrate;

For origin RDMA and MultiRDMA migration, the total migration times of
VM are as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++
|             | NOT rdma-pin-all | rdma-pin-all |
+++++++++++++++++++++++++++++++++++++++++++++++++
| origin RDMA |       18 s       |     23 s     |
-------------------------------------------------
|  MultiRDMA  |       13 s       |     18 s     |
+++++++++++++++++++++++++++++++++++++++++++++++++

For NOT rdma-pin-all migration, the multiRDMA can improve the
total migration time by about 27.8%.
For rdma-pin-all migration, the multiRDMA can improve the
total migration time by about 21.7%.

Test the multiRDMA migration like this:
'virsh migrate --live --rdma-parallel --migrateuri
rdma://hostname domain qemu+tcp://hostname/system'


fengzhimin (12):
  migration: Add multiRDMA capability support
  migration: Export the 'migration_incoming_setup' function           
             and add the 'migrate_use_rdma_pin_all' function
  migration: Create the multi-rdma-channels parameter
  migration/rdma: Create multiRDMA migration threads
  migration/rdma: Create the multiRDMA channels
  migration/rdma: Transmit initial package
  migration/rdma: Be sure all channels are created
  migration/rdma: register memory for multiRDMA channels
  migration/rdma: Wait for all multiRDMA to complete registration
  migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
  migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
                  mode
  migration/rdma: only register the virt-ram block for MultiRDMA

 migration/migration.c |   55 +-
 migration/migration.h |    6 +
 migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
 monitor/hmp-cmds.c    |    7 +
 qapi/migration.json   |   27 +-
 5 files changed, 1285 insertions(+), 130 deletions(-)

-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-13 15:30   ` Markus Armbruster
                     ` (2 more replies)
  2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/migration.c | 11 +++++++++++
 migration/migration.h |  1 +
 qapi/migration.json   |  4 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 354ad072fa..e98e236ef9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2176,6 +2176,15 @@ bool migrate_use_events(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS];
 }
 
+bool migrate_use_multiRDMA(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIRDMA];
+}
+
 bool migrate_use_multifd(void)
 {
     MigrationState *s;
@@ -3509,6 +3518,8 @@ static Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
     DEFINE_PROP_MIG_CAP("x-return-path", MIGRATION_CAPABILITY_RETURN_PATH),
     DEFINE_PROP_MIG_CAP("x-multifd", MIGRATION_CAPABILITY_MULTIFD),
+    DEFINE_PROP_MIG_CAP("x-multirdma",
+                        MIGRATION_CAPABILITY_MULTIRDMA),
 
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/migration/migration.h b/migration/migration.h
index 79b3dda146..bb488028a6 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -296,6 +296,7 @@ bool migrate_ignore_shared(void);
 bool migrate_validate_uuid(void);
 
 bool migrate_auto_converge(void);
+bool migrate_use_multiRDMA(void);
 bool migrate_use_multifd(void);
 bool migrate_pause_before_switchover(void);
 int migrate_multifd_channels(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index b7348d0c8b..c995ffdc4c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -421,6 +421,8 @@
 # @validate-uuid: Send the UUID of the source to allow the destination
 #                 to ensure it is the same. (since 4.2)
 #
+# @multirdma: Use more than one channels for rdma migration. (since 4.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
@@ -428,7 +430,7 @@
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
            'block', 'return-path', 'pause-before-switchover', 'multifd',
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
-           'x-ignore-shared', 'validate-uuid' ] }
+           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-15 18:57   ` Dr. David Alan Gilbert
  2020-01-16 13:19   ` Juan Quintela
  2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

We need to call the 'migration_incoming_setup' function in migration/rdma.c,
so it has to be changed to a global function.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/migration.c | 11 ++++++++++-
 migration/migration.h |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e98e236ef9..d9d73a5eac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -518,7 +518,7 @@ fail:
     exit(EXIT_FAILURE);
 }
 
-static void migration_incoming_setup(QEMUFile *f)
+void migration_incoming_setup(QEMUFile *f)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
@@ -2185,6 +2185,15 @@ bool migrate_use_multiRDMA(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIRDMA];
 }
 
+bool migrate_use_rdma_pin_all(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
+}
+
 bool migrate_use_multifd(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index bb488028a6..0a23375b2f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -265,6 +265,7 @@ struct MigrationState
 
 void migrate_set_state(int *state, int old_state, int new_state);
 
+void migration_incoming_setup(QEMUFile *f);
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
@@ -297,6 +298,7 @@ bool migrate_validate_uuid(void);
 
 bool migrate_auto_converge(void);
 bool migrate_use_multiRDMA(void);
+bool migrate_use_rdma_pin_all(void);
 bool migrate_use_multifd(void);
 bool migrate_pause_before_switchover(void);
 int migrate_multifd_channels(void);
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-13 15:34   ` Markus Armbruster
  2020-01-16 13:20   ` Juan Quintela
  2020-01-09  4:59 ` [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads Zhimin Feng
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Indicates the number of RDMA threads that we would create.
By default we create 2 threads for RDMA migration.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/migration.h |  1 +
 monitor/hmp-cmds.c    |  7 +++++++
 qapi/migration.json   | 23 +++++++++++++++++++----
 4 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d9d73a5eac..5756a4806e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -88,6 +88,9 @@
 #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
 
+/* Define default MultiRDMA thread number */
+#define DEFAULT_MIGRATE_MULTIRDMA_CHANNELS 2
+
 /* Background transfer rate for postcopy, 0 means unlimited, note
  * that page requests can still exceed this limit.
  */
@@ -788,6 +791,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->announce_rounds = s->parameters.announce_rounds;
     params->has_announce_step = true;
     params->announce_step = s->parameters.announce_step;
+    params->has_multi_rdma_channels = true;
+    params->multi_rdma_channels = s->parameters.multi_rdma_channels;
 
     return params;
 }
@@ -1171,6 +1176,14 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
+    if (params->has_multi_rdma_channels
+        && (params->multi_rdma_channels < 1)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   "multi_rdma_channels",
+                   "is invalid, it should be in the range of 1 to 5");
+        return false;
+    }
+
     if (params->has_xbzrle_cache_size &&
         (params->xbzrle_cache_size < qemu_target_page_size() ||
          !is_power_of_2(params->xbzrle_cache_size))) {
@@ -1302,6 +1315,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_announce_step) {
         dest->announce_step = params->announce_step;
     }
+    if (params->has_multi_rdma_channels) {
+        dest->multi_rdma_channels = params->multi_rdma_channels;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1403,6 +1419,9 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_announce_step) {
         s->parameters.announce_step = params->announce_step;
     }
+    if (params->has_multi_rdma_channels) {
+        s->parameters.multi_rdma_channels = params->multi_rdma_channels;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
@@ -2222,6 +2241,15 @@ int migrate_multifd_channels(void)
     return s->parameters.multifd_channels;
 }
 
+int migrate_multiRDMA_channels(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.multi_rdma_channels;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -3513,6 +3541,9 @@ static Property migration_properties[] = {
     DEFINE_PROP_SIZE("announce-step", MigrationState,
                       parameters.announce_step,
                       DEFAULT_MIGRATE_ANNOUNCE_STEP),
+    DEFINE_PROP_UINT8("multiRDMA-channels", MigrationState,
+                      parameters.multi_rdma_channels,
+                      DEFAULT_MIGRATE_MULTIRDMA_CHANNELS),
 
     /* Migration capabilities */
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -3591,6 +3622,7 @@ static void migration_instance_init(Object *obj)
     params->has_announce_max = true;
     params->has_announce_rounds = true;
     params->has_announce_step = true;
+    params->has_multi_rdma_channels = true;
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index 0a23375b2f..4192c22d8c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -271,6 +271,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
 
 bool  migration_has_all_channels(void);
+int migrate_multiRDMA_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index d0e0af893a..80898c8942 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -456,6 +456,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: '%s'\n",
             MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
             params->has_tls_authz ? params->tls_authz : "");
+        monitor_printf(mon, "%s: %u\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_MULTI_RDMA_CHANNELS),
+            params->multi_rdma_channels);
     }
 
     qapi_free_MigrationParameters(params);
@@ -1855,6 +1858,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_announce_step = true;
         visit_type_size(v, param, &p->announce_step, &err);
         break;
+    case MIGRATION_PARAMETER_MULTI_RDMA_CHANNELS:
+        p->has_multi_rdma_channels = true;
+        visit_type_int(v, param, &p->multi_rdma_channels, &err);
+        break;
     default:
         assert(0);
     }
diff --git a/qapi/migration.json b/qapi/migration.json
index c995ffdc4c..ab79bf0600 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -588,6 +588,10 @@
 # @max-cpu-throttle: maximum cpu throttle percentage.
 #                    Defaults to 99. (Since 3.1)
 #
+# @multi-rdma-channels: Number of channels used to migrate data in
+#                       parallel. This is the same number that the
+#                       number of multiRDMA used for migration.  The
+#                       default value is 2 (since 4.2)
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
@@ -600,7 +604,8 @@
            'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
            'multifd-channels',
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
-           'max-cpu-throttle' ] }
+           'max-cpu-throttle',
+           'multi-rdma-channels'] }
 
 ##
 # @MigrateSetParameters:
@@ -690,6 +695,10 @@
 # @max-cpu-throttle: maximum cpu throttle percentage.
 #                    The default value is 99. (Since 3.1)
 #
+# @multi-rdma-channels: Number of channels used to migrate data in
+#                       parallel. This is the same number that the
+#                       number of multiRDMA used for migration.  The
+#                       default value is 2 (since 4.2)
 # Since: 2.4
 ##
 # TODO either fuse back into MigrationParameters, or make
@@ -715,7 +724,8 @@
             '*multifd-channels': 'int',
             '*xbzrle-cache-size': 'size',
             '*max-postcopy-bandwidth': 'size',
-	    '*max-cpu-throttle': 'int' } }
+     	    '*max-cpu-throttle': 'int',
+            '*multi-rdma-channels': 'int'} }
 
 ##
 # @migrate-set-parameters:
@@ -825,6 +835,10 @@
 #                    Defaults to 99.
 #                     (Since 3.1)
 #
+# @multi-rdma-channels: Number of channels used to migrate data in
+#                       parallel. This is the same number that the
+#                       number of multiRDMA used for migration.  The
+#                       default value is 2 (since 4.2)
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -847,8 +861,9 @@
             '*block-incremental': 'bool' ,
             '*multifd-channels': 'uint8',
             '*xbzrle-cache-size': 'size',
-	    '*max-postcopy-bandwidth': 'size',
-            '*max-cpu-throttle':'uint8'} }
+     	    '*max-postcopy-bandwidth': 'size',
+            '*max-cpu-throttle':'uint8',
+            '*multi-rdma-channels':'uint8'} }
 
 ##
 # @query-migrate-parameters:
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (2 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-16 13:25   ` Juan Quintela
  2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Creation of the RDMA threads, nothing inside yet.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/migration.c |   1 +
 migration/migration.h |   2 +
 migration/rdma.c      | 283 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 286 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 5756a4806e..f8d4eb657e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1546,6 +1546,7 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_mutex_lock_iothread();
 
         multifd_save_cleanup();
+        multiRDMA_save_cleanup();
         qemu_mutex_lock(&s->qemu_file_lock);
         tmp = s->to_dst_file;
         s->to_dst_file = NULL;
diff --git a/migration/migration.h b/migration/migration.h
index 4192c22d8c..d69a3fe4e9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -272,6 +272,8 @@ void migration_incoming_process(void);
 
 bool  migration_has_all_channels(void);
 int migrate_multiRDMA_channels(void);
+int multiRDMA_save_cleanup(void);
+int multiRDMA_load_cleanup(void);
 
 uint64_t migrate_max_downtime(void);
 
diff --git a/migration/rdma.c b/migration/rdma.c
index e241dcb992..992e5abfed 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -395,6 +395,58 @@ typedef struct RDMAContext {
     bool is_return_path;
 } RDMAContext;
 
+typedef struct {
+    /* this fields are not changed once the thread is created */
+    /* channel number */
+    uint8_t id;
+    /* channel thread name */
+    char *name;
+    /* channel thread id */
+    QemuThread thread;
+    /* sem where to wait for more work */
+    QemuSemaphore sem;
+    /* this mutex protects the following parameters */
+    QemuMutex mutex;
+    /* is this channel thread running */
+    bool running;
+    /* should this thread finish */
+    bool quit;
+}  MultiRDMASendParams;
+
+struct {
+    MultiRDMASendParams *params;
+    /* number of created threads */
+    int count;
+    /* syncs main thread and channels */
+    QemuSemaphore sem_sync;
+} *multiRDMA_send_state;
+
+typedef struct {
+    /* this fields are not changed once the thread is created */
+    /* channel number */
+    uint8_t id;
+    /* channel thread name */
+    char *name;
+    /* channel thread id */
+    QemuThread thread;
+    /* sem where to wait for more work */
+    QemuSemaphore sem;
+    /* this mutex protects the following parameters */
+    QemuMutex mutex;
+    /* is this channel thread running */
+    bool running;
+    /* should this thread finish */
+    bool quit;
+} MultiRDMARecvParams;
+
+struct {
+    MultiRDMARecvParams *params;
+    /* number of created threads */
+    int count;
+    /* syncs main thread and channels */
+    QemuSemaphore sem_sync;
+} *multiRDMA_recv_state;
+
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
 #define QIO_CHANNEL_RDMA(obj)                                     \
     OBJECT_CHECK(QIOChannelRDMA, (obj), TYPE_QIO_CHANNEL_RDMA)
@@ -3018,6 +3070,7 @@ static void qio_channel_rdma_close_rcu(struct rdma_close_rcu *rcu)
     if (rcu->rdmaout) {
         qemu_rdma_cleanup(rcu->rdmaout);
     }
+    multiRDMA_load_cleanup();
 
     g_free(rcu->rdmain);
     g_free(rcu->rdmaout);
@@ -3919,6 +3972,7 @@ static void qio_channel_rdma_finalize(Object *obj)
         g_free(rioc->rdmaout);
         rioc->rdmaout = NULL;
     }
+    multiRDMA_load_cleanup();
 }
 
 static void qio_channel_rdma_class_init(ObjectClass *klass,
@@ -4007,6 +4061,59 @@ static void rdma_accept_incoming_migration(void *opaque)
     migration_fd_process_incoming(f);
 }
 
+static void *multiRDMA_recv_thread(void *opaque)
+{
+    MultiRDMARecvParams *p = opaque;
+
+    while (true) {
+        qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            qemu_mutex_unlock(&p->mutex);
+            break;
+        }
+        qemu_mutex_unlock(&p->mutex);
+        qemu_sem_wait(&p->sem);
+    }
+
+    qemu_mutex_lock(&p->mutex);
+    p->running = false;
+    qemu_mutex_unlock(&p->mutex);
+
+    return NULL;
+}
+
+static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
+                                      Error **errp)
+{
+    uint8_t i;
+    int thread_count;
+
+    thread_count = migrate_multiRDMA_channels();
+    if (multiRDMA_recv_state == NULL) {
+        multiRDMA_recv_state = g_malloc0(sizeof(*multiRDMA_recv_state));
+        multiRDMA_recv_state->params = g_new0(MultiRDMARecvParams,
+                                              thread_count);
+        atomic_set(&multiRDMA_recv_state->count, 0);
+        qemu_sem_init(&multiRDMA_recv_state->sem_sync, 0);
+
+        for (i = 0; i < thread_count; i++) {
+            MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
+
+            qemu_mutex_init(&p->mutex);
+            qemu_sem_init(&p->sem, 0);
+            p->quit = false;
+            p->id = i;
+            p->running = true;
+            p->name = g_strdup_printf("multiRDMARecv_%d", i);
+            qemu_thread_create(&p->thread, p->name, multiRDMA_recv_thread,
+                               p, QEMU_THREAD_JOINABLE);
+            atomic_inc(&multiRDMA_recv_state->count);
+        }
+    }
+
+    return 0;
+}
+
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
@@ -4048,6 +4155,13 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
         qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
     }
 
+    if (migrate_use_multiRDMA()) {
+        if (multiRDMA_load_setup(host_port, rdma, &local_err) != 0) {
+            ERROR(errp, "init multiRDMA failure!");
+            goto err;
+        }
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
@@ -4055,6 +4169,167 @@ err:
     error_propagate(errp, local_err);
     g_free(rdma);
     g_free(rdma_return_path);
+    multiRDMA_load_cleanup();
+}
+
+static void *multiRDMA_send_thread(void *opaque)
+{
+    MultiRDMASendParams *p = opaque;
+
+    while (true) {
+        qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            qemu_mutex_unlock(&p->mutex);
+            break;
+        }
+        qemu_mutex_unlock(&p->mutex);
+        qemu_sem_wait(&p->sem);
+    }
+
+    qemu_mutex_lock(&p->mutex);
+    p->running = false;
+    qemu_mutex_unlock(&p->mutex);
+
+    return NULL;
+}
+
+static int multiRDMA_save_setup(const char *host_port, Error **errp)
+{
+    int thread_count;
+    uint8_t i;
+
+    thread_count = migrate_multiRDMA_channels();
+    multiRDMA_send_state = g_malloc0(sizeof(*multiRDMA_send_state));
+    multiRDMA_send_state->params = g_new0(MultiRDMASendParams,
+                                          thread_count);
+    atomic_set(&multiRDMA_send_state->count, 0);
+    qemu_sem_init(&multiRDMA_send_state->sem_sync, 0);
+
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
+        qemu_mutex_init(&p->mutex);
+        qemu_sem_init(&p->sem, 0);
+        p->quit = false;
+        p->id = i;
+        p->running = true;
+        p->name = g_strdup_printf("multiRDMASend_%d", i);
+
+        qemu_thread_create(&p->thread, p->name, multiRDMA_send_thread, p,
+                           QEMU_THREAD_JOINABLE);
+        atomic_inc(&multiRDMA_send_state->count);
+    }
+
+    return 0;
+}
+
+static void multiRDMA_send_terminate_threads(void)
+{
+    int i;
+    int thread_count = migrate_multiRDMA_channels();
+
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
+
+        qemu_mutex_lock(&p->mutex);
+        p->quit = true;
+        qemu_mutex_unlock(&p->mutex);
+        qemu_sem_post(&p->sem);
+    }
+}
+
+int multiRDMA_save_cleanup(void)
+{
+    int i;
+    int ret = 0;
+    int thread_count = migrate_multiRDMA_channels();
+
+    if (!migrate_use_multiRDMA()) {
+        return 0;
+    }
+
+    /* prevent double free */
+    if (multiRDMA_send_state == NULL) {
+        return 0;
+    }
+
+    /* notify multi RDMA threads to exit */
+    multiRDMA_send_terminate_threads();
+
+    /* wait for multi RDMA send threads to be exit */
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
+
+        qemu_thread_join(&p->thread);
+    }
+
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
+        qemu_mutex_destroy(&p->mutex);
+        qemu_sem_destroy(&p->sem);
+        g_free(p->name);
+        p->name = NULL;
+    }
+
+    qemu_sem_destroy(&multiRDMA_send_state->sem_sync);
+    g_free(multiRDMA_send_state);
+    multiRDMA_send_state = NULL;
+
+    return ret;
+}
+
+static void multiRDMA_recv_terminate_threads(void)
+{
+    int i;
+    int thread_count = migrate_multiRDMA_channels();
+
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
+
+        qemu_mutex_lock(&p->mutex);
+        p->quit = true;
+        qemu_mutex_unlock(&p->mutex);
+        qemu_sem_post(&p->sem);
+    }
+}
+
+int multiRDMA_load_cleanup(void)
+{
+    int i;
+    int ret = 0;
+    int thread_count = migrate_multiRDMA_channels();
+
+    if (!migrate_use_multiRDMA()) {
+        return 0;
+    }
+
+    /* prevent double free */
+    if (multiRDMA_recv_state == NULL) {
+        return 0;
+    }
+
+    /* notify multi RDMA recv threads to exit */
+    multiRDMA_recv_terminate_threads();
+
+    /* wait for multi RDMA threads to be exit */
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
+
+        qemu_thread_join(&p->thread);
+    }
+
+    for (i = 0; i < thread_count; i++) {
+        MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
+        qemu_mutex_destroy(&p->mutex);
+        qemu_sem_destroy(&p->sem);
+        g_free(p->name);
+        p->name = NULL;
+    }
+
+    qemu_sem_destroy(&multiRDMA_recv_state->sem_sync);
+    g_free(multiRDMA_recv_state);
+    multiRDMA_recv_state = NULL;
+
+    return ret;
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -4111,10 +4386,18 @@ void rdma_start_outgoing_migration(void *opaque,
 
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
+    if (migrate_use_multiRDMA()) {
+        if (multiRDMA_save_setup(host_port, errp) != 0) {
+            ERROR(errp, "init multiRDMA channels failure!");
+            goto err;
+        }
+    }
+
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
     migrate_fd_connect(s, NULL);
     return;
 err:
     g_free(rdma);
     g_free(rdma_return_path);
+    multiRDMA_save_cleanup();
 }
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (3 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-15 19:54   ` Dr. David Alan Gilbert
  2020-01-16 13:30   ` Juan Quintela
  2020-01-09  4:59 ` [PATCH RFC 06/12] migration/rdma: Transmit initial package Zhimin Feng
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

In both sides. We still don't transmit anything through them,
and we only build the RDMA connections.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 253 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 223 insertions(+), 30 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 992e5abfed..5b780bef36 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -403,6 +403,10 @@ typedef struct {
     char *name;
     /* channel thread id */
     QemuThread thread;
+    /* RDMAContext channel */
+    RDMAContext *rdma;
+    /* communication channel */
+    QEMUFile *file;
     /* sem where to wait for more work */
     QemuSemaphore sem;
     /* this mutex protects the following parameters */
@@ -429,6 +433,10 @@ typedef struct {
     char *name;
     /* channel thread id */
     QemuThread thread;
+    /* RDMAContext channel */
+    RDMAContext *rdma;
+    /* communication channel */
+    QEMUFile *file;
     /* sem where to wait for more work */
     QemuSemaphore sem;
     /* this mutex protects the following parameters */
@@ -3417,6 +3425,27 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                             NULL,
                             (void *)(intptr_t)rdma->return_path);
+    } else if (migrate_use_multiRDMA()) {
+        int thread_count;
+        int i;
+        RDMAContext *multi_rdma = NULL;
+        thread_count = migrate_multiRDMA_channels();
+        /* create the multi Thread RDMA channels */
+        for (i = 0; i < thread_count; i++) {
+            if (multiRDMA_recv_state->params[i].rdma->cm_id == NULL) {
+                multi_rdma = multiRDMA_recv_state->params[i].rdma;
+                break;
+            }
+        }
+
+        if (multi_rdma) {
+            qemu_set_fd_handler(rdma->channel->fd,
+                                rdma_accept_incoming_migration,
+                                NULL, (void *)(intptr_t)multi_rdma);
+        } else {
+            qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
+                                NULL, rdma);
+        }
     } else {
         qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
                             NULL, rdma);
@@ -4029,6 +4058,58 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     return rioc->file;
 }
 
+static void *multiRDMA_recv_thread(void *opaque)
+{
+    MultiRDMARecvParams *p = opaque;
+
+    while (true) {
+        qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            qemu_mutex_unlock(&p->mutex);
+            break;
+        }
+        qemu_mutex_unlock(&p->mutex);
+        qemu_sem_wait(&p->sem);
+    }
+
+    qemu_mutex_lock(&p->mutex);
+    p->running = false;
+    qemu_mutex_unlock(&p->mutex);
+
+    return NULL;
+}
+
+static void multiRDMA_recv_new_channel(QEMUFile *f, int id)
+{
+    MultiRDMARecvParams *p;
+    Error *local_err = NULL;
+
+    p = &multiRDMA_recv_state->params[id];
+    if (p->file != NULL) {
+        error_setg(&local_err,
+                   "multiRDMA: received id '%d' already setup'", id);
+        return ;
+    }
+    p->file = f;
+
+    qemu_thread_create(&p->thread, p->name, multiRDMA_recv_thread, p,
+                       QEMU_THREAD_JOINABLE);
+    atomic_inc(&multiRDMA_recv_state->count);
+}
+
+static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (!mis->from_src_file) {
+        rdma->migration_started_on_destination = 1;
+        migration_incoming_setup(f);
+        migration_incoming_process();
+    } else {
+        multiRDMA_recv_new_channel(f, multiRDMA_recv_state->count);
+    }
+}
+
 static void rdma_accept_incoming_migration(void *opaque)
 {
     RDMAContext *rdma = opaque;
@@ -4057,29 +4138,13 @@ static void rdma_accept_incoming_migration(void *opaque)
         return;
     }
 
-    rdma->migration_started_on_destination = 1;
-    migration_fd_process_incoming(f);
-}
-
-static void *multiRDMA_recv_thread(void *opaque)
-{
-    MultiRDMARecvParams *p = opaque;
-
-    while (true) {
-        qemu_mutex_lock(&p->mutex);
-        if (p->quit) {
-            qemu_mutex_unlock(&p->mutex);
-            break;
-        }
-        qemu_mutex_unlock(&p->mutex);
-        qemu_sem_wait(&p->sem);
+    if (migrate_use_multiRDMA()) {
+        /* build the multiRDMA channels */
+        migration_multiRDMA_process_incoming(f, rdma);
+    } else {
+        rdma->migration_started_on_destination = 1;
+        migration_fd_process_incoming(f);
     }
-
-    qemu_mutex_lock(&p->mutex);
-    p->running = false;
-    qemu_mutex_unlock(&p->mutex);
-
-    return NULL;
 }
 
 static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
@@ -4087,6 +4152,7 @@ static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
 {
     uint8_t i;
     int thread_count;
+    int idx;
 
     thread_count = migrate_multiRDMA_channels();
     if (multiRDMA_recv_state == NULL) {
@@ -4099,15 +4165,21 @@ static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
         for (i = 0; i < thread_count; i++) {
             MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
 
+            p->rdma = qemu_rdma_data_init(host_port, errp);
+            for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+                p->rdma->wr_data[idx].control_len = 0;
+                p->rdma->wr_data[idx].control_curr = NULL;
+            }
+            /* the CM channel and CM id is shared */
+            p->rdma->channel = rdma->channel;
+            p->rdma->listen_id = rdma->listen_id;
+
             qemu_mutex_init(&p->mutex);
             qemu_sem_init(&p->sem, 0);
             p->quit = false;
             p->id = i;
             p->running = true;
             p->name = g_strdup_printf("multiRDMARecv_%d", i);
-            qemu_thread_create(&p->thread, p->name, multiRDMA_recv_thread,
-                               p, QEMU_THREAD_JOINABLE);
-            atomic_inc(&multiRDMA_recv_state->count);
         }
     }
 
@@ -4155,6 +4227,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
         qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
     }
 
+    /* initialize the RDMAContext for multiRDMA */
     if (migrate_use_multiRDMA()) {
         if (multiRDMA_load_setup(host_port, rdma, &local_err) != 0) {
             ERROR(errp, "init multiRDMA failure!");
@@ -4193,10 +4266,29 @@ static void *multiRDMA_send_thread(void *opaque)
     return NULL;
 }
 
+static void multiRDMA_send_new_channel(QEMUFile *f, int id)
+{
+    MultiRDMASendParams *p;
+    Error *local_err = NULL;
+
+    p = &multiRDMA_send_state->params[id];
+    if (p->file != NULL) {
+        error_setg(&local_err,
+                   "multiRDMA: send id '%d' already setup'", id);
+        return ;
+    }
+    p->file = f;
+
+    qemu_thread_create(&p->thread, p->name, multiRDMA_send_thread,
+                       p, QEMU_THREAD_JOINABLE);
+    atomic_inc(&multiRDMA_send_state->count);
+}
+
 static int multiRDMA_save_setup(const char *host_port, Error **errp)
 {
     int thread_count;
     uint8_t i;
+    int ret;
 
     thread_count = migrate_multiRDMA_channels();
     multiRDMA_send_state = g_malloc0(sizeof(*multiRDMA_send_state));
@@ -4207,6 +4299,27 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
 
     for (i = 0; i < thread_count; i++) {
         MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
+        QEMUFile *f = NULL;
+
+        p->rdma = qemu_rdma_data_init(host_port, errp);
+        if (p->rdma == NULL) {
+            ERROR(errp, "init RDMA data failure for multi channel %d!", i);
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(p->rdma, migrate_use_rdma_pin_all(), errp);
+        if (ret) {
+            ERROR(errp, "init RDMA source failure for multi channel %d!", i);
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(p->rdma, errp);
+        if (ret) {
+            ERROR(errp, "connect multi channel %d failure!", i);
+            goto err;
+        }
+
+        f = qemu_fopen_rdma(multiRDMA_send_state->params[i].rdma, "wb");
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
         p->quit = false;
@@ -4214,12 +4327,20 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
         p->running = true;
         p->name = g_strdup_printf("multiRDMASend_%d", i);
 
-        qemu_thread_create(&p->thread, p->name, multiRDMA_send_thread, p,
-                           QEMU_THREAD_JOINABLE);
-        atomic_inc(&multiRDMA_send_state->count);
+        multiRDMA_send_new_channel(f, i);
     }
 
     return 0;
+
+err:
+    for (i = 0; i < thread_count; i++) {
+        g_free(multiRDMA_send_state->params[i].rdma);
+    }
+
+    g_free(multiRDMA_send_state->params);
+    g_free(multiRDMA_send_state);
+
+    return -1;
 }
 
 static void multiRDMA_send_terminate_threads(void)
@@ -4268,6 +4389,8 @@ int multiRDMA_save_cleanup(void)
         qemu_sem_destroy(&p->sem);
         g_free(p->name);
         p->name = NULL;
+        qemu_rdma_cleanup(multiRDMA_send_state->params[i].rdma);
+        g_free(multiRDMA_send_state->params[i].rdma);
     }
 
     qemu_sem_destroy(&multiRDMA_send_state->sem_sync);
@@ -4292,6 +4415,71 @@ static void multiRDMA_recv_terminate_threads(void)
     }
 }
 
+static void qemu_multiRDMA_load_cleanup(RDMAContext *rdma)
+{
+    int idx;
+
+    if (rdma->cm_id && rdma->connected) {
+        if ((rdma->error_state ||
+             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
+            !rdma->received_error) {
+            RDMAControlHeader head = { .len = 0,
+                                       .type = RDMA_CONTROL_ERROR,
+                                       .repeat = 1,
+                                     };
+            error_report("Early error. Sending error.");
+            qemu_rdma_post_send_control(rdma, NULL, &head);
+        }
+
+        rdma_disconnect(rdma->cm_id);
+        trace_qemu_rdma_cleanup_disconnect();
+        rdma->connected = false;
+    }
+
+    g_free(rdma->dest_blocks);
+    rdma->dest_blocks = NULL;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        if (rdma->wr_data[idx].control_mr) {
+            rdma->total_registrations--;
+            ibv_dereg_mr(rdma->wr_data[idx].control_mr);
+        }
+        rdma->wr_data[idx].control_mr = NULL;
+    }
+
+    if (rdma->local_ram_blocks.block) {
+        while (rdma->local_ram_blocks.nb_blocks) {
+            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
+        }
+    }
+
+    if (rdma->qp) {
+        rdma_destroy_qp(rdma->cm_id);
+        rdma->qp = NULL;
+    }
+    if (rdma->cq) {
+        ibv_destroy_cq(rdma->cq);
+        rdma->cq = NULL;
+    }
+    if (rdma->comp_channel) {
+        ibv_destroy_comp_channel(rdma->comp_channel);
+        rdma->comp_channel = NULL;
+    }
+    if (rdma->pd) {
+        ibv_dealloc_pd(rdma->pd);
+        rdma->pd = NULL;
+    }
+    if (rdma->cm_id) {
+        rdma_destroy_id(rdma->cm_id);
+        rdma->cm_id = NULL;
+    }
+
+    rdma->listen_id = NULL;
+    rdma->channel = NULL;
+    g_free(rdma->host);
+    rdma->host = NULL;
+}
+
 int multiRDMA_load_cleanup(void)
 {
     int i;
@@ -4323,6 +4511,8 @@ int multiRDMA_load_cleanup(void)
         qemu_sem_destroy(&p->sem);
         g_free(p->name);
         p->name = NULL;
+        qemu_multiRDMA_load_cleanup(multiRDMA_recv_state->params[i].rdma);
+        g_free(multiRDMA_recv_state->params[i].rdma);
     }
 
     qemu_sem_destroy(&multiRDMA_recv_state->sem_sync);
@@ -4386,15 +4576,18 @@ void rdma_start_outgoing_migration(void *opaque,
 
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
+    s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
+    /* create multiRDMA channel */
     if (migrate_use_multiRDMA()) {
         if (multiRDMA_save_setup(host_port, errp) != 0) {
             ERROR(errp, "init multiRDMA channels failure!");
             goto err;
         }
+        migrate_fd_connect(s, NULL);
+    } else {
+        migrate_fd_connect(s, NULL);
     }
 
-    s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
-    migrate_fd_connect(s, NULL);
     return;
 err:
     g_free(rdma);
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 06/12] migration/rdma: Transmit initial package
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (4 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-15 18:36   ` Dr. David Alan Gilbert
  2020-01-09  4:59 ` [PATCH RFC 07/12] migration/rdma: Be sure all channels are created Zhimin Feng
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Transmit initial package through the multiRDMA channels,
so that we can identify the main channel and multiRDMA channels.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 114 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 107 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5b780bef36..db75a4372a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -116,6 +116,16 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 
 #define RDMA_WRID_CHUNK_MASK (~RDMA_WRID_BLOCK_MASK & ~RDMA_WRID_TYPE_MASK)
 
+/* Define magic to distinguish multiRDMA and main RDMA */
+#define MULTIRDMA_MAGIC 0x11223344U
+#define MAINRDMA_MAGIC 0x55667788U
+#define ERROR_MAGIC 0xFFFFFFFFU
+
+#define MULTIRDMA_VERSION 1
+#define MAINRDMA_VERSION 1
+
+#define UNUSED_ID 0xFFU
+
 /*
  * RDMA migration protocol:
  * 1. RDMA Writes (data messages, i.e. RAM)
@@ -167,6 +177,14 @@ enum {
     RDMA_CONTROL_UNREGISTER_FINISHED, /* unpinning finished */
 };
 
+/*
+ * Identify the MultiRDAM channel info
+ */
+typedef struct {
+    uint32_t magic;
+    uint32_t version;
+    uint8_t id;
+} __attribute__((packed)) RDMAChannelInit_t;
 
 /*
  * Memory and MR structures used to represent an IB Send/Recv work request.
@@ -3430,7 +3448,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         int i;
         RDMAContext *multi_rdma = NULL;
         thread_count = migrate_multiRDMA_channels();
-        /* create the multi Thread RDMA channels */
+        /* create the multiRDMA channels */
         for (i = 0; i < thread_count; i++) {
             if (multiRDMA_recv_state->params[i].rdma->cm_id == NULL) {
                 multi_rdma = multiRDMA_recv_state->params[i].rdma;
@@ -4058,6 +4076,48 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     return rioc->file;
 }
 
+static RDMAChannelInit_t migration_rdma_recv_initial_packet(QEMUFile *f,
+                                                            Error **errp)
+{
+    RDMAChannelInit_t msg;
+    int thread_count = migrate_multiRDMA_channels();
+    qemu_get_buffer(f, (uint8_t *)&msg, sizeof(msg));
+    be32_to_cpus(&msg.magic);
+    be32_to_cpus(&msg.version);
+
+    if (msg.magic != MULTIRDMA_MAGIC &&
+        msg.magic != MAINRDMA_MAGIC) {
+        error_setg(errp, "multiRDMA: received unrecognized "
+                   "packet magic %x", msg.magic);
+        goto err;
+    }
+
+    if (msg.magic == MULTIRDMA_MAGIC
+        && msg.version != MULTIRDMA_VERSION) {
+        error_setg(errp, "multiRDMA: received packet version "
+                   "%d expected %d", msg.version, MULTIRDMA_VERSION);
+        goto err;
+    }
+
+    if (msg.magic == MAINRDMA_MAGIC && msg.version != MAINRDMA_VERSION) {
+        error_setg(errp, "multiRDMA: received packet version "
+                   "%d expected %d", msg.version, MAINRDMA_VERSION);
+        goto err;
+    }
+
+    if (msg.magic == MULTIRDMA_MAGIC && msg.id > thread_count) {
+        error_setg(errp, "multiRDMA: received channel version %d "
+                   "expected %d", msg.version, MULTIRDMA_VERSION);
+        goto err;
+    }
+
+    return msg;
+err:
+    msg.magic = ERROR_MAGIC;
+    msg.id = UNUSED_ID;
+    return msg;
+}
+
 static void *multiRDMA_recv_thread(void *opaque)
 {
     MultiRDMARecvParams *p = opaque;
@@ -4100,13 +4160,34 @@ static void multiRDMA_recv_new_channel(QEMUFile *f, int id)
 static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
+    RDMAChannelInit_t msg = migration_rdma_recv_initial_packet(f, &local_err);
+
+    switch (msg.magic) {
+    case MAINRDMA_MAGIC:
+        if (!mis->from_src_file) {
+            rdma->migration_started_on_destination = 1;
+            migration_incoming_setup(f);
+            migration_incoming_process();
+        }
+        break;
 
-    if (!mis->from_src_file) {
-        rdma->migration_started_on_destination = 1;
-        migration_incoming_setup(f);
-        migration_incoming_process();
-    } else {
-        multiRDMA_recv_new_channel(f, multiRDMA_recv_state->count);
+    case MULTIRDMA_MAGIC:
+        multiRDMA_recv_new_channel(f, msg.id);
+        break;
+
+    case ERROR_MAGIC:
+    default:
+        if (local_err) {
+            MigrationState *s = migrate_get_current();
+            migrate_set_error(s, local_err);
+            if (s->state == MIGRATION_STATUS_SETUP ||
+                    s->state == MIGRATION_STATUS_ACTIVE) {
+                migrate_set_state(&s->state, s->state,
+                        MIGRATION_STATUS_FAILED);
+            }
+        }
+        break;
     }
 }
 
@@ -4245,10 +4326,26 @@ err:
     multiRDMA_load_cleanup();
 }
 
+static void migration_rdma_send_initial_packet(QEMUFile *f, uint8_t id)
+{
+    RDMAChannelInit_t msg;
+
+    msg.magic = cpu_to_be32(id == UNUSED_ID ?
+                            MAINRDMA_MAGIC : MULTIRDMA_MAGIC);
+    msg.version = cpu_to_be32(id == UNUSED_ID ?
+                              MAINRDMA_VERSION : MULTIRDMA_VERSION);
+    msg.id = id;
+    qemu_put_buffer(f, (uint8_t *)&msg, sizeof(msg));
+    qemu_fflush(f);
+}
+
 static void *multiRDMA_send_thread(void *opaque)
 {
     MultiRDMASendParams *p = opaque;
 
+    /* send the multiRDMA channels magic */
+    migration_rdma_send_initial_packet(p->file, p->id);
+
     while (true) {
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
@@ -4579,6 +4676,9 @@ void rdma_start_outgoing_migration(void *opaque,
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
     /* create multiRDMA channel */
     if (migrate_use_multiRDMA()) {
+        /* send the main RDMA channel magic */
+        migration_rdma_send_initial_packet(s->to_dst_file, UNUSED_ID);
+
         if (multiRDMA_save_setup(host_port, errp) != 0) {
             ERROR(errp, "init multiRDMA channels failure!");
             goto err;
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 07/12] migration/rdma: Be sure all channels are created
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (5 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 06/12] migration/rdma: Transmit initial package Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 08/12] migration/rdma: register memory for multiRDMA channels Zhimin Feng
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

We need to build all multiRDMA channels before we start migration.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index db75a4372a..518a21b0fe 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4076,6 +4076,38 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     return rioc->file;
 }
 
+static bool multiRDMA_send_all_channels_created(void)
+{
+    int thread_count = migrate_multiRDMA_channels();
+
+    return thread_count == atomic_read(&multiRDMA_send_state->count);
+}
+
+static bool multiRDMA_recv_all_channels_created(void)
+{
+    int thread_count = migrate_multiRDMA_channels();
+
+    return thread_count == atomic_read(&multiRDMA_recv_state->count);
+}
+
+static bool migration_has_all_rdma_channels(void)
+{
+    bool all_channels;
+    if (multiRDMA_send_state) {
+        MigrationState *ms = migrate_get_current();
+        all_channels = multiRDMA_send_all_channels_created();
+
+        return all_channels && ms->to_dst_file != NULL;
+    } else {
+        MigrationIncomingState *mis = migration_incoming_get_current();
+        all_channels = multiRDMA_recv_all_channels_created();
+
+        return all_channels && mis->from_src_file != NULL;
+    }
+
+    return false;
+}
+
 static RDMAChannelInit_t migration_rdma_recv_initial_packet(QEMUFile *f,
                                                             Error **errp)
 {
@@ -4168,7 +4200,6 @@ static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
         if (!mis->from_src_file) {
             rdma->migration_started_on_destination = 1;
             migration_incoming_setup(f);
-            migration_incoming_process();
         }
         break;
 
@@ -4189,6 +4220,11 @@ static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
         }
         break;
     }
+
+    /* wait for all channels to be established */
+    if (migration_has_all_rdma_channels()) {
+        migration_incoming_process();
+    }
 }
 
 static void rdma_accept_incoming_migration(void *opaque)
@@ -4683,7 +4719,11 @@ void rdma_start_outgoing_migration(void *opaque,
             ERROR(errp, "init multiRDMA channels failure!");
             goto err;
         }
-        migrate_fd_connect(s, NULL);
+
+        /* wait for all channels to be established */
+        if (migration_has_all_rdma_channels()) {
+            migrate_fd_connect(s, NULL);
+        }
     } else {
         migrate_fd_connect(s, NULL);
     }
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 08/12] migration/rdma: register memory for multiRDMA channels
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (6 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 07/12] migration/rdma: Be sure all channels are created Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 09/12] migration/rdma: Wait for all multiRDMA to complete registration Zhimin Feng
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

register memory for multiRDMA channels and transmit the destination
the keys to source to use including the virtual addresses.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 190 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 518a21b0fe..6ecc870844 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3847,6 +3847,15 @@ static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
         return rdma_block_notification_handle(opaque, data);
 
     case RAM_CONTROL_HOOK:
+        if (migrate_use_multiRDMA()) {
+            int i;
+            int thread_count = migrate_multiRDMA_channels();
+            /* Inform dest recv_thread to poll */
+            for (i = 0; i < thread_count; i++) {
+                qemu_sem_post(&multiRDMA_recv_state->params[i].sem);
+            }
+        }
+
         return qemu_rdma_registration_handle(f, opaque);
 
     default:
@@ -3920,6 +3929,17 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         trace_qemu_rdma_registration_stop_ram();
 
+        if (migrate_use_multiRDMA()) {
+            /*
+             * Inform the multiRDMA channels to register memory
+             */
+            int i;
+            int thread_count = migrate_multiRDMA_channels();
+            for (i = 0; i < thread_count; i++) {
+                qemu_sem_post(&multiRDMA_send_state->params[i].sem);
+            }
+        }
+
         /*
          * Make sure that we parallelize the pinning on both sides.
          * For very large guests, doing this serially takes a really
@@ -3985,6 +4005,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
     ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
 
+    if (migrate_use_multiRDMA()) {
+        /* Inform src send_thread to send FINISHED signal */
+        int i;
+        int thread_count = migrate_multiRDMA_channels();
+        for (i = 0; i < thread_count; i++) {
+            qemu_sem_post(&multiRDMA_send_state->params[i].sem);
+        }
+    }
+
     if (ret < 0) {
         goto err;
     }
@@ -4150,18 +4179,119 @@ err:
     return msg;
 }
 
+static int qemu_multiRDMA_registration_handle(void *opaque)
+{
+    RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
+                                 .repeat = 1 };
+    MultiRDMARecvParams *p = opaque;
+    RDMAContext *rdma = p->rdma;
+    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAControlHeader head;
+    int ret = 0;
+    int i = 0;
+
+    CHECK_ERROR_STATE();
+
+    do {
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
+
+        if (ret < 0) {
+            break;
+        }
+
+        if (head.repeat > RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE) {
+            error_report("rdma: Too many requests in this message (%d)."
+                         "Bailing.", head.repeat);
+            ret = -EIO;
+            break;
+        }
+
+        switch (head.type) {
+        case RDMA_CONTROL_REGISTER_FINISHED:
+            goto out;
+        case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
+            qsort(rdma->local_ram_blocks.block,
+                  rdma->local_ram_blocks.nb_blocks,
+                  sizeof(RDMALocalBlock), dest_ram_sort_func);
+
+            if (rdma->pin_all) {
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                if (ret) {
+                    error_report("rdma migration: error dest "
+                                 "registering ram blocks");
+                    goto out;
+                }
+            }
+
+            for (i = 0; i < local->nb_blocks; i++) {
+                /*
+                 * The multiRDMA threads only register ram block
+                 * to send data, other blocks are sent by main RDMA thread.
+                 */
+                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+                    rdma->dest_blocks[i].remote_host_addr =
+                        (uintptr_t)(local->block[i].local_host_addr);
+
+                    if (rdma->pin_all) {
+                        rdma->dest_blocks[i].remote_rkey =
+                            local->block[i].mr->rkey;
+                    }
+
+                    rdma->dest_blocks[i].offset = local->block[i].offset;
+                    rdma->dest_blocks[i].length = local->block[i].length;
+
+                    dest_block_to_network(&rdma->dest_blocks[i]);
+
+                    break;
+                }
+            }
+
+            blocks.len = rdma->local_ram_blocks.nb_blocks
+                                                * sizeof(RDMADestBlock);
+
+            ret = qemu_rdma_post_send_control(rdma,
+                                              (uint8_t *) rdma->dest_blocks,
+                                              &blocks);
+
+            if (ret < 0) {
+                error_report("rdma migration: error sending remote info");
+                goto out;
+            }
+
+            break;
+        default:
+            error_report("Unknown control message %s", control_desc(head.type));
+            ret = -EIO;
+            goto out;
+        }
+    } while (1);
+out:
+    if (ret < 0) {
+        rdma->error_state = ret;
+    }
+    return ret;
+}
+
 static void *multiRDMA_recv_thread(void *opaque)
 {
     MultiRDMARecvParams *p = opaque;
+    int ret;
 
     while (true) {
+        qemu_sem_wait(&p->sem);
+
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
             qemu_mutex_unlock(&p->mutex);
             break;
         }
         qemu_mutex_unlock(&p->mutex);
-        qemu_sem_wait(&p->sem);
+
+        ret = qemu_multiRDMA_registration_handle(opaque);
+        if (ret < 0) {
+            qemu_file_set_error(p->file, ret);
+            break;
+        }
     }
 
     qemu_mutex_lock(&p->mutex);
@@ -4378,18 +4508,76 @@ static void migration_rdma_send_initial_packet(QEMUFile *f, uint8_t id)
 static void *multiRDMA_send_thread(void *opaque)
 {
     MultiRDMASendParams *p = opaque;
+    RDMAContext *rdma = p->rdma;
+    int ret;
 
     /* send the multiRDMA channels magic */
     migration_rdma_send_initial_packet(p->file, p->id);
 
+    /* wait for semaphore notification to register memory */
+    qemu_sem_wait(&p->sem);
+
+    RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
+    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    int reg_result_idx, i, nb_dest_blocks;
+    RDMAControlHeader head = { .len = 0, .repeat = 1 };
+
+    head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
+
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
+            &reg_result_idx, rdma->pin_all ?
+            qemu_rdma_reg_whole_ram_blocks : NULL);
+    if (ret < 0) {
+        return NULL;
+    }
+
+    nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
+
+    if (local->nb_blocks != nb_dest_blocks) {
+        rdma->error_state = -EINVAL;
+        return NULL;
+    }
+
+    qemu_rdma_move_header(rdma, reg_result_idx, &resp);
+    memcpy(rdma->dest_blocks,
+           rdma->wr_data[reg_result_idx].control_curr, resp.len);
+
+    for (i = 0; i < nb_dest_blocks; i++) {
+        /*
+         * The multiRDMA threads only register ram block
+         * to send data, other blocks are sent by main RDMA thread.
+         */
+        if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+            network_to_dest_block(&rdma->dest_blocks[i]);
+
+            /* We require that the blocks are in the same order */
+            if (rdma->dest_blocks[i].length != local->block[i].length) {
+                rdma->error_state = -EINVAL;
+                return NULL;
+            }
+            local->block[i].remote_host_addr =
+                rdma->dest_blocks[i].remote_host_addr;
+            local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
+            break;
+        }
+    }
+
     while (true) {
+        qemu_sem_wait(&p->sem);
+
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
             qemu_mutex_unlock(&p->mutex);
             break;
         }
         qemu_mutex_unlock(&p->mutex);
-        qemu_sem_wait(&p->sem);
+
+        /* Send FINISHED to the destination */
+        head.type = RDMA_CONTROL_REGISTER_FINISHED;
+        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+        if (ret < 0) {
+            return NULL;
+        }
     }
 
     qemu_mutex_lock(&p->mutex);
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 09/12] migration/rdma: Wait for all multiRDMA to complete registration
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (7 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 08/12] migration/rdma: register memory for multiRDMA channels Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 10/12] migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode Zhimin Feng
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6ecc870844..425dfa709d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -439,6 +439,10 @@ struct {
     MultiRDMASendParams *params;
     /* number of created threads */
     int count;
+    /* this mutex protects the following parameters */
+    QemuMutex mutex_sync;
+    /* number of registered multiRDMA channels */
+    unsigned int reg_mr_channels;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
 } *multiRDMA_send_state;
@@ -3998,6 +4002,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     rdma->dest_blocks[i].remote_host_addr;
             local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
         }
+
+        /* Wait for all multiRDMA channels to complete registration */
+        if (migrate_use_multiRDMA()) {
+            qemu_sem_wait(&multiRDMA_send_state->sem_sync);
+        }
     }
 
     trace_qemu_rdma_registration_stop(flags);
@@ -4562,6 +4571,17 @@ static void *multiRDMA_send_thread(void *opaque)
         }
     }
 
+    /*
+     * Inform the main RDMA thread to run when multiRDMA
+     * threads have completed registration.
+     */
+    qemu_mutex_lock(&multiRDMA_send_state->mutex_sync);
+    if (++multiRDMA_send_state->reg_mr_channels ==
+        migrate_multiRDMA_channels()) {
+        qemu_sem_post(&multiRDMA_send_state->sem_sync);
+    }
+    qemu_mutex_unlock(&multiRDMA_send_state->mutex_sync);
+
     while (true) {
         qemu_sem_wait(&p->sem);
 
@@ -4616,6 +4636,8 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
     multiRDMA_send_state->params = g_new0(MultiRDMASendParams,
                                           thread_count);
     atomic_set(&multiRDMA_send_state->count, 0);
+    atomic_set(&multiRDMA_send_state->reg_mr_channels, 0);
+    qemu_mutex_init(&multiRDMA_send_state->mutex_sync);
     qemu_sem_init(&multiRDMA_send_state->sem_sync, 0);
 
     for (i = 0; i < thread_count; i++) {
@@ -4714,6 +4736,7 @@ int multiRDMA_save_cleanup(void)
         g_free(multiRDMA_send_state->params[i].rdma);
     }
 
+    qemu_mutex_destroy(&multiRDMA_send_state->mutex_sync);
     qemu_sem_destroy(&multiRDMA_send_state->sem_sync);
     g_free(multiRDMA_send_state);
     multiRDMA_send_state = NULL;
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 10/12] migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (8 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 09/12] migration/rdma: Wait for all multiRDMA to complete registration Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 11/12] migration/rdma: use multiRDMA to send RAM block for NOT " Zhimin Feng
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Send the RAM block through MultiRDMA channels for using rdma-pin-all option,
and we choose the channel to send data through polling the MultiRDMA thread.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 66 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 63 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 425dfa709d..36261f1fc8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -427,6 +427,8 @@ typedef struct {
     QEMUFile *file;
     /* sem where to wait for more work */
     QemuSemaphore sem;
+    /* syncs main thread and channels */
+    QemuSemaphore sem_sync;
     /* this mutex protects the following parameters */
     QemuMutex mutex;
     /* is this channel thread running */
@@ -439,6 +441,8 @@ struct {
     MultiRDMASendParams *params;
     /* number of created threads */
     int count;
+    /* index of current RDMA channels */
+    int current_RDMA_index;
     /* this mutex protects the following parameters */
     QemuMutex mutex_sync;
     /* number of registered multiRDMA channels */
@@ -2043,6 +2047,18 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     return 0;
 }
 
+/*
+ * Get the multiRDMA channel used to send data.
+ */
+static int get_multiRDMA_channel(void)
+{
+    int thread_count = migrate_multiRDMA_channels();
+    multiRDMA_send_state->current_RDMA_index++;
+    multiRDMA_send_state->current_RDMA_index %= thread_count;
+
+    return multiRDMA_send_state->current_RDMA_index;
+}
+
 /*
  * Write an actual chunk of memory using RDMA.
  *
@@ -2068,6 +2084,16 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                .repeat = 1,
                              };
 
+    if (migrate_use_multiRDMA() &&
+        migrate_use_rdma_pin_all()) {
+        /* The multiRDMA threads only send ram block */
+        if (strcmp(block->block_name, "mach-virt.ram") == 0) {
+            int channel = get_multiRDMA_channel();
+            rdma = multiRDMA_send_state->params[channel].rdma;
+            block = &(rdma->local_ram_blocks.block[current_index]);
+        }
+    }
+
 retry:
     sge.addr = (uintptr_t)(block->local_host_addr +
                             (current_addr - block->offset));
@@ -2285,8 +2311,22 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
     }
 
     if (ret == 0) {
-        rdma->nb_sent++;
-        trace_qemu_rdma_write_flush(rdma->nb_sent);
+        if (migrate_use_multiRDMA() &&
+            migrate_use_rdma_pin_all()) {
+            /* The multiRDMA threads only send ram block */
+            RDMALocalBlock *block = NULL;
+            block = &(rdma->local_ram_blocks.block[rdma->current_index]);
+            if (strcmp(block->block_name, "mach-virt.ram") == 0) {
+                int current_RDMA = multiRDMA_send_state->current_RDMA_index;
+                multiRDMA_send_state->params[current_RDMA].rdma->nb_sent++;
+            } else {
+                rdma->nb_sent++;
+                trace_qemu_rdma_write_flush(rdma->nb_sent);
+            }
+        } else {
+            rdma->nb_sent++;
+            trace_qemu_rdma_write_flush(rdma->nb_sent);
+        }
     }
 
     rdma->current_length = 0;
@@ -4015,11 +4055,15 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
 
     if (migrate_use_multiRDMA()) {
-        /* Inform src send_thread to send FINISHED signal */
+        /*
+         * Inform src send_thread to send FINISHED signal.
+         * Wait for multiRDMA send threads to poll the CQE.
+         */
         int i;
         int thread_count = migrate_multiRDMA_channels();
         for (i = 0; i < thread_count; i++) {
             qemu_sem_post(&multiRDMA_send_state->params[i].sem);
+            qemu_sem_wait(&multiRDMA_send_state->params[i].sem_sync);
         }
     }
 
@@ -4592,12 +4636,25 @@ static void *multiRDMA_send_thread(void *opaque)
         }
         qemu_mutex_unlock(&p->mutex);
 
+        /* To complete polling(CQE) */
+        while (rdma->nb_sent) {
+            ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
+            if (ret < 0) {
+                error_report("multiRDMA migration: "
+                             "complete polling error!");
+                return NULL;
+            }
+        }
+
         /* Send FINISHED to the destination */
         head.type = RDMA_CONTROL_REGISTER_FINISHED;
         ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
         if (ret < 0) {
             return NULL;
         }
+
+        /* sync main thread */
+        qemu_sem_post(&p->sem_sync);
     }
 
     qemu_mutex_lock(&p->mutex);
@@ -4637,6 +4694,7 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
                                           thread_count);
     atomic_set(&multiRDMA_send_state->count, 0);
     atomic_set(&multiRDMA_send_state->reg_mr_channels, 0);
+    atomic_set(&multiRDMA_send_state->current_RDMA_index, 0);
     qemu_mutex_init(&multiRDMA_send_state->mutex_sync);
     qemu_sem_init(&multiRDMA_send_state->sem_sync, 0);
 
@@ -4665,6 +4723,7 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
         f = qemu_fopen_rdma(multiRDMA_send_state->params[i].rdma, "wb");
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
+        qemu_sem_init(&p->sem_sync, 0);
         p->quit = false;
         p->id = i;
         p->running = true;
@@ -4730,6 +4789,7 @@ int multiRDMA_save_cleanup(void)
         MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
         qemu_mutex_destroy(&p->mutex);
         qemu_sem_destroy(&p->sem);
+        qemu_sem_destroy(&p->sem_sync);
         g_free(p->name);
         p->name = NULL;
         qemu_rdma_cleanup(multiRDMA_send_state->params[i].rdma);
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 11/12] migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all mode
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (9 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 10/12] migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode Zhimin Feng
@ 2020-01-09  4:59 ` " Zhimin Feng
  2020-01-09  4:59 ` [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA Zhimin Feng
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 109 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 36261f1fc8..0a150099e2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2084,8 +2084,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                .repeat = 1,
                              };
 
-    if (migrate_use_multiRDMA() &&
-        migrate_use_rdma_pin_all()) {
+    if (migrate_use_multiRDMA()) {
         /* The multiRDMA threads only send ram block */
         if (strcmp(block->block_name, "mach-virt.ram") == 0) {
             int channel = get_multiRDMA_channel();
@@ -2311,8 +2310,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
     }
 
     if (ret == 0) {
-        if (migrate_use_multiRDMA() &&
-            migrate_use_rdma_pin_all()) {
+        if (migrate_use_multiRDMA()) {
             /* The multiRDMA threads only send ram block */
             RDMALocalBlock *block = NULL;
             block = &(rdma->local_ram_blocks.block[rdma->current_index]);
@@ -4234,12 +4232,24 @@ err:
 
 static int qemu_multiRDMA_registration_handle(void *opaque)
 {
+    RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
+                                   .type = RDMA_CONTROL_REGISTER_RESULT,
+                                   .repeat = 0,
+                                 };
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     MultiRDMARecvParams *p = opaque;
     RDMAContext *rdma = p->rdma;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMAControlHeader head;
+    RDMARegister *reg, *registers;
+    RDMACompress *comp;
+    RDMARegisterResult *reg_result;
+    static RDMARegisterResult results[RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE];
+    RDMALocalBlock *block;
+    void *host_addr;
+    int idx = 0;
+    int count = 0;
     int ret = 0;
     int i = 0;
 
@@ -4260,8 +4270,28 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
         }
 
         switch (head.type) {
+        case RDMA_CONTROL_COMPRESS:
+            comp = (RDMACompress *) rdma->wr_data[idx].control_curr;
+            network_to_compress(comp);
+
+            if (comp->block_idx >= rdma->local_ram_blocks.nb_blocks) {
+                error_report("rdma: 'compress' bad block index %u (vs %d)",
+                        (unsigned int)comp->block_idx,
+                        rdma->local_ram_blocks.nb_blocks);
+                ret = -EIO;
+                goto out;
+            }
+            block = &(rdma->local_ram_blocks.block[comp->block_idx]);
+
+            host_addr = block->local_host_addr +
+                (comp->offset - block->offset);
+
+            ram_handle_compressed(host_addr, comp->value, comp->length);
+            break;
+
         case RDMA_CONTROL_REGISTER_FINISHED:
             goto out;
+
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             qsort(rdma->local_ram_blocks.block,
                   rdma->local_ram_blocks.nb_blocks,
@@ -4310,8 +4340,79 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
                 error_report("rdma migration: error sending remote info");
                 goto out;
             }
+            break;
+
+        case RDMA_CONTROL_REGISTER_REQUEST:
+            reg_resp.repeat = head.repeat;
+            registers = (RDMARegister *) rdma->wr_data[idx].control_curr;
 
+            for (count = 0; count < head.repeat; count++) {
+                uint64_t chunk;
+                uint8_t *chunk_start, *chunk_end;
+
+                reg = &registers[count];
+                network_to_register(reg);
+
+                reg_result = &results[count];
+
+                if (reg->current_index >= rdma->local_ram_blocks.nb_blocks) {
+                    error_report("rdma: 'register' bad block index %u (vs %d)",
+                            (unsigned int)reg->current_index,
+                            rdma->local_ram_blocks.nb_blocks);
+                    ret = -ENOENT;
+                    goto out;
+                }
+                block = &(rdma->local_ram_blocks.block[reg->current_index]);
+                if (block->is_ram_block) {
+                    if (block->offset > reg->key.current_addr) {
+                        error_report("rdma: bad register address for block %s"
+                                " offset: %" PRIx64 " current_addr: %" PRIx64,
+                                block->block_name, block->offset,
+                                reg->key.current_addr);
+                        ret = -ERANGE;
+                        goto out;
+                    }
+                    host_addr = (block->local_host_addr +
+                            (reg->key.current_addr - block->offset));
+                    chunk = ram_chunk_index(block->local_host_addr,
+                            (uint8_t *) host_addr);
+                } else {
+                    chunk = reg->key.chunk;
+                    host_addr = block->local_host_addr +
+                        (reg->key.chunk * (1UL << RDMA_REG_CHUNK_SHIFT));
+                    /* Check for particularly bad chunk value */
+                    if (host_addr < (void *)block->local_host_addr) {
+                        error_report("rdma: bad chunk for block %s"
+                                " chunk: %" PRIx64,
+                                block->block_name, reg->key.chunk);
+                        ret = -ERANGE;
+                        goto out;
+                    }
+                }
+                chunk_start = ram_chunk_start(block, chunk);
+                chunk_end = ram_chunk_end(block, chunk + reg->chunks);
+                if (qemu_rdma_register_and_get_keys(rdma, block,
+                            (uintptr_t)host_addr, NULL, &reg_result->rkey,
+                            chunk, chunk_start, chunk_end)) {
+                    error_report("cannot get rkey");
+                    ret = -EINVAL;
+                    goto out;
+                }
+
+                reg_result->host_addr = (uintptr_t)block->local_host_addr;
+
+                result_to_network(reg_result);
+            }
+
+            ret = qemu_rdma_post_send_control(rdma,
+                    (uint8_t *) results, &reg_resp);
+
+            if (ret < 0) {
+                error_report("Failed to send control buffer");
+                goto out;
+            }
             break;
+
         default:
             error_report("Unknown control message %s", control_desc(head.type));
             ret = -EIO;
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (10 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 11/12] migration/rdma: use multiRDMA to send RAM block for NOT " Zhimin Feng
@ 2020-01-09  4:59 ` Zhimin Feng
  2020-01-17 18:52   ` Dr. David Alan Gilbert
  2020-01-09 10:38 ` [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** no-reply
  2020-01-15 19:57 ` Dr. David Alan Gilbert
  13 siblings, 1 reply; 37+ messages in thread
From: Zhimin Feng @ 2020-01-09  4:59 UTC (permalink / raw)
  To: quintela, dgilbert, armbru, eblake
  Cc: jemmy858585, fengzhimin, qemu-devel, zhang.zhanghailiang

From: fengzhimin <fengzhimin1@huawei.com>

The virt-ram block is sent by MultiRDMA, so we only register it for
MultiRDMA channels and main channel don't register the virt-ram block.

Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
---
 migration/rdma.c | 140 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 112 insertions(+), 28 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0a150099e2..1477fd509b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -618,7 +618,9 @@ const char *print_wrid(int wrid);
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma));
+                                   int (*callback)(RDMAContext *rdma,
+                                   uint8_t id),
+                                   uint8_t id);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
                                        const uint8_t *host)
@@ -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     return 0;
 }
 
-static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
+/*
+ * Parameters:
+ *    @id == UNUSED_ID :
+ *    This means that we register memory for the main RDMA channel,
+ *    the main RDMA channel don't register the mach-virt.ram block
+ *    when we use multiRDMA method to migrate.
+ *
+ *    @id == 0 or id == 1 or ... :
+ *    This means that we register memory for the multiRDMA channels,
+ *    the multiRDMA channels only register memory for the mach-virt.ram
+ *    block when we use multiRDAM method to migrate.
+ */
+static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t id)
 {
     int i;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
-    for (i = 0; i < local->nb_blocks; i++) {
-        local->block[i].mr =
-            ibv_reg_mr(rdma->pd,
-                    local->block[i].local_host_addr,
-                    local->block[i].length,
-                    IBV_ACCESS_LOCAL_WRITE |
-                    IBV_ACCESS_REMOTE_WRITE
-                    );
-        if (!local->block[i].mr) {
-            perror("Failed to register local dest ram block!\n");
-            break;
+    if (migrate_use_multiRDMA()) {
+        if (id == UNUSED_ID) {
+            for (i = 0; i < local->nb_blocks; i++) {
+                /* main RDMA channel don't register the mach-virt.ram block */
+                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+                    continue;
+                }
+
+                local->block[i].mr =
+                    ibv_reg_mr(rdma->pd,
+                            local->block[i].local_host_addr,
+                            local->block[i].length,
+                            IBV_ACCESS_LOCAL_WRITE |
+                            IBV_ACCESS_REMOTE_WRITE
+                            );
+                if (!local->block[i].mr) {
+                    perror("Failed to register local dest ram block!\n");
+                    break;
+                }
+                rdma->total_registrations++;
+            }
+        } else {
+            for (i = 0; i < local->nb_blocks; i++) {
+                /*
+                 * The multiRDAM channels only register
+                 * the mach-virt.ram block
+                 */
+                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+                    local->block[i].mr =
+                        ibv_reg_mr(rdma->pd,
+                                local->block[i].local_host_addr,
+                                local->block[i].length,
+                                IBV_ACCESS_LOCAL_WRITE |
+                                IBV_ACCESS_REMOTE_WRITE
+                                );
+                    if (!local->block[i].mr) {
+                        perror("Failed to register local dest ram block!\n");
+                        break;
+                    }
+                    rdma->total_registrations++;
+                }
+            }
+        }
+    } else {
+        for (i = 0; i < local->nb_blocks; i++) {
+            local->block[i].mr =
+                ibv_reg_mr(rdma->pd,
+                        local->block[i].local_host_addr,
+                        local->block[i].length,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_WRITE
+                        );
+            if (!local->block[i].mr) {
+                perror("Failed to register local dest ram block!\n");
+                break;
+            }
+            rdma->total_registrations++;
         }
-        rdma->total_registrations++;
     }
 
     if (i >= local->nb_blocks) {
@@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
     }
 
     for (i--; i >= 0; i--) {
-        ibv_dereg_mr(local->block[i].mr);
-        rdma->total_registrations--;
+        if (local->block[i].mr) {
+            ibv_dereg_mr(local->block[i].mr);
+            rdma->total_registrations--;
+        }
     }
 
     return -1;
@@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         reg.key.chunk = chunk;
         register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                &resp, NULL, NULL);
+                                      &resp, NULL, NULL, UNUSED_ID);
         if (ret < 0) {
             return ret;
         }
@@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
  * The extra (optional) response is used during registration to us from having
  * to perform an *additional* exchange of message just to provide a response by
  * instead piggy-backing on the acknowledgement.
+ *
+ * Parameters:
+ *    @id : callback function need two parameters, id is the second parameter.
+ *
  */
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma))
+                                   int (*callback)(RDMAContext *rdma,
+                                   uint8_t id),
+                                   uint8_t id)
 {
     int ret = 0;
 
@@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         if (callback) {
             trace_qemu_rdma_exchange_send_issue_callback();
-            ret = callback(rdma);
+            ret = callback(rdma, id);
             if (ret < 0) {
                 return ret;
             }
@@ -2168,7 +2235,7 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL);
+                                (uint8_t *) &comp, NULL, NULL, NULL, UNUSED_ID);
 
                 if (ret < 0) {
                     return -EIO;
@@ -2195,7 +2262,7 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL);
+                                    &resp, &reg_result_idx, NULL, UNUSED_ID);
             if (ret < 0) {
                 return ret;
             }
@@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
-            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
+            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
+                                          NULL, NULL, UNUSED_ID);
 
             if (ret < 0) {
                 rdma->error_state = ret;
@@ -3660,7 +3728,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
             }
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma, UNUSED_ID);
                 if (ret) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
@@ -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
              * their "local" descriptions with what was sent.
              */
             for (i = 0; i < local->nb_blocks; i++) {
+                /*
+                 * use the main RDMA channel to deliver the block of device
+                 * use the multiRDMA channels to deliver the RAMBlock
+                 */
+                if (migrate_use_multiRDMA() &&
+                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+                        continue;
+                }
+
                 rdma->dest_blocks[i].remote_host_addr =
                     (uintptr_t)(local->block[i].local_host_addr);
 
@@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
          */
         ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
                     &reg_result_idx, rdma->pin_all ?
-                    qemu_rdma_reg_whole_ram_blocks : NULL);
+                    qemu_rdma_reg_whole_ram_blocks : NULL, UNUSED_ID);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
             return ret;
@@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         memcpy(rdma->dest_blocks,
             rdma->wr_data[reg_result_idx].control_curr, resp.len);
         for (i = 0; i < nb_dest_blocks; i++) {
+            if (migrate_use_multiRDMA() &&
+                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
+                continue;
+            }
+
             network_to_dest_block(&rdma->dest_blocks[i]);
 
             /* We require that the blocks are in the same order */
@@ -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     trace_qemu_rdma_registration_stop(flags);
 
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
-    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
+                                  NULL, NULL, UNUSED_ID);
 
     if (migrate_use_multiRDMA()) {
         /*
@@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
                   sizeof(RDMALocalBlock), dest_ram_sort_func);
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
                 if (ret) {
                     error_report("rdma migration: error dest "
                                  "registering ram blocks");
@@ -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
 
     ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
             &reg_result_idx, rdma->pin_all ?
-            qemu_rdma_reg_whole_ram_blocks : NULL);
+            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
     if (ret < 0) {
         return NULL;
     }
@@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void *opaque)
 
         /* Send FINISHED to the destination */
         head.type = RDMA_CONTROL_REGISTER_FINISHED;
-        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
+                                      NULL, NULL, p->id);
         if (ret < 0) {
             return NULL;
         }
-- 
2.19.1




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 00/12] *** mulitple RDMA channels for migration ***
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (11 preceding siblings ...)
  2020-01-09  4:59 ` [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA Zhimin Feng
@ 2020-01-09 10:38 ` no-reply
  2020-01-15 19:57 ` Dr. David Alan Gilbert
  13 siblings, 0 replies; 37+ messages in thread
From: no-reply @ 2020-01-09 10:38 UTC (permalink / raw)
  To: fengzhimin1
  Cc: jemmy858585, quintela, qemu-devel, fengzhimin1, armbru, dgilbert,
	zhang.zhanghailiang

Patchew URL: https://patchew.org/QEMU/20200109045922.904-1-fengzhimin1@huawei.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  LINK    aarch64-softmmu/qemu-system-aarch64w.exe
../migration/migration.o: In function `migrate_fd_cleanup':
/tmp/qemu-test/src/migration/migration.c:1549: undefined reference to `multiRDMA_save_cleanup'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:206: qemu-system-x86_64w.exe] Error 1
make: *** [Makefile:483: x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....
../migration/migration.o: In function `migrate_fd_cleanup':
/tmp/qemu-test/src/migration/migration.c:1549: undefined reference to `multiRDMA_save_cleanup'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:206: qemu-system-aarch64w.exe] Error 1
make: *** [Makefile:483: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=b69d215b8e4143ba8f9e54fa9d5a6cbc', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-1frpag52/src/docker-src.2020-01-09-05.35.46.27299:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=b69d215b8e4143ba8f9e54fa9d5a6cbc
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-1frpag52/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m32.657s
user    0m8.573s


The full log is available at
http://patchew.org/logs/20200109045922.904-1-fengzhimin1@huawei.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
@ 2020-01-13 15:30   ` Markus Armbruster
  2020-01-15  1:55     ` fengzhimin
  2020-01-13 16:26   ` Eric Blake
  2020-01-15 18:09   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 37+ messages in thread
From: Markus Armbruster @ 2020-01-13 15:30 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> writes:

> From: fengzhimin <fengzhimin1@huawei.com>
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> ---
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index b7348d0c8b..c995ffdc4c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -421,6 +421,8 @@
>  # @validate-uuid: Send the UUID of the source to allow the destination
>  #                 to ensure it is the same. (since 4.2)
>  #
> +# @multirdma: Use more than one channels for rdma migration. (since 4.2)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> @@ -428,7 +430,7 @@
>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
>  
>  ##
>  # @MigrationCapabilityStatus:

Spell it multi-rdma?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter
  2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
@ 2020-01-13 15:34   ` Markus Armbruster
  2020-01-15  1:57     ` fengzhimin
  2020-01-16 13:20   ` Juan Quintela
  1 sibling, 1 reply; 37+ messages in thread
From: Markus Armbruster @ 2020-01-13 15:34 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> writes:

> From: fengzhimin <fengzhimin1@huawei.com>
>
> Indicates the number of RDMA threads that we would create.
> By default we create 2 threads for RDMA migration.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index c995ffdc4c..ab79bf0600 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -588,6 +588,10 @@
>  # @max-cpu-throttle: maximum cpu throttle percentage.
>  #                    Defaults to 99. (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the

same number as

> +#                       number of multiRDMA used for migration.  The

Pardon my ignorance: what's "the number of multiRDMA used for
migration"?

> +#                       default value is 2 (since 4.2)

(since 5.0)

>  # Since: 2.4
>  ##
>  { 'enum': 'MigrationParameter',
> @@ -600,7 +604,8 @@
>             'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
>             'multifd-channels',
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> -           'max-cpu-throttle' ] }
> +           'max-cpu-throttle',
> +           'multi-rdma-channels'] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -690,6 +695,10 @@
>  # @max-cpu-throttle: maximum cpu throttle percentage.
>  #                    The default value is 99. (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the
> +#                       number of multiRDMA used for migration.  The
> +#                       default value is 2 (since 4.2)

See above.

>  # Since: 2.4
>  ##
>  # TODO either fuse back into MigrationParameters, or make
> @@ -715,7 +724,8 @@
>              '*multifd-channels': 'int',
>              '*xbzrle-cache-size': 'size',
>              '*max-postcopy-bandwidth': 'size',
> -	    '*max-cpu-throttle': 'int' } }
> +     	    '*max-cpu-throttle': 'int',

Please use spaces instead of tab.

> +            '*multi-rdma-channels': 'int'} }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -825,6 +835,10 @@
>  #                    Defaults to 99.
>  #                     (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the
> +#                       number of multiRDMA used for migration.  The
> +#                       default value is 2 (since 4.2)
>  # Since: 2.4

See above.

>  ##
>  { 'struct': 'MigrationParameters',
> @@ -847,8 +861,9 @@
>              '*block-incremental': 'bool' ,
>              '*multifd-channels': 'uint8',
>              '*xbzrle-cache-size': 'size',
> -	    '*max-postcopy-bandwidth': 'size',
> -            '*max-cpu-throttle':'uint8'} }
> +     	    '*max-postcopy-bandwidth': 'size',
> +            '*max-cpu-throttle':'uint8',
> +            '*multi-rdma-channels':'uint8'} }
>  
>  ##
>  # @query-migrate-parameters:

Please use spaces instead of tab.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
  2020-01-13 15:30   ` Markus Armbruster
@ 2020-01-13 16:26   ` Eric Blake
  2020-01-15  2:04     ` fengzhimin
  2020-01-15 18:09   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 37+ messages in thread
From: Eric Blake @ 2020-01-13 16:26 UTC (permalink / raw)
  To: Zhimin Feng, quintela, dgilbert, armbru
  Cc: jemmy858585, qemu-devel, zhang.zhanghailiang

On 1/8/20 10:59 PM, Zhimin Feng wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> ---

> +++ b/qapi/migration.json
> @@ -421,6 +421,8 @@
>   # @validate-uuid: Send the UUID of the source to allow the destination
>   #                 to ensure it is the same. (since 4.2)
>   #
> +# @multirdma: Use more than one channels for rdma migration. (since 4.2)

We've missed 4.2; the next release will be 5.0.

> +#
>   # Since: 1.2
>   ##
>   { 'enum': 'MigrationCapability',
> @@ -428,7 +430,7 @@
>              'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>              'block', 'return-path', 'pause-before-switchover', 'multifd',
>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
>   
>   ##
>   # @MigrationCapabilityStatus:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-13 15:30   ` Markus Armbruster
@ 2020-01-15  1:55     ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-15  1:55 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Zhanghailiang, quintela, qemu-devel, dgilbert, jemmy858585

Thanks for your review. I will change it in the next version(V2).

-----Original Message-----
From: Markus Armbruster [mailto:armbru@redhat.com] 
Sent: Monday, January 13, 2020 11:30 PM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; dgilbert@redhat.com; eblake@redhat.com; jemmy858585@gmail.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support

Zhimin Feng <fengzhimin1@huawei.com> writes:

> From: fengzhimin <fengzhimin1@huawei.com>
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> ---
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json index 
> b7348d0c8b..c995ffdc4c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -421,6 +421,8 @@
>  # @validate-uuid: Send the UUID of the source to allow the destination
>  #                 to ensure it is the same. (since 4.2)
>  #
> +# @multirdma: Use more than one channels for rdma migration. (since 
> +4.2) #
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> @@ -428,7 +430,7 @@
>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
>  
>  ##
>  # @MigrationCapabilityStatus:

Spell it multi-rdma?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter
  2020-01-13 15:34   ` Markus Armbruster
@ 2020-01-15  1:57     ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-15  1:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Zhanghailiang, quintela, qemu-devel, dgilbert, jemmy858585

Thanks for your review. I will fix these errors in the next version(V2).

-----Original Message-----
From: Markus Armbruster [mailto:armbru@redhat.com] 
Sent: Monday, January 13, 2020 11:35 PM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; dgilbert@redhat.com; eblake@redhat.com; jemmy858585@gmail.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: Re: [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter

Zhimin Feng <fengzhimin1@huawei.com> writes:

> From: fengzhimin <fengzhimin1@huawei.com>
>
> Indicates the number of RDMA threads that we would create.
> By default we create 2 threads for RDMA migration.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json index 
> c995ffdc4c..ab79bf0600 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -588,6 +588,10 @@
>  # @max-cpu-throttle: maximum cpu throttle percentage.
>  #                    Defaults to 99. (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the

same number as

> +#                       number of multiRDMA used for migration.  The

Pardon my ignorance: what's "the number of multiRDMA used for migration"?

> +#                       default value is 2 (since 4.2)

(since 5.0)

>  # Since: 2.4
>  ##
>  { 'enum': 'MigrationParameter',
> @@ -600,7 +604,8 @@
>             'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
>             'multifd-channels',
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> -           'max-cpu-throttle' ] }
> +           'max-cpu-throttle',
> +           'multi-rdma-channels'] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -690,6 +695,10 @@
>  # @max-cpu-throttle: maximum cpu throttle percentage.
>  #                    The default value is 99. (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the
> +#                       number of multiRDMA used for migration.  The
> +#                       default value is 2 (since 4.2)

See above.

>  # Since: 2.4
>  ##
>  # TODO either fuse back into MigrationParameters, or make @@ -715,7 
> +724,8 @@
>              '*multifd-channels': 'int',
>              '*xbzrle-cache-size': 'size',
>              '*max-postcopy-bandwidth': 'size',
> -	    '*max-cpu-throttle': 'int' } }
> +     	    '*max-cpu-throttle': 'int',

Please use spaces instead of tab.

> +            '*multi-rdma-channels': 'int'} }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -825,6 +835,10 @@
>  #                    Defaults to 99.
>  #                     (Since 3.1)
>  #
> +# @multi-rdma-channels: Number of channels used to migrate data in
> +#                       parallel. This is the same number that the
> +#                       number of multiRDMA used for migration.  The
> +#                       default value is 2 (since 4.2)
>  # Since: 2.4

See above.

>  ##
>  { 'struct': 'MigrationParameters',
> @@ -847,8 +861,9 @@
>              '*block-incremental': 'bool' ,
>              '*multifd-channels': 'uint8',
>              '*xbzrle-cache-size': 'size',
> -	    '*max-postcopy-bandwidth': 'size',
> -            '*max-cpu-throttle':'uint8'} }
> +     	    '*max-postcopy-bandwidth': 'size',
> +            '*max-cpu-throttle':'uint8',
> +            '*multi-rdma-channels':'uint8'} }
>  
>  ##
>  # @query-migrate-parameters:

Please use spaces instead of tab.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-13 16:26   ` Eric Blake
@ 2020-01-15  2:04     ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-15  2:04 UTC (permalink / raw)
  To: Eric Blake, quintela, dgilbert, armbru
  Cc: jemmy858585, qemu-devel, Zhanghailiang

Thanks for your review. I will fix these errors in the next version(V2).
I hope you can busy schedule to find time to check other patches about Multi-RDMA.

-----Original Message-----
From: Eric Blake [mailto:eblake@redhat.com] 
Sent: Tuesday, January 14, 2020 12:27 AM
To: fengzhimin <fengzhimin1@huawei.com>; quintela@redhat.com; dgilbert@redhat.com; armbru@redhat.com
Cc: qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support

On 1/8/20 10:59 PM, Zhimin Feng wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> ---

> +++ b/qapi/migration.json
> @@ -421,6 +421,8 @@
>   # @validate-uuid: Send the UUID of the source to allow the destination
>   #                 to ensure it is the same. (since 4.2)
>   #
> +# @multirdma: Use more than one channels for rdma migration. (since 4.2)

We've missed 4.2; the next release will be 5.0.

> +#
>   # Since: 1.2
>   ##
>   { 'enum': 'MigrationCapability',
> @@ -428,7 +430,7 @@
>              'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>              'block', 'return-path', 'pause-before-switchover', 'multifd',
>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
>   
>   ##
>   # @MigrationCapabilityStatus:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
  2020-01-13 15:30   ` Markus Armbruster
  2020-01-13 16:26   ` Eric Blake
@ 2020-01-15 18:09   ` Dr. David Alan Gilbert
  2020-01-16 13:18     ` Juan Quintela
  2 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 18:09 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

Instead of creating x-multirdma as a capability and the corresponding
parameter for the number of channels; it would be better just
to use the multifd parameters when used with an rdma transport;
as far as I know multifd doesn't work with rdma at the moment,
and to the user the idea of multifd over rdma is just the same thing.

Dave

> ---
>  migration/migration.c | 11 +++++++++++
>  migration/migration.h |  1 +
>  qapi/migration.json   |  4 +++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 354ad072fa..e98e236ef9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2176,6 +2176,15 @@ bool migrate_use_events(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS];
>  }
>  
> +bool migrate_use_multiRDMA(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIRDMA];
> +}
> +
>  bool migrate_use_multifd(void)
>  {
>      MigrationState *s;
> @@ -3509,6 +3518,8 @@ static Property migration_properties[] = {
>      DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
>      DEFINE_PROP_MIG_CAP("x-return-path", MIGRATION_CAPABILITY_RETURN_PATH),
>      DEFINE_PROP_MIG_CAP("x-multifd", MIGRATION_CAPABILITY_MULTIFD),
> +    DEFINE_PROP_MIG_CAP("x-multirdma",
> +                        MIGRATION_CAPABILITY_MULTIRDMA),
>  
>      DEFINE_PROP_END_OF_LIST(),
>  };
> diff --git a/migration/migration.h b/migration/migration.h
> index 79b3dda146..bb488028a6 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -296,6 +296,7 @@ bool migrate_ignore_shared(void);
>  bool migrate_validate_uuid(void);
>  
>  bool migrate_auto_converge(void);
> +bool migrate_use_multiRDMA(void);
>  bool migrate_use_multifd(void);
>  bool migrate_pause_before_switchover(void);
>  int migrate_multifd_channels(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index b7348d0c8b..c995ffdc4c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -421,6 +421,8 @@
>  # @validate-uuid: Send the UUID of the source to allow the destination
>  #                 to ensure it is the same. (since 4.2)
>  #
> +# @multirdma: Use more than one channels for rdma migration. (since 4.2)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> @@ -428,7 +430,7 @@
>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>             'block', 'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> -           'x-ignore-shared', 'validate-uuid' ] }
> +           'x-ignore-shared', 'validate-uuid', 'multirdma' ] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 06/12] migration/rdma: Transmit initial package
  2020-01-09  4:59 ` [PATCH RFC 06/12] migration/rdma: Transmit initial package Zhimin Feng
@ 2020-01-15 18:36   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 18:36 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Transmit initial package through the multiRDMA channels,
> so that we can identify the main channel and multiRDMA channels.
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

'packet' not 'package'

> ---
>  migration/rdma.c | 114 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 107 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 5b780bef36..db75a4372a 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -116,6 +116,16 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>  
>  #define RDMA_WRID_CHUNK_MASK (~RDMA_WRID_BLOCK_MASK & ~RDMA_WRID_TYPE_MASK)
>  
> +/* Define magic to distinguish multiRDMA and main RDMA */
> +#define MULTIRDMA_MAGIC 0x11223344U
> +#define MAINRDMA_MAGIC 0x55667788U

Can you explain more about when you use these two - is it 'MAINRDMA' on
the first channel/file and multi on the extra ones???

> +#define ERROR_MAGIC 0xFFFFFFFFU
> +
> +#define MULTIRDMA_VERSION 1
> +#define MAINRDMA_VERSION 1
> +
> +#define UNUSED_ID 0xFFU

Make sure you can't set the number of channels to 256 (or more) then.

>  /*
>   * RDMA migration protocol:
>   * 1. RDMA Writes (data messages, i.e. RAM)
> @@ -167,6 +177,14 @@ enum {
>      RDMA_CONTROL_UNREGISTER_FINISHED, /* unpinning finished */
>  };
>  
> +/*
> + * Identify the MultiRDAM channel info
> + */
> +typedef struct {
> +    uint32_t magic;
> +    uint32_t version;
> +    uint8_t id;
> +} __attribute__((packed)) RDMAChannelInit_t;

Since you're using qemu_get/qemu_put on the QEMUFile, don't
bother with packing a struct, just use qemu_get_be32 and qemu_put_be32.

>  /*
>   * Memory and MR structures used to represent an IB Send/Recv work request.
> @@ -3430,7 +3448,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>          int i;
>          RDMAContext *multi_rdma = NULL;
>          thread_count = migrate_multiRDMA_channels();
> -        /* create the multi Thread RDMA channels */
> +        /* create the multiRDMA channels */

That should be fixed in the previous patch that created it.

>          for (i = 0; i < thread_count; i++) {
>              if (multiRDMA_recv_state->params[i].rdma->cm_id == NULL) {
>                  multi_rdma = multiRDMA_recv_state->params[i].rdma;
> @@ -4058,6 +4076,48 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>      return rioc->file;
>  }
>  
> +static RDMAChannelInit_t migration_rdma_recv_initial_packet(QEMUFile *f,
> +                                                            Error **errp)
> +{
> +    RDMAChannelInit_t msg;
> +    int thread_count = migrate_multiRDMA_channels();
> +    qemu_get_buffer(f, (uint8_t *)&msg, sizeof(msg));
> +    be32_to_cpus(&msg.magic);
> +    be32_to_cpus(&msg.version);
> +
> +    if (msg.magic != MULTIRDMA_MAGIC &&
> +        msg.magic != MAINRDMA_MAGIC) {
> +        error_setg(errp, "multiRDMA: received unrecognized "
> +                   "packet magic %x", msg.magic);
> +        goto err;
> +    }
> +
> +    if (msg.magic == MULTIRDMA_MAGIC
> +        && msg.version != MULTIRDMA_VERSION) {
> +        error_setg(errp, "multiRDMA: received packet version "
> +                   "%d expected %d", msg.version, MULTIRDMA_VERSION);
> +        goto err;
> +    }
> +
> +    if (msg.magic == MAINRDMA_MAGIC && msg.version != MAINRDMA_VERSION) {
> +        error_setg(errp, "multiRDMA: received packet version "
> +                   "%d expected %d", msg.version, MAINRDMA_VERSION);
> +        goto err;
> +    }

It took me a few minutes to see the difference between these two
previous if's the error messages should be different.

> +    if (msg.magic == MULTIRDMA_MAGIC && msg.id > thread_count) {
> +        error_setg(errp, "multiRDMA: received channel version %d "
> +                   "expected %d", msg.version, MULTIRDMA_VERSION);

That text seems wrong, since you're checking for the thread count.

> +        goto err;
> +    }
> +
> +    return msg;
> +err:
> +    msg.magic = ERROR_MAGIC;
> +    msg.id = UNUSED_ID;
> +    return msg;
> +}
> +
>  static void *multiRDMA_recv_thread(void *opaque)
>  {
>      MultiRDMARecvParams *p = opaque;
> @@ -4100,13 +4160,34 @@ static void multiRDMA_recv_new_channel(QEMUFile *f, int id)
>  static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
> +    Error *local_err = NULL;
> +    RDMAChannelInit_t msg = migration_rdma_recv_initial_packet(f, &local_err);

It's probably simpler here to check for
   if (local_err)

   and then you can avoid the need for ERROR_MAGIC.

> +    switch (msg.magic) {
> +    case MAINRDMA_MAGIC:
> +        if (!mis->from_src_file) {
> +            rdma->migration_started_on_destination = 1;
> +            migration_incoming_setup(f);
> +            migration_incoming_process();
> +        }
> +        break;
>  
> -    if (!mis->from_src_file) {
> -        rdma->migration_started_on_destination = 1;
> -        migration_incoming_setup(f);
> -        migration_incoming_process();
> -    } else {
> -        multiRDMA_recv_new_channel(f, multiRDMA_recv_state->count);
> +    case MULTIRDMA_MAGIC:
> +        multiRDMA_recv_new_channel(f, msg.id);
> +        break;
> +
> +    case ERROR_MAGIC:
> +    default:
> +        if (local_err) {
> +            MigrationState *s = migrate_get_current();
> +            migrate_set_error(s, local_err);
> +            if (s->state == MIGRATION_STATUS_SETUP ||
> +                    s->state == MIGRATION_STATUS_ACTIVE) {
> +                migrate_set_state(&s->state, s->state,
> +                        MIGRATION_STATUS_FAILED);
> +            }
> +        }
> +        break;
>      }
>  }
>  
> @@ -4245,10 +4326,26 @@ err:
>      multiRDMA_load_cleanup();
>  }
>  
> +static void migration_rdma_send_initial_packet(QEMUFile *f, uint8_t id)
> +{
> +    RDMAChannelInit_t msg;
> +
> +    msg.magic = cpu_to_be32(id == UNUSED_ID ?
> +                            MAINRDMA_MAGIC : MULTIRDMA_MAGIC);
> +    msg.version = cpu_to_be32(id == UNUSED_ID ?
> +                              MAINRDMA_VERSION : MULTIRDMA_VERSION);
> +    msg.id = id;
> +    qemu_put_buffer(f, (uint8_t *)&msg, sizeof(msg));
> +    qemu_fflush(f);
> +}
> +
>  static void *multiRDMA_send_thread(void *opaque)
>  {
>      MultiRDMASendParams *p = opaque;
>  
> +    /* send the multiRDMA channels magic */
> +    migration_rdma_send_initial_packet(p->file, p->id);
> +
>      while (true) {
>          qemu_mutex_lock(&p->mutex);
>          if (p->quit) {
> @@ -4579,6 +4676,9 @@ void rdma_start_outgoing_migration(void *opaque,
>      s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
>      /* create multiRDMA channel */
>      if (migrate_use_multiRDMA()) {
> +        /* send the main RDMA channel magic */
> +        migration_rdma_send_initial_packet(s->to_dst_file, UNUSED_ID);
> +
>          if (multiRDMA_save_setup(host_port, errp) != 0) {
>              ERROR(errp, "init multiRDMA channels failure!");
>              goto err;
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function            and add the 'migrate_use_rdma_pin_all' function
  2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
@ 2020-01-15 18:57   ` Dr. David Alan Gilbert
  2020-01-16 13:19   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 18:57 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> We need to call the 'migration_incoming_setup' function in migration/rdma.c,
> so it has to be changed to a global function.
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

OK, but this should probably be split into two patches.

With a split;


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 11 ++++++++++-
>  migration/migration.h |  2 ++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index e98e236ef9..d9d73a5eac 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -518,7 +518,7 @@ fail:
>      exit(EXIT_FAILURE);
>  }
>  
> -static void migration_incoming_setup(QEMUFile *f)
> +void migration_incoming_setup(QEMUFile *f)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>  
> @@ -2185,6 +2185,15 @@ bool migrate_use_multiRDMA(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIRDMA];
>  }
>  
> +bool migrate_use_rdma_pin_all(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
> +}
> +
>  bool migrate_use_multifd(void)
>  {
>      MigrationState *s;
> diff --git a/migration/migration.h b/migration/migration.h
> index bb488028a6..0a23375b2f 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -265,6 +265,7 @@ struct MigrationState
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
>  
> +void migration_incoming_setup(QEMUFile *f);
>  void migration_fd_process_incoming(QEMUFile *f);
>  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>  void migration_incoming_process(void);
> @@ -297,6 +298,7 @@ bool migrate_validate_uuid(void);
>  
>  bool migrate_auto_converge(void);
>  bool migrate_use_multiRDMA(void);
> +bool migrate_use_rdma_pin_all(void);
>  bool migrate_use_multifd(void);
>  bool migrate_pause_before_switchover(void);
>  int migrate_multifd_channels(void);
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels
  2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
@ 2020-01-15 19:54   ` Dr. David Alan Gilbert
  2020-01-16 13:30   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 19:54 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> In both sides. We still don't transmit anything through them,
> and we only build the RDMA connections.
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> ---
>  migration/rdma.c | 253 +++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 223 insertions(+), 30 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 992e5abfed..5b780bef36 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -403,6 +403,10 @@ typedef struct {
>      char *name;
>      /* channel thread id */
>      QemuThread thread;
> +    /* RDMAContext channel */
> +    RDMAContext *rdma;
> +    /* communication channel */
> +    QEMUFile *file;
>      /* sem where to wait for more work */
>      QemuSemaphore sem;
>      /* this mutex protects the following parameters */
> @@ -429,6 +433,10 @@ typedef struct {
>      char *name;
>      /* channel thread id */
>      QemuThread thread;
> +    /* RDMAContext channel */
> +    RDMAContext *rdma;
> +    /* communication channel */
> +    QEMUFile *file;
>      /* sem where to wait for more work */
>      QemuSemaphore sem;
>      /* this mutex protects the following parameters */
> @@ -3417,6 +3425,27 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>          qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
>                              NULL,
>                              (void *)(intptr_t)rdma->return_path);
> +    } else if (migrate_use_multiRDMA()) {
> +        int thread_count;
> +        int i;
> +        RDMAContext *multi_rdma = NULL;
> +        thread_count = migrate_multiRDMA_channels();
> +        /* create the multi Thread RDMA channels */
> +        for (i = 0; i < thread_count; i++) {
> +            if (multiRDMA_recv_state->params[i].rdma->cm_id == NULL) {
> +                multi_rdma = multiRDMA_recv_state->params[i].rdma;
> +                break;
> +            }
> +        }
> +
> +        if (multi_rdma) {
> +            qemu_set_fd_handler(rdma->channel->fd,
> +                                rdma_accept_incoming_migration,
> +                                NULL, (void *)(intptr_t)multi_rdma);
> +        } else {
> +            qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
> +                                NULL, rdma);
> +        }
>      } else {
>          qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
>                              NULL, rdma);
> @@ -4029,6 +4058,58 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
>      return rioc->file;
>  }
>  
> +static void *multiRDMA_recv_thread(void *opaque)
> +{
> +    MultiRDMARecvParams *p = opaque;
> +
> +    while (true) {
> +        qemu_mutex_lock(&p->mutex);
> +        if (p->quit) {
> +            qemu_mutex_unlock(&p->mutex);
> +            break;
> +        }
> +        qemu_mutex_unlock(&p->mutex);
> +        qemu_sem_wait(&p->sem);
> +    }
> +
> +    qemu_mutex_lock(&p->mutex);
> +    p->running = false;
> +    qemu_mutex_unlock(&p->mutex);
> +
> +    return NULL;
> +}
> +
> +static void multiRDMA_recv_new_channel(QEMUFile *f, int id)
> +{
> +    MultiRDMARecvParams *p;
> +    Error *local_err = NULL;
> +
> +    p = &multiRDMA_recv_state->params[id];
> +    if (p->file != NULL) {
> +        error_setg(&local_err,
> +                   "multiRDMA: received id '%d' already setup'", id);
> +        return ;
> +    }
> +    p->file = f;
> +
> +    qemu_thread_create(&p->thread, p->name, multiRDMA_recv_thread, p,
> +                       QEMU_THREAD_JOINABLE);
> +    atomic_inc(&multiRDMA_recv_state->count);
> +}
> +
> +static void migration_multiRDMA_process_incoming(QEMUFile *f, RDMAContext *rdma)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (!mis->from_src_file) {
> +        rdma->migration_started_on_destination = 1;
> +        migration_incoming_setup(f);
> +        migration_incoming_process();
> +    } else {
> +        multiRDMA_recv_new_channel(f, multiRDMA_recv_state->count);
> +    }
> +}
> +
>  static void rdma_accept_incoming_migration(void *opaque)
>  {
>      RDMAContext *rdma = opaque;
> @@ -4057,29 +4138,13 @@ static void rdma_accept_incoming_migration(void *opaque)
>          return;
>      }
>  
> -    rdma->migration_started_on_destination = 1;
> -    migration_fd_process_incoming(f);
> -}
> -
> -static void *multiRDMA_recv_thread(void *opaque)
> -{
> -    MultiRDMARecvParams *p = opaque;
> -
> -    while (true) {
> -        qemu_mutex_lock(&p->mutex);
> -        if (p->quit) {
> -            qemu_mutex_unlock(&p->mutex);
> -            break;
> -        }
> -        qemu_mutex_unlock(&p->mutex);
> -        qemu_sem_wait(&p->sem);
> +    if (migrate_use_multiRDMA()) {
> +        /* build the multiRDMA channels */
> +        migration_multiRDMA_process_incoming(f, rdma);
> +    } else {
> +        rdma->migration_started_on_destination = 1;
> +        migration_fd_process_incoming(f);
>      }
> -
> -    qemu_mutex_lock(&p->mutex);
> -    p->running = false;
> -    qemu_mutex_unlock(&p->mutex);
> -
> -    return NULL;
>  }
>  
>  static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
> @@ -4087,6 +4152,7 @@ static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
>  {
>      uint8_t i;
>      int thread_count;
> +    int idx;
>  
>      thread_count = migrate_multiRDMA_channels();
>      if (multiRDMA_recv_state == NULL) {
> @@ -4099,15 +4165,21 @@ static int multiRDMA_load_setup(const char *host_port, RDMAContext *rdma,
>          for (i = 0; i < thread_count; i++) {
>              MultiRDMARecvParams *p = &multiRDMA_recv_state->params[i];
>  
> +            p->rdma = qemu_rdma_data_init(host_port, errp);
> +            for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> +                p->rdma->wr_data[idx].control_len = 0;
> +                p->rdma->wr_data[idx].control_curr = NULL;
> +            }
> +            /* the CM channel and CM id is shared */
> +            p->rdma->channel = rdma->channel;
> +            p->rdma->listen_id = rdma->listen_id;
> +
>              qemu_mutex_init(&p->mutex);
>              qemu_sem_init(&p->sem, 0);
>              p->quit = false;
>              p->id = i;
>              p->running = true;
>              p->name = g_strdup_printf("multiRDMARecv_%d", i);
> -            qemu_thread_create(&p->thread, p->name, multiRDMA_recv_thread,
> -                               p, QEMU_THREAD_JOINABLE);
> -            atomic_inc(&multiRDMA_recv_state->count);
>          }
>      }
>  
> @@ -4155,6 +4227,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>          qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
>      }
>  
> +    /* initialize the RDMAContext for multiRDMA */
>      if (migrate_use_multiRDMA()) {
>          if (multiRDMA_load_setup(host_port, rdma, &local_err) != 0) {
>              ERROR(errp, "init multiRDMA failure!");
> @@ -4193,10 +4266,29 @@ static void *multiRDMA_send_thread(void *opaque)
>      return NULL;
>  }
>  
> +static void multiRDMA_send_new_channel(QEMUFile *f, int id)
> +{
> +    MultiRDMASendParams *p;
> +    Error *local_err = NULL;
> +
> +    p = &multiRDMA_send_state->params[id];
> +    if (p->file != NULL) {
> +        error_setg(&local_err,
> +                   "multiRDMA: send id '%d' already setup'", id);
> +        return ;
> +    }
> +    p->file = f;
> +
> +    qemu_thread_create(&p->thread, p->name, multiRDMA_send_thread,
> +                       p, QEMU_THREAD_JOINABLE);
> +    atomic_inc(&multiRDMA_send_state->count);
> +}
> +
>  static int multiRDMA_save_setup(const char *host_port, Error **errp)
>  {
>      int thread_count;
>      uint8_t i;
> +    int ret;
>  
>      thread_count = migrate_multiRDMA_channels();
>      multiRDMA_send_state = g_malloc0(sizeof(*multiRDMA_send_state));
> @@ -4207,6 +4299,27 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
>  
>      for (i = 0; i < thread_count; i++) {
>          MultiRDMASendParams *p = &multiRDMA_send_state->params[i];
> +        QEMUFile *f = NULL;
> +
> +        p->rdma = qemu_rdma_data_init(host_port, errp);
> +        if (p->rdma == NULL) {
> +            ERROR(errp, "init RDMA data failure for multi channel %d!", i);
> +            goto err;
> +        }
> +
> +        ret = qemu_rdma_source_init(p->rdma, migrate_use_rdma_pin_all(), errp);
> +        if (ret) {
> +            ERROR(errp, "init RDMA source failure for multi channel %d!", i);
> +            goto err;
> +        }
> +
> +        ret = qemu_rdma_connect(p->rdma, errp);
> +        if (ret) {
> +            ERROR(errp, "connect multi channel %d failure!", i);
> +            goto err;
> +        }
> +
> +        f = qemu_fopen_rdma(multiRDMA_send_state->params[i].rdma, "wb");
>          qemu_mutex_init(&p->mutex);
>          qemu_sem_init(&p->sem, 0);
>          p->quit = false;
> @@ -4214,12 +4327,20 @@ static int multiRDMA_save_setup(const char *host_port, Error **errp)
>          p->running = true;
>          p->name = g_strdup_printf("multiRDMASend_%d", i);
>  
> -        qemu_thread_create(&p->thread, p->name, multiRDMA_send_thread, p,
> -                           QEMU_THREAD_JOINABLE);
> -        atomic_inc(&multiRDMA_send_state->count);
> +        multiRDMA_send_new_channel(f, i);
>      }
>  
>      return 0;
> +
> +err:
> +    for (i = 0; i < thread_count; i++) {
> +        g_free(multiRDMA_send_state->params[i].rdma);
> +    }
> +
> +    g_free(multiRDMA_send_state->params);
> +    g_free(multiRDMA_send_state);
> +
> +    return -1;

This err path doesn't look enough - don't you have to do the equivalent
of qemu_rdma_cleanup for each channel that did succesfully connect,
and then also the one that's failed (perhaps after the first step)?

>  }
>  
>  static void multiRDMA_send_terminate_threads(void)
> @@ -4268,6 +4389,8 @@ int multiRDMA_save_cleanup(void)
>          qemu_sem_destroy(&p->sem);
>          g_free(p->name);
>          p->name = NULL;
> +        qemu_rdma_cleanup(multiRDMA_send_state->params[i].rdma);
> +        g_free(multiRDMA_send_state->params[i].rdma);
>      }
>  
>      qemu_sem_destroy(&multiRDMA_send_state->sem_sync);
> @@ -4292,6 +4415,71 @@ static void multiRDMA_recv_terminate_threads(void)
>      }
>  }
>  
> +static void qemu_multiRDMA_load_cleanup(RDMAContext *rdma)
> +{
> +    int idx;
> +
> +    if (rdma->cm_id && rdma->connected) {
> +        if ((rdma->error_state ||
> +             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
> +            !rdma->received_error) {
> +            RDMAControlHeader head = { .len = 0,
> +                                       .type = RDMA_CONTROL_ERROR,
> +                                       .repeat = 1,
> +                                     };
> +            error_report("Early error. Sending error.");
> +            qemu_rdma_post_send_control(rdma, NULL, &head);
> +        }
> +
> +        rdma_disconnect(rdma->cm_id);
> +        trace_qemu_rdma_cleanup_disconnect();
> +        rdma->connected = false;
> +    }
> +
> +    g_free(rdma->dest_blocks);
> +    rdma->dest_blocks = NULL;
> +
> +    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> +        if (rdma->wr_data[idx].control_mr) {
> +            rdma->total_registrations--;
> +            ibv_dereg_mr(rdma->wr_data[idx].control_mr);
> +        }
> +        rdma->wr_data[idx].control_mr = NULL;
> +    }
> +
> +    if (rdma->local_ram_blocks.block) {
> +        while (rdma->local_ram_blocks.nb_blocks) {
> +            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
> +        }
> +    }
> +
> +    if (rdma->qp) {
> +        rdma_destroy_qp(rdma->cm_id);
> +        rdma->qp = NULL;
> +    }
> +    if (rdma->cq) {
> +        ibv_destroy_cq(rdma->cq);
> +        rdma->cq = NULL;
> +    }
> +    if (rdma->comp_channel) {
> +        ibv_destroy_comp_channel(rdma->comp_channel);
> +        rdma->comp_channel = NULL;
> +    }
> +    if (rdma->pd) {
> +        ibv_dealloc_pd(rdma->pd);
> +        rdma->pd = NULL;
> +    }
> +    if (rdma->cm_id) {
> +        rdma_destroy_id(rdma->cm_id);
> +        rdma->cm_id = NULL;
> +    }
> +
> +    rdma->listen_id = NULL;
> +    rdma->channel = NULL;
> +    g_free(rdma->host);
> +    rdma->host = NULL;
> +}
> +
>  int multiRDMA_load_cleanup(void)
>  {
>      int i;
> @@ -4323,6 +4511,8 @@ int multiRDMA_load_cleanup(void)
>          qemu_sem_destroy(&p->sem);
>          g_free(p->name);
>          p->name = NULL;
> +        qemu_multiRDMA_load_cleanup(multiRDMA_recv_state->params[i].rdma);
> +        g_free(multiRDMA_recv_state->params[i].rdma);
>      }
>  
>      qemu_sem_destroy(&multiRDMA_recv_state->sem_sync);
> @@ -4386,15 +4576,18 @@ void rdma_start_outgoing_migration(void *opaque,
>  
>      trace_rdma_start_outgoing_migration_after_rdma_connect();
>  
> +    s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
> +    /* create multiRDMA channel */
>      if (migrate_use_multiRDMA()) {
>          if (multiRDMA_save_setup(host_port, errp) != 0) {
>              ERROR(errp, "init multiRDMA channels failure!");
>              goto err;
>          }
> +        migrate_fd_connect(s, NULL);
> +    } else {
> +        migrate_fd_connect(s, NULL);
>      }

If that's on both sides of the 'if' then it should move to here.

> -    s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
> -    migrate_fd_connect(s, NULL);
>      return;
>  err:
>      g_free(rdma);
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 00/12] *** mulitple RDMA channels for migration ***
  2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
                   ` (12 preceding siblings ...)
  2020-01-09 10:38 ` [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** no-reply
@ 2020-01-15 19:57 ` Dr. David Alan Gilbert
  2020-01-16  1:37   ` fengzhimin
  13 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 19:57 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Currently there is a single channel for RDMA migration, this causes
> the problem that the network bandwidth is not fully utilized for
> 25Gigabit NIC. Inspired by the Multifd, we use two RDMA channels to
> send RAM pages, which we call MultiRDMA.
> 
> We compare the migration performance of MultiRDMA with origin
> RDMA migration. The VM specifications for migration are as follows:
> - VM use 4k page;
> - the number of VCPU is 4;
> - the total memory is 16Gigabit;
> - use 'mempress' tool to pressurize VM(mempress 8000 500);
> - use 25Gigabit network card to migrate;
> 
> For origin RDMA and MultiRDMA migration, the total migration times of
> VM are as follows:
> +++++++++++++++++++++++++++++++++++++++++++++++++
> |             | NOT rdma-pin-all | rdma-pin-all |
> +++++++++++++++++++++++++++++++++++++++++++++++++
> | origin RDMA |       18 s       |     23 s     |
> -------------------------------------------------
> |  MultiRDMA  |       13 s       |     18 s     |
> +++++++++++++++++++++++++++++++++++++++++++++++++

Very nice.

> For NOT rdma-pin-all migration, the multiRDMA can improve the
> total migration time by about 27.8%.
> For rdma-pin-all migration, the multiRDMA can improve the
> total migration time by about 21.7%.
> 
> Test the multiRDMA migration like this:
> 'virsh migrate --live --rdma-parallel --migrateuri
> rdma://hostname domain qemu+tcp://hostname/system'

It will take me a while to finish the review; but another
general suggestion is add more trace_ calls; it will make it easier
to diagnose problems later.

Dave

> 
> fengzhimin (12):
>   migration: Add multiRDMA capability support
>   migration: Export the 'migration_incoming_setup' function           
>              and add the 'migrate_use_rdma_pin_all' function
>   migration: Create the multi-rdma-channels parameter
>   migration/rdma: Create multiRDMA migration threads
>   migration/rdma: Create the multiRDMA channels
>   migration/rdma: Transmit initial package
>   migration/rdma: Be sure all channels are created
>   migration/rdma: register memory for multiRDMA channels
>   migration/rdma: Wait for all multiRDMA to complete registration
>   migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
>   migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
>                   mode
>   migration/rdma: only register the virt-ram block for MultiRDMA
> 
>  migration/migration.c |   55 +-
>  migration/migration.h |    6 +
>  migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
>  monitor/hmp-cmds.c    |    7 +
>  qapi/migration.json   |   27 +-
>  5 files changed, 1285 insertions(+), 130 deletions(-)
> 
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 00/12] *** mulitple RDMA channels for migration ***
  2020-01-15 19:57 ` Dr. David Alan Gilbert
@ 2020-01-16  1:37   ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-16  1:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

Thanks for your review. I will add more trace_ calls in the next version(V2) and modify its according to your suggestions.

-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Thursday, January 16, 2020 3:57 AM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 00/12] *** mulitple RDMA channels for migration ***

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Currently there is a single channel for RDMA migration, this causes 
> the problem that the network bandwidth is not fully utilized for 
> 25Gigabit NIC. Inspired by the Multifd, we use two RDMA channels to 
> send RAM pages, which we call MultiRDMA.
> 
> We compare the migration performance of MultiRDMA with origin RDMA 
> migration. The VM specifications for migration are as follows:
> - VM use 4k page;
> - the number of VCPU is 4;
> - the total memory is 16Gigabit;
> - use 'mempress' tool to pressurize VM(mempress 8000 500);
> - use 25Gigabit network card to migrate;
> 
> For origin RDMA and MultiRDMA migration, the total migration times of 
> VM are as follows:
> +++++++++++++++++++++++++++++++++++++++++++++++++
> |             | NOT rdma-pin-all | rdma-pin-all |
> +++++++++++++++++++++++++++++++++++++++++++++++++
> | origin RDMA |       18 s       |     23 s     |
> -------------------------------------------------
> |  MultiRDMA  |       13 s       |     18 s     |
> +++++++++++++++++++++++++++++++++++++++++++++++++

Very nice.

> For NOT rdma-pin-all migration, the multiRDMA can improve the total 
> migration time by about 27.8%.
> For rdma-pin-all migration, the multiRDMA can improve the total 
> migration time by about 21.7%.
> 
> Test the multiRDMA migration like this:
> 'virsh migrate --live --rdma-parallel --migrateuri rdma://hostname 
> domain qemu+tcp://hostname/system'

It will take me a while to finish the review; but another general suggestion is add more trace_ calls; it will make it easier to diagnose problems later.

Dave

> 
> fengzhimin (12):
>   migration: Add multiRDMA capability support
>   migration: Export the 'migration_incoming_setup' function           
>              and add the 'migrate_use_rdma_pin_all' function
>   migration: Create the multi-rdma-channels parameter
>   migration/rdma: Create multiRDMA migration threads
>   migration/rdma: Create the multiRDMA channels
>   migration/rdma: Transmit initial package
>   migration/rdma: Be sure all channels are created
>   migration/rdma: register memory for multiRDMA channels
>   migration/rdma: Wait for all multiRDMA to complete registration
>   migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
>   migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
>                   mode
>   migration/rdma: only register the virt-ram block for MultiRDMA
> 
>  migration/migration.c |   55 +-
>  migration/migration.h |    6 +
>  migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
>  monitor/hmp-cmds.c    |    7 +
>  qapi/migration.json   |   27 +-
>  5 files changed, 1285 insertions(+), 130 deletions(-)
> 
> --
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-15 18:09   ` Dr. David Alan Gilbert
@ 2020-01-16 13:18     ` Juan Quintela
  2020-01-17  1:30       ` fengzhimin
  0 siblings, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2020-01-16 13:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Zhimin Feng, armbru, qemu-devel, jemmy858585

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Zhimin Feng (fengzhimin1@huawei.com) wrote:
>> From: fengzhimin <fengzhimin1@huawei.com>
>> 
>> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
>
> Instead of creating x-multirdma as a capability and the corresponding
> parameter for the number of channels; it would be better just
> to use the multifd parameters when used with an rdma transport;
> as far as I know multifd doesn't work with rdma at the moment,
> and to the user the idea of multifd over rdma is just the same thing.

I was about to suggest that.  We could setup both capabilities:

multifd + rdma



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function
  2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
  2020-01-15 18:57   ` Dr. David Alan Gilbert
@ 2020-01-16 13:19   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2020-01-16 13:19 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, armbru, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
>
> We need to call the 'migration_incoming_setup' function in migration/rdma.c,
> so it has to be changed to a global function.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

Agreed with Dave, two patches please.

Once done that.

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter
  2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
  2020-01-13 15:34   ` Markus Armbruster
@ 2020-01-16 13:20   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2020-01-16 13:20 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, armbru, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
>
> Indicates the number of RDMA threads that we would create.
> By default we create 2 threads for RDMA migration.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

Two things:

- If you use multifd as suggested on the 1st platch, please use
  multifd-channels here.

- Once told that, why are you using a default of two?

Thanks, Juan.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads
  2020-01-09  4:59 ` [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads Zhimin Feng
@ 2020-01-16 13:25   ` Juan Quintela
  2020-01-17  1:32     ` fengzhimin
  0 siblings, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2020-01-16 13:25 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, armbru, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
>
> Creation of the RDMA threads, nothing inside yet.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

> ---
>  migration/migration.c |   1 +
>  migration/migration.h |   2 +
>  migration/rdma.c      | 283 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 286 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 5756a4806e..f8d4eb657e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1546,6 +1546,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>          qemu_mutex_lock_iothread();
>  
>          multifd_save_cleanup();
> +        multiRDMA_save_cleanup();

Can we merge this with multifd?


> +typedef struct {
> +    /* this fields are not changed once the thread is created */
> +    /* channel number */
> +    uint8_t id;
> +    /* channel thread name */
> +    char *name;
> +    /* channel thread id */
> +    QemuThread thread;
> +    /* sem where to wait for more work */
> +    QemuSemaphore sem;
> +    /* this mutex protects the following parameters */
> +    QemuMutex mutex;
> +    /* is this channel thread running */
> +    bool running;
> +    /* should this thread finish */
> +    bool quit;
> +}  MultiRDMASendParams;

This is basically the same than MultiFBSendParams, same for the rest.

I would very much preffer not to have two sets of threads that are
really equivalent.

Thanks, Juan.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels
  2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
  2020-01-15 19:54   ` Dr. David Alan Gilbert
@ 2020-01-16 13:30   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2020-01-16 13:30 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, armbru, qemu-devel, dgilbert, jemmy858585

Zhimin Feng <fengzhimin1@huawei.com> wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
>
> In both sides. We still don't transmit anything through them,
> and we only build the RDMA connections.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

We need to abstract this on top of multifd, we are doing the equivalent
of the code for thread creation (you have it easier because you are able
to do it with a for, but it is basically the same.


Later, Juan.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 01/12] migration: Add multiRDMA capability support
  2020-01-16 13:18     ` Juan Quintela
@ 2020-01-17  1:30       ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-17  1:30 UTC (permalink / raw)
  To: quintela, Dr. David Alan Gilbert
  Cc: Zhanghailiang, jemmy858585, armbru, qemu-devel

Thanks for your review. I will modify its according to your suggestions.

-----Original Message-----
From: Juan Quintela [mailto:quintela@redhat.com] 
Sent: Thursday, January 16, 2020 9:19 PM
To: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: fengzhimin <fengzhimin1@huawei.com>; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 01/12] migration: Add multiRDMA capability support

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Zhimin Feng (fengzhimin1@huawei.com) wrote:
>> From: fengzhimin <fengzhimin1@huawei.com>
>> 
>> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
>
> Instead of creating x-multirdma as a capability and the corresponding 
> parameter for the number of channels; it would be better just to use 
> the multifd parameters when used with an rdma transport; as far as I 
> know multifd doesn't work with rdma at the moment, and to the user the 
> idea of multifd over rdma is just the same thing.

I was about to suggest that.  We could setup both capabilities:

multifd + rdma



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads
  2020-01-16 13:25   ` Juan Quintela
@ 2020-01-17  1:32     ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-17  1:32 UTC (permalink / raw)
  To: quintela; +Cc: Zhanghailiang, armbru, qemu-devel, dgilbert, jemmy858585

Thanks for your review. I will merge this with multifd.

-----Original Message-----
From: Juan Quintela [mailto:quintela@redhat.com] 
Sent: Thursday, January 16, 2020 9:25 PM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: dgilbert@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads

Zhimin Feng <fengzhimin1@huawei.com> wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
>
> Creation of the RDMA threads, nothing inside yet.
>
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

> ---
>  migration/migration.c |   1 +
>  migration/migration.h |   2 +
>  migration/rdma.c      | 283 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 286 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c index 
> 5756a4806e..f8d4eb657e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1546,6 +1546,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>          qemu_mutex_lock_iothread();
>  
>          multifd_save_cleanup();
> +        multiRDMA_save_cleanup();

Can we merge this with multifd?


> +typedef struct {
> +    /* this fields are not changed once the thread is created */
> +    /* channel number */
> +    uint8_t id;
> +    /* channel thread name */
> +    char *name;
> +    /* channel thread id */
> +    QemuThread thread;
> +    /* sem where to wait for more work */
> +    QemuSemaphore sem;
> +    /* this mutex protects the following parameters */
> +    QemuMutex mutex;
> +    /* is this channel thread running */
> +    bool running;
> +    /* should this thread finish */
> +    bool quit;
> +}  MultiRDMASendParams;

This is basically the same than MultiFBSendParams, same for the rest.

I would very much preffer not to have two sets of threads that are really equivalent.

Thanks, Juan.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
  2020-01-09  4:59 ` [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA Zhimin Feng
@ 2020-01-17 18:52   ` Dr. David Alan Gilbert
  2020-01-19  1:44     ` fengzhimin
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 18:52 UTC (permalink / raw)
  To: Zhimin Feng
  Cc: zhang.zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> The virt-ram block is sent by MultiRDMA, so we only register it for
> MultiRDMA channels and main channel don't register the virt-ram block.
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

You can't specialise on the name of the RAMBlock like that.
'mach-virt.ram' is the name specific to just the main ram on just
aarch's machine type;  for example the name on x86 is completely
different and if you use NUMA or hotplug etc it would also be different
on aarch.

Is there a downside to also registering the mach-virt.ram on the main
channel?

Dave

> ---
>  migration/rdma.c | 140 +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 112 insertions(+), 28 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0a150099e2..1477fd509b 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);
>  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>                                     uint8_t *data, RDMAControlHeader *resp,
>                                     int *resp_idx,
> -                                   int (*callback)(RDMAContext *rdma));
> +                                   int (*callback)(RDMAContext *rdma,
> +                                   uint8_t id),
> +                                   uint8_t id);
>  
>  static inline uint64_t ram_chunk_index(const uint8_t *start,
>                                         const uint8_t *host)
> @@ -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
>      return 0;
>  }
>  
> -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> +/*
> + * Parameters:
> + *    @id == UNUSED_ID :
> + *    This means that we register memory for the main RDMA channel,
> + *    the main RDMA channel don't register the mach-virt.ram block
> + *    when we use multiRDMA method to migrate.
> + *
> + *    @id == 0 or id == 1 or ... :
> + *    This means that we register memory for the multiRDMA channels,
> + *    the multiRDMA channels only register memory for the mach-virt.ram
> + *    block when we use multiRDAM method to migrate.
> + */
> +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t id)
>  {
>      int i;
>      RDMALocalBlocks *local = &rdma->local_ram_blocks;
>  
> -    for (i = 0; i < local->nb_blocks; i++) {
> -        local->block[i].mr =
> -            ibv_reg_mr(rdma->pd,
> -                    local->block[i].local_host_addr,
> -                    local->block[i].length,
> -                    IBV_ACCESS_LOCAL_WRITE |
> -                    IBV_ACCESS_REMOTE_WRITE
> -                    );
> -        if (!local->block[i].mr) {
> -            perror("Failed to register local dest ram block!\n");
> -            break;
> +    if (migrate_use_multiRDMA()) {
> +        if (id == UNUSED_ID) {
> +            for (i = 0; i < local->nb_blocks; i++) {
> +                /* main RDMA channel don't register the mach-virt.ram block */
> +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                    continue;
> +                }
> +
> +                local->block[i].mr =
> +                    ibv_reg_mr(rdma->pd,
> +                            local->block[i].local_host_addr,
> +                            local->block[i].length,
> +                            IBV_ACCESS_LOCAL_WRITE |
> +                            IBV_ACCESS_REMOTE_WRITE
> +                            );
> +                if (!local->block[i].mr) {
> +                    perror("Failed to register local dest ram block!\n");
> +                    break;
> +                }
> +                rdma->total_registrations++;
> +            }
> +        } else {
> +            for (i = 0; i < local->nb_blocks; i++) {
> +                /*
> +                 * The multiRDAM channels only register
> +                 * the mach-virt.ram block
> +                 */
> +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                    local->block[i].mr =
> +                        ibv_reg_mr(rdma->pd,
> +                                local->block[i].local_host_addr,
> +                                local->block[i].length,
> +                                IBV_ACCESS_LOCAL_WRITE |
> +                                IBV_ACCESS_REMOTE_WRITE
> +                                );
> +                    if (!local->block[i].mr) {
> +                        perror("Failed to register local dest ram block!\n");
> +                        break;
> +                    }
> +                    rdma->total_registrations++;
> +                }
> +            }
> +        }
> +    } else {
> +        for (i = 0; i < local->nb_blocks; i++) {
> +            local->block[i].mr =
> +                ibv_reg_mr(rdma->pd,
> +                        local->block[i].local_host_addr,
> +                        local->block[i].length,
> +                        IBV_ACCESS_LOCAL_WRITE |
> +                        IBV_ACCESS_REMOTE_WRITE
> +                        );
> +            if (!local->block[i].mr) {
> +                perror("Failed to register local dest ram block!\n");
> +                break;
> +            }
> +            rdma->total_registrations++;
>          }
> -        rdma->total_registrations++;
>      }
>  
>      if (i >= local->nb_blocks) {
> @@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
>      }
>  
>      for (i--; i >= 0; i--) {
> -        ibv_dereg_mr(local->block[i].mr);
> -        rdma->total_registrations--;
> +        if (local->block[i].mr) {
> +            ibv_dereg_mr(local->block[i].mr);
> +            rdma->total_registrations--;
> +        }
>      }
>  
>      return -1;
> @@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>          reg.key.chunk = chunk;
>          register_to_network(rdma, &reg);
>          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> -                                &resp, NULL, NULL);
> +                                      &resp, NULL, NULL, UNUSED_ID);
>          if (ret < 0) {
>              return ret;
>          }
> @@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
>   * The extra (optional) response is used during registration to us from having
>   * to perform an *additional* exchange of message just to provide a response by
>   * instead piggy-backing on the acknowledgement.
> + *
> + * Parameters:
> + *    @id : callback function need two parameters, id is the second parameter.
> + *
>   */
>  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>                                     uint8_t *data, RDMAControlHeader *resp,
>                                     int *resp_idx,
> -                                   int (*callback)(RDMAContext *rdma))
> +                                   int (*callback)(RDMAContext *rdma,
> +                                   uint8_t id),
> +                                   uint8_t id)
>  {
>      int ret = 0;
>  
> @@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>      if (resp) {
>          if (callback) {
>              trace_qemu_rdma_exchange_send_issue_callback();
> -            ret = callback(rdma);
> +            ret = callback(rdma, id);
>              if (ret < 0) {
>                  return ret;
>              }
> @@ -2168,7 +2235,7 @@ retry:
>  
>                  compress_to_network(rdma, &comp);
>                  ret = qemu_rdma_exchange_send(rdma, &head,
> -                                (uint8_t *) &comp, NULL, NULL, NULL);
> +                                (uint8_t *) &comp, NULL, NULL, NULL, UNUSED_ID);
>  
>                  if (ret < 0) {
>                      return -EIO;
> @@ -2195,7 +2262,7 @@ retry:
>  
>              register_to_network(rdma, &reg);
>              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> -                                    &resp, &reg_result_idx, NULL);
> +                                    &resp, &reg_result_idx, NULL, UNUSED_ID);
>              if (ret < 0) {
>                  return ret;
>              }
> @@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>              head.len = len;
>              head.type = RDMA_CONTROL_QEMU_FILE;
>  
> -            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> +            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
> +                                          NULL, NULL, UNUSED_ID);
>  
>              if (ret < 0) {
>                  rdma->error_state = ret;
> @@ -3660,7 +3728,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>              }
>  
>              if (rdma->pin_all) {
> -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, UNUSED_ID);
>                  if (ret) {
>                      error_report("rdma migration: error dest "
>                                      "registering ram blocks");
> @@ -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>               * their "local" descriptions with what was sent.
>               */
>              for (i = 0; i < local->nb_blocks; i++) {
> +                /*
> +                 * use the main RDMA channel to deliver the block of device
> +                 * use the multiRDMA channels to deliver the RAMBlock
> +                 */
> +                if (migrate_use_multiRDMA() &&
> +                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                        continue;
> +                }
> +
>                  rdma->dest_blocks[i].remote_host_addr =
>                      (uintptr_t)(local->block[i].local_host_addr);
>  
> @@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>           */
>          ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
>                      &reg_result_idx, rdma->pin_all ?
> -                    qemu_rdma_reg_whole_ram_blocks : NULL);
> +                    qemu_rdma_reg_whole_ram_blocks : NULL, UNUSED_ID);
>          if (ret < 0) {
>              ERROR(errp, "receiving remote info!");
>              return ret;
> @@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          memcpy(rdma->dest_blocks,
>              rdma->wr_data[reg_result_idx].control_curr, resp.len);
>          for (i = 0; i < nb_dest_blocks; i++) {
> +            if (migrate_use_multiRDMA() &&
> +                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                continue;
> +            }
> +
>              network_to_dest_block(&rdma->dest_blocks[i]);
>  
>              /* We require that the blocks are in the same order */
> @@ -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>      trace_qemu_rdma_registration_stop(flags);
>  
>      head.type = RDMA_CONTROL_REGISTER_FINISHED;
> -    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> +    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> +                                  NULL, NULL, UNUSED_ID);
>  
>      if (migrate_use_multiRDMA()) {
>          /*
> @@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
>                    sizeof(RDMALocalBlock), dest_ram_sort_func);
>  
>              if (rdma->pin_all) {
> -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
>                  if (ret) {
>                      error_report("rdma migration: error dest "
>                                   "registering ram blocks");
> @@ -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
>  
>      ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
>              &reg_result_idx, rdma->pin_all ?
> -            qemu_rdma_reg_whole_ram_blocks : NULL);
> +            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
>      if (ret < 0) {
>          return NULL;
>      }
> @@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void *opaque)
>  
>          /* Send FINISHED to the destination */
>          head.type = RDMA_CONTROL_REGISTER_FINISHED;
> -        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> +        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> +                                      NULL, NULL, p->id);
>          if (ret < 0) {
>              return NULL;
>          }
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
  2020-01-17 18:52   ` Dr. David Alan Gilbert
@ 2020-01-19  1:44     ` fengzhimin
  2020-01-20  9:05       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 37+ messages in thread
From: fengzhimin @ 2020-01-19  1:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

OK, I will modify it.

Due to the mach-virt.ram is sent by the multiRDMA channels instead of the main channel, it don't to register on the main channel.
It takes a long time to register the mach-virt.ram for VM with large capacity memory, so we shall try our best not to register it.

Thanks for your review.

Zhimin Feng

-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Saturday, January 18, 2020 2:52 AM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> The virt-ram block is sent by MultiRDMA, so we only register it for 
> MultiRDMA channels and main channel don't register the virt-ram block.
> 
> Signed-off-by: fengzhimin <fengzhimin1@huawei.com>

You can't specialise on the name of the RAMBlock like that.
'mach-virt.ram' is the name specific to just the main ram on just aarch's machine type;  for example the name on x86 is completely different and if you use NUMA or hotplug etc it would also be different on aarch.

Is there a downside to also registering the mach-virt.ram on the main channel?

Dave

> ---
>  migration/rdma.c | 140 
> +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 112 insertions(+), 28 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c index 
> 0a150099e2..1477fd509b 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);  static int 
> qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>                                     uint8_t *data, RDMAControlHeader *resp,
>                                     int *resp_idx,
> -                                   int (*callback)(RDMAContext *rdma));
> +                                   int (*callback)(RDMAContext *rdma,
> +                                   uint8_t id),
> +                                   uint8_t id);
>  
>  static inline uint64_t ram_chunk_index(const uint8_t *start,
>                                         const uint8_t *host) @@ 
> -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
>      return 0;
>  }
>  
> -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> +/*
> + * Parameters:
> + *    @id == UNUSED_ID :
> + *    This means that we register memory for the main RDMA channel,
> + *    the main RDMA channel don't register the mach-virt.ram block
> + *    when we use multiRDMA method to migrate.
> + *
> + *    @id == 0 or id == 1 or ... :
> + *    This means that we register memory for the multiRDMA channels,
> + *    the multiRDMA channels only register memory for the mach-virt.ram
> + *    block when we use multiRDAM method to migrate.
> + */
> +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t 
> +id)
>  {
>      int i;
>      RDMALocalBlocks *local = &rdma->local_ram_blocks;
>  
> -    for (i = 0; i < local->nb_blocks; i++) {
> -        local->block[i].mr =
> -            ibv_reg_mr(rdma->pd,
> -                    local->block[i].local_host_addr,
> -                    local->block[i].length,
> -                    IBV_ACCESS_LOCAL_WRITE |
> -                    IBV_ACCESS_REMOTE_WRITE
> -                    );
> -        if (!local->block[i].mr) {
> -            perror("Failed to register local dest ram block!\n");
> -            break;
> +    if (migrate_use_multiRDMA()) {
> +        if (id == UNUSED_ID) {
> +            for (i = 0; i < local->nb_blocks; i++) {
> +                /* main RDMA channel don't register the mach-virt.ram block */
> +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                    continue;
> +                }
> +
> +                local->block[i].mr =
> +                    ibv_reg_mr(rdma->pd,
> +                            local->block[i].local_host_addr,
> +                            local->block[i].length,
> +                            IBV_ACCESS_LOCAL_WRITE |
> +                            IBV_ACCESS_REMOTE_WRITE
> +                            );
> +                if (!local->block[i].mr) {
> +                    perror("Failed to register local dest ram block!\n");
> +                    break;
> +                }
> +                rdma->total_registrations++;
> +            }
> +        } else {
> +            for (i = 0; i < local->nb_blocks; i++) {
> +                /*
> +                 * The multiRDAM channels only register
> +                 * the mach-virt.ram block
> +                 */
> +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                    local->block[i].mr =
> +                        ibv_reg_mr(rdma->pd,
> +                                local->block[i].local_host_addr,
> +                                local->block[i].length,
> +                                IBV_ACCESS_LOCAL_WRITE |
> +                                IBV_ACCESS_REMOTE_WRITE
> +                                );
> +                    if (!local->block[i].mr) {
> +                        perror("Failed to register local dest ram block!\n");
> +                        break;
> +                    }
> +                    rdma->total_registrations++;
> +                }
> +            }
> +        }
> +    } else {
> +        for (i = 0; i < local->nb_blocks; i++) {
> +            local->block[i].mr =
> +                ibv_reg_mr(rdma->pd,
> +                        local->block[i].local_host_addr,
> +                        local->block[i].length,
> +                        IBV_ACCESS_LOCAL_WRITE |
> +                        IBV_ACCESS_REMOTE_WRITE
> +                        );
> +            if (!local->block[i].mr) {
> +                perror("Failed to register local dest ram block!\n");
> +                break;
> +            }
> +            rdma->total_registrations++;
>          }
> -        rdma->total_registrations++;
>      }
>  
>      if (i >= local->nb_blocks) {
> @@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
>      }
>  
>      for (i--; i >= 0; i--) {
> -        ibv_dereg_mr(local->block[i].mr);
> -        rdma->total_registrations--;
> +        if (local->block[i].mr) {
> +            ibv_dereg_mr(local->block[i].mr);
> +            rdma->total_registrations--;
> +        }
>      }
>  
>      return -1;
> @@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>          reg.key.chunk = chunk;
>          register_to_network(rdma, &reg);
>          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> -                                &resp, NULL, NULL);
> +                                      &resp, NULL, NULL, UNUSED_ID);
>          if (ret < 0) {
>              return ret;
>          }
> @@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
>   * The extra (optional) response is used during registration to us from having
>   * to perform an *additional* exchange of message just to provide a response by
>   * instead piggy-backing on the acknowledgement.
> + *
> + * Parameters:
> + *    @id : callback function need two parameters, id is the second parameter.
> + *
>   */
>  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>                                     uint8_t *data, RDMAControlHeader *resp,
>                                     int *resp_idx,
> -                                   int (*callback)(RDMAContext *rdma))
> +                                   int (*callback)(RDMAContext *rdma,
> +                                   uint8_t id),
> +                                   uint8_t id)
>  {
>      int ret = 0;
>  
> @@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>      if (resp) {
>          if (callback) {
>              trace_qemu_rdma_exchange_send_issue_callback();
> -            ret = callback(rdma);
> +            ret = callback(rdma, id);
>              if (ret < 0) {
>                  return ret;
>              }
> @@ -2168,7 +2235,7 @@ retry:
>  
>                  compress_to_network(rdma, &comp);
>                  ret = qemu_rdma_exchange_send(rdma, &head,
> -                                (uint8_t *) &comp, NULL, NULL, NULL);
> +                                (uint8_t *) &comp, NULL, NULL, NULL, 
> + UNUSED_ID);
>  
>                  if (ret < 0) {
>                      return -EIO;
> @@ -2195,7 +2262,7 @@ retry:
>  
>              register_to_network(rdma, &reg);
>              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> -                                    &resp, &reg_result_idx, NULL);
> +                                    &resp, &reg_result_idx, NULL, 
> + UNUSED_ID);
>              if (ret < 0) {
>                  return ret;
>              }
> @@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>              head.len = len;
>              head.type = RDMA_CONTROL_QEMU_FILE;
>  
> -            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> +            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
> +                                          NULL, NULL, UNUSED_ID);
>  
>              if (ret < 0) {
>                  rdma->error_state = ret; @@ -3660,7 +3728,7 @@ static 
> int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>              }
>  
>              if (rdma->pin_all) {
> -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, 
> + UNUSED_ID);
>                  if (ret) {
>                      error_report("rdma migration: error dest "
>                                      "registering ram blocks"); @@ 
> -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>               * their "local" descriptions with what was sent.
>               */
>              for (i = 0; i < local->nb_blocks; i++) {
> +                /*
> +                 * use the main RDMA channel to deliver the block of device
> +                 * use the multiRDMA channels to deliver the RAMBlock
> +                 */
> +                if (migrate_use_multiRDMA() &&
> +                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                        continue;
> +                }
> +
>                  rdma->dest_blocks[i].remote_host_addr =
>                      (uintptr_t)(local->block[i].local_host_addr);
>  
> @@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>           */
>          ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
>                      &reg_result_idx, rdma->pin_all ?
> -                    qemu_rdma_reg_whole_ram_blocks : NULL);
> +                    qemu_rdma_reg_whole_ram_blocks : NULL, 
> + UNUSED_ID);
>          if (ret < 0) {
>              ERROR(errp, "receiving remote info!");
>              return ret;
> @@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>          memcpy(rdma->dest_blocks,
>              rdma->wr_data[reg_result_idx].control_curr, resp.len);
>          for (i = 0; i < nb_dest_blocks; i++) {
> +            if (migrate_use_multiRDMA() &&
> +                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> +                continue;
> +            }
> +
>              network_to_dest_block(&rdma->dest_blocks[i]);
>  
>              /* We require that the blocks are in the same order */ @@ 
> -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>      trace_qemu_rdma_registration_stop(flags);
>  
>      head.type = RDMA_CONTROL_REGISTER_FINISHED;
> -    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> +    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> +                                  NULL, NULL, UNUSED_ID);
>  
>      if (migrate_use_multiRDMA()) {
>          /*
> @@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
>                    sizeof(RDMALocalBlock), dest_ram_sort_func);
>  
>              if (rdma->pin_all) {
> -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
>                  if (ret) {
>                      error_report("rdma migration: error dest "
>                                   "registering ram blocks"); @@ 
> -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
>  
>      ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
>              &reg_result_idx, rdma->pin_all ?
> -            qemu_rdma_reg_whole_ram_blocks : NULL);
> +            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
>      if (ret < 0) {
>          return NULL;
>      }
> @@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void *opaque)
>  
>          /* Send FINISHED to the destination */
>          head.type = RDMA_CONTROL_REGISTER_FINISHED;
> -        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> +        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> +                                      NULL, NULL, p->id);
>          if (ret < 0) {
>              return NULL;
>          }
> --
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
  2020-01-19  1:44     ` fengzhimin
@ 2020-01-20  9:05       ` Dr. David Alan Gilbert
  2020-01-21  1:30         ` fengzhimin
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20  9:05 UTC (permalink / raw)
  To: fengzhimin; +Cc: Zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

* fengzhimin (fengzhimin1@huawei.com) wrote:
> OK, I will modify it.
> 
> Due to the mach-virt.ram is sent by the multiRDMA channels instead of the main channel, it don't to register on the main channel.

You might be OK if instead  of using the name, you use a size threshold;
e.g. you use the multirdma threads for any RAM block larger than say
128MB.

> It takes a long time to register the mach-virt.ram for VM with large capacity memory, so we shall try our best not to register it.

I'm curious why, I know it's expensive to register RAM blocks with rdma
code; but I thought that would just be the first time; it surprises me
that registering it with a 2nd RDMA channel is as expensive.

But then that makes me ask a 2nd question; is your performance increase
due to multiple threads or is it due to the multiple RDMA channels?
COuld you have multiple threads but still a single RDMA channel
(and with sufficient locking) still get the performance?

Dave

> Thanks for your review.
> 
> Zhimin Feng
> 
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
> Sent: Saturday, January 18, 2020 2:52 AM
> To: fengzhimin <fengzhimin1@huawei.com>
> Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
> Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
> 
> * Zhimin Feng (fengzhimin1@huawei.com) wrote:
> > From: fengzhimin <fengzhimin1@huawei.com>
> > 
> > The virt-ram block is sent by MultiRDMA, so we only register it for 
> > MultiRDMA channels and main channel don't register the virt-ram block.
> > 
> > Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> 
> You can't specialise on the name of the RAMBlock like that.
> 'mach-virt.ram' is the name specific to just the main ram on just aarch's machine type;  for example the name on x86 is completely different and if you use NUMA or hotplug etc it would also be different on aarch.
> 
> Is there a downside to also registering the mach-virt.ram on the main channel?
> 
> Dave
> 
> > ---
> >  migration/rdma.c | 140 
> > +++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 112 insertions(+), 28 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c index 
> > 0a150099e2..1477fd509b 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);  static int 
> > qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma));
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id);
> >  
> >  static inline uint64_t ram_chunk_index(const uint8_t *start,
> >                                         const uint8_t *host) @@ 
> > -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
> >      return 0;
> >  }
> >  
> > -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> > +/*
> > + * Parameters:
> > + *    @id == UNUSED_ID :
> > + *    This means that we register memory for the main RDMA channel,
> > + *    the main RDMA channel don't register the mach-virt.ram block
> > + *    when we use multiRDMA method to migrate.
> > + *
> > + *    @id == 0 or id == 1 or ... :
> > + *    This means that we register memory for the multiRDMA channels,
> > + *    the multiRDMA channels only register memory for the mach-virt.ram
> > + *    block when we use multiRDAM method to migrate.
> > + */
> > +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t 
> > +id)
> >  {
> >      int i;
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >  
> > -    for (i = 0; i < local->nb_blocks; i++) {
> > -        local->block[i].mr =
> > -            ibv_reg_mr(rdma->pd,
> > -                    local->block[i].local_host_addr,
> > -                    local->block[i].length,
> > -                    IBV_ACCESS_LOCAL_WRITE |
> > -                    IBV_ACCESS_REMOTE_WRITE
> > -                    );
> > -        if (!local->block[i].mr) {
> > -            perror("Failed to register local dest ram block!\n");
> > -            break;
> > +    if (migrate_use_multiRDMA()) {
> > +        if (id == UNUSED_ID) {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /* main RDMA channel don't register the mach-virt.ram block */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    continue;
> > +                }
> > +
> > +                local->block[i].mr =
> > +                    ibv_reg_mr(rdma->pd,
> > +                            local->block[i].local_host_addr,
> > +                            local->block[i].length,
> > +                            IBV_ACCESS_LOCAL_WRITE |
> > +                            IBV_ACCESS_REMOTE_WRITE
> > +                            );
> > +                if (!local->block[i].mr) {
> > +                    perror("Failed to register local dest ram block!\n");
> > +                    break;
> > +                }
> > +                rdma->total_registrations++;
> > +            }
> > +        } else {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * The multiRDAM channels only register
> > +                 * the mach-virt.ram block
> > +                 */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    local->block[i].mr =
> > +                        ibv_reg_mr(rdma->pd,
> > +                                local->block[i].local_host_addr,
> > +                                local->block[i].length,
> > +                                IBV_ACCESS_LOCAL_WRITE |
> > +                                IBV_ACCESS_REMOTE_WRITE
> > +                                );
> > +                    if (!local->block[i].mr) {
> > +                        perror("Failed to register local dest ram block!\n");
> > +                        break;
> > +                    }
> > +                    rdma->total_registrations++;
> > +                }
> > +            }
> > +        }
> > +    } else {
> > +        for (i = 0; i < local->nb_blocks; i++) {
> > +            local->block[i].mr =
> > +                ibv_reg_mr(rdma->pd,
> > +                        local->block[i].local_host_addr,
> > +                        local->block[i].length,
> > +                        IBV_ACCESS_LOCAL_WRITE |
> > +                        IBV_ACCESS_REMOTE_WRITE
> > +                        );
> > +            if (!local->block[i].mr) {
> > +                perror("Failed to register local dest ram block!\n");
> > +                break;
> > +            }
> > +            rdma->total_registrations++;
> >          }
> > -        rdma->total_registrations++;
> >      }
> >  
> >      if (i >= local->nb_blocks) {
> > @@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> >      }
> >  
> >      for (i--; i >= 0; i--) {
> > -        ibv_dereg_mr(local->block[i].mr);
> > -        rdma->total_registrations--;
> > +        if (local->block[i].mr) {
> > +            ibv_dereg_mr(local->block[i].mr);
> > +            rdma->total_registrations--;
> > +        }
> >      }
> >  
> >      return -1;
> > @@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
> >          reg.key.chunk = chunk;
> >          register_to_network(rdma, &reg);
> >          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                &resp, NULL, NULL);
> > +                                      &resp, NULL, NULL, UNUSED_ID);
> >          if (ret < 0) {
> >              return ret;
> >          }
> > @@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
> >   * The extra (optional) response is used during registration to us from having
> >   * to perform an *additional* exchange of message just to provide a response by
> >   * instead piggy-backing on the acknowledgement.
> > + *
> > + * Parameters:
> > + *    @id : callback function need two parameters, id is the second parameter.
> > + *
> >   */
> >  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma))
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id)
> >  {
> >      int ret = 0;
> >  
> > @@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >      if (resp) {
> >          if (callback) {
> >              trace_qemu_rdma_exchange_send_issue_callback();
> > -            ret = callback(rdma);
> > +            ret = callback(rdma, id);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2168,7 +2235,7 @@ retry:
> >  
> >                  compress_to_network(rdma, &comp);
> >                  ret = qemu_rdma_exchange_send(rdma, &head,
> > -                                (uint8_t *) &comp, NULL, NULL, NULL);
> > +                                (uint8_t *) &comp, NULL, NULL, NULL, 
> > + UNUSED_ID);
> >  
> >                  if (ret < 0) {
> >                      return -EIO;
> > @@ -2195,7 +2262,7 @@ retry:
> >  
> >              register_to_network(rdma, &reg);
> >              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                    &resp, &reg_result_idx, NULL);
> > +                                    &resp, &reg_result_idx, NULL, 
> > + UNUSED_ID);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >              head.len = len;
> >              head.type = RDMA_CONTROL_QEMU_FILE;
> >  
> > -            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> > +            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
> > +                                          NULL, NULL, UNUSED_ID);
> >  
> >              if (ret < 0) {
> >                  rdma->error_state = ret; @@ -3660,7 +3728,7 @@ static 
> > int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              }
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, 
> > + UNUSED_ID);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                      "registering ram blocks"); @@ 
> > -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >               * their "local" descriptions with what was sent.
> >               */
> >              for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * use the main RDMA channel to deliver the block of device
> > +                 * use the multiRDMA channels to deliver the RAMBlock
> > +                 */
> > +                if (migrate_use_multiRDMA() &&
> > +                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                        continue;
> > +                }
> > +
> >                  rdma->dest_blocks[i].remote_host_addr =
> >                      (uintptr_t)(local->block[i].local_host_addr);
> >  
> > @@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >           */
> >          ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >                      &reg_result_idx, rdma->pin_all ?
> > -                    qemu_rdma_reg_whole_ram_blocks : NULL);
> > +                    qemu_rdma_reg_whole_ram_blocks : NULL, 
> > + UNUSED_ID);
> >          if (ret < 0) {
> >              ERROR(errp, "receiving remote info!");
> >              return ret;
> > @@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >          memcpy(rdma->dest_blocks,
> >              rdma->wr_data[reg_result_idx].control_curr, resp.len);
> >          for (i = 0; i < nb_dest_blocks; i++) {
> > +            if (migrate_use_multiRDMA() &&
> > +                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                continue;
> > +            }
> > +
> >              network_to_dest_block(&rdma->dest_blocks[i]);
> >  
> >              /* We require that the blocks are in the same order */ @@ 
> > -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >      trace_qemu_rdma_registration_stop(flags);
> >  
> >      head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                  NULL, NULL, UNUSED_ID);
> >  
> >      if (migrate_use_multiRDMA()) {
> >          /*
> > @@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
> >                    sizeof(RDMALocalBlock), dest_ram_sort_func);
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                   "registering ram blocks"); @@ 
> > -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
> >  
> >      ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >              &reg_result_idx, rdma->pin_all ?
> > -            qemu_rdma_reg_whole_ram_blocks : NULL);
> > +            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
> >      if (ret < 0) {
> >          return NULL;
> >      }
> > @@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void *opaque)
> >  
> >          /* Send FINISHED to the destination */
> >          head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                      NULL, NULL, p->id);
> >          if (ret < 0) {
> >              return NULL;
> >          }
> > --
> > 2.19.1
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
  2020-01-20  9:05       ` Dr. David Alan Gilbert
@ 2020-01-21  1:30         ` fengzhimin
  0 siblings, 0 replies; 37+ messages in thread
From: fengzhimin @ 2020-01-21  1:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Zhanghailiang, quintela, qemu-devel, armbru, jemmy858585

The performance increase is due to the multiple RDMA channels instead of multiple threads, so we must register RAM blocks for the multiple RDMA channels.

Zhimin Feng

-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Monday, January 20, 2020 5:05 PM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA

* fengzhimin (fengzhimin1@huawei.com) wrote:
> OK, I will modify it.
> 
> Due to the mach-virt.ram is sent by the multiRDMA channels instead of the main channel, it don't to register on the main channel.

You might be OK if instead  of using the name, you use a size threshold; e.g. you use the multirdma threads for any RAM block larger than say 128MB.

> It takes a long time to register the mach-virt.ram for VM with large capacity memory, so we shall try our best not to register it.

I'm curious why, I know it's expensive to register RAM blocks with rdma code; but I thought that would just be the first time; it surprises me that registering it with a 2nd RDMA channel is as expensive.

But then that makes me ask a 2nd question; is your performance increase due to multiple threads or is it due to the multiple RDMA channels?
COuld you have multiple threads but still a single RDMA channel (and with sufficient locking) still get the performance?

Dave

> Thanks for your review.
> 
> Zhimin Feng
> 
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Saturday, January 18, 2020 2:52 AM
> To: fengzhimin <fengzhimin1@huawei.com>
> Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; 
> qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; 
> jemmy858585@gmail.com
> Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the 
> virt-ram block for MultiRDMA
> 
> * Zhimin Feng (fengzhimin1@huawei.com) wrote:
> > From: fengzhimin <fengzhimin1@huawei.com>
> > 
> > The virt-ram block is sent by MultiRDMA, so we only register it for 
> > MultiRDMA channels and main channel don't register the virt-ram block.
> > 
> > Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> 
> You can't specialise on the name of the RAMBlock like that.
> 'mach-virt.ram' is the name specific to just the main ram on just aarch's machine type;  for example the name on x86 is completely different and if you use NUMA or hotplug etc it would also be different on aarch.
> 
> Is there a downside to also registering the mach-virt.ram on the main channel?
> 
> Dave
> 
> > ---
> >  migration/rdma.c | 140
> > +++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 112 insertions(+), 28 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c index 
> > 0a150099e2..1477fd509b 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);  static int 
> > qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma));
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id);
> >  
> >  static inline uint64_t ram_chunk_index(const uint8_t *start,
> >                                         const uint8_t *host) @@
> > -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
> >      return 0;
> >  }
> >  
> > -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> > +/*
> > + * Parameters:
> > + *    @id == UNUSED_ID :
> > + *    This means that we register memory for the main RDMA channel,
> > + *    the main RDMA channel don't register the mach-virt.ram block
> > + *    when we use multiRDMA method to migrate.
> > + *
> > + *    @id == 0 or id == 1 or ... :
> > + *    This means that we register memory for the multiRDMA channels,
> > + *    the multiRDMA channels only register memory for the mach-virt.ram
> > + *    block when we use multiRDAM method to migrate.
> > + */
> > +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, 
> > +uint8_t
> > +id)
> >  {
> >      int i;
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >  
> > -    for (i = 0; i < local->nb_blocks; i++) {
> > -        local->block[i].mr =
> > -            ibv_reg_mr(rdma->pd,
> > -                    local->block[i].local_host_addr,
> > -                    local->block[i].length,
> > -                    IBV_ACCESS_LOCAL_WRITE |
> > -                    IBV_ACCESS_REMOTE_WRITE
> > -                    );
> > -        if (!local->block[i].mr) {
> > -            perror("Failed to register local dest ram block!\n");
> > -            break;
> > +    if (migrate_use_multiRDMA()) {
> > +        if (id == UNUSED_ID) {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /* main RDMA channel don't register the mach-virt.ram block */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    continue;
> > +                }
> > +
> > +                local->block[i].mr =
> > +                    ibv_reg_mr(rdma->pd,
> > +                            local->block[i].local_host_addr,
> > +                            local->block[i].length,
> > +                            IBV_ACCESS_LOCAL_WRITE |
> > +                            IBV_ACCESS_REMOTE_WRITE
> > +                            );
> > +                if (!local->block[i].mr) {
> > +                    perror("Failed to register local dest ram block!\n");
> > +                    break;
> > +                }
> > +                rdma->total_registrations++;
> > +            }
> > +        } else {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * The multiRDAM channels only register
> > +                 * the mach-virt.ram block
> > +                 */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    local->block[i].mr =
> > +                        ibv_reg_mr(rdma->pd,
> > +                                local->block[i].local_host_addr,
> > +                                local->block[i].length,
> > +                                IBV_ACCESS_LOCAL_WRITE |
> > +                                IBV_ACCESS_REMOTE_WRITE
> > +                                );
> > +                    if (!local->block[i].mr) {
> > +                        perror("Failed to register local dest ram block!\n");
> > +                        break;
> > +                    }
> > +                    rdma->total_registrations++;
> > +                }
> > +            }
> > +        }
> > +    } else {
> > +        for (i = 0; i < local->nb_blocks; i++) {
> > +            local->block[i].mr =
> > +                ibv_reg_mr(rdma->pd,
> > +                        local->block[i].local_host_addr,
> > +                        local->block[i].length,
> > +                        IBV_ACCESS_LOCAL_WRITE |
> > +                        IBV_ACCESS_REMOTE_WRITE
> > +                        );
> > +            if (!local->block[i].mr) {
> > +                perror("Failed to register local dest ram block!\n");
> > +                break;
> > +            }
> > +            rdma->total_registrations++;
> >          }
> > -        rdma->total_registrations++;
> >      }
> >  
> >      if (i >= local->nb_blocks) {
> > @@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> >      }
> >  
> >      for (i--; i >= 0; i--) {
> > -        ibv_dereg_mr(local->block[i].mr);
> > -        rdma->total_registrations--;
> > +        if (local->block[i].mr) {
> > +            ibv_dereg_mr(local->block[i].mr);
> > +            rdma->total_registrations--;
> > +        }
> >      }
> >  
> >      return -1;
> > @@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
> >          reg.key.chunk = chunk;
> >          register_to_network(rdma, &reg);
> >          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                &resp, NULL, NULL);
> > +                                      &resp, NULL, NULL, 
> > + UNUSED_ID);
> >          if (ret < 0) {
> >              return ret;
> >          }
> > @@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
> >   * The extra (optional) response is used during registration to us from having
> >   * to perform an *additional* exchange of message just to provide a response by
> >   * instead piggy-backing on the acknowledgement.
> > + *
> > + * Parameters:
> > + *    @id : callback function need two parameters, id is the second parameter.
> > + *
> >   */
> >  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma))
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id)
> >  {
> >      int ret = 0;
> >  
> > @@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >      if (resp) {
> >          if (callback) {
> >              trace_qemu_rdma_exchange_send_issue_callback();
> > -            ret = callback(rdma);
> > +            ret = callback(rdma, id);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2168,7 +2235,7 @@ retry:
> >  
> >                  compress_to_network(rdma, &comp);
> >                  ret = qemu_rdma_exchange_send(rdma, &head,
> > -                                (uint8_t *) &comp, NULL, NULL, NULL);
> > +                                (uint8_t *) &comp, NULL, NULL, 
> > + NULL, UNUSED_ID);
> >  
> >                  if (ret < 0) {
> >                      return -EIO;
> > @@ -2195,7 +2262,7 @@ retry:
> >  
> >              register_to_network(rdma, &reg);
> >              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                    &resp, &reg_result_idx, NULL);
> > +                                    &resp, &reg_result_idx, NULL, 
> > + UNUSED_ID);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >              head.len = len;
> >              head.type = RDMA_CONTROL_QEMU_FILE;
> >  
> > -            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> > +            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
> > +                                          NULL, NULL, UNUSED_ID);
> >  
> >              if (ret < 0) {
> >                  rdma->error_state = ret; @@ -3660,7 +3728,7 @@ 
> > static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              }
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma,
> > + UNUSED_ID);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                      "registering ram blocks"); @@
> > -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >               * their "local" descriptions with what was sent.
> >               */
> >              for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * use the main RDMA channel to deliver the block of device
> > +                 * use the multiRDMA channels to deliver the RAMBlock
> > +                 */
> > +                if (migrate_use_multiRDMA() &&
> > +                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                        continue;
> > +                }
> > +
> >                  rdma->dest_blocks[i].remote_host_addr =
> >                      (uintptr_t)(local->block[i].local_host_addr);
> >  
> > @@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >           */
> >          ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >                      &reg_result_idx, rdma->pin_all ?
> > -                    qemu_rdma_reg_whole_ram_blocks : NULL);
> > +                    qemu_rdma_reg_whole_ram_blocks : NULL, 
> > + UNUSED_ID);
> >          if (ret < 0) {
> >              ERROR(errp, "receiving remote info!");
> >              return ret;
> > @@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >          memcpy(rdma->dest_blocks,
> >              rdma->wr_data[reg_result_idx].control_curr, resp.len);
> >          for (i = 0; i < nb_dest_blocks; i++) {
> > +            if (migrate_use_multiRDMA() &&
> > +                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                continue;
> > +            }
> > +
> >              network_to_dest_block(&rdma->dest_blocks[i]);
> >  
> >              /* We require that the blocks are in the same order */ 
> > @@
> > -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >      trace_qemu_rdma_registration_stop(flags);
> >  
> >      head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                  NULL, NULL, UNUSED_ID);
> >  
> >      if (migrate_use_multiRDMA()) {
> >          /*
> > @@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
> >                    sizeof(RDMALocalBlock), dest_ram_sort_func);
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                   "registering ram blocks"); @@
> > -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
> >  
> >      ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >              &reg_result_idx, rdma->pin_all ?
> > -            qemu_rdma_reg_whole_ram_blocks : NULL);
> > +            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
> >      if (ret < 0) {
> >          return NULL;
> >      }
> > @@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void 
> > *opaque)
> >  
> >          /* Send FINISHED to the destination */
> >          head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                      NULL, NULL, p->id);
> >          if (ret < 0) {
> >              return NULL;
> >          }
> > --
> > 2.19.1
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, back to index

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-09  4:59 [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
2020-01-13 15:30   ` Markus Armbruster
2020-01-15  1:55     ` fengzhimin
2020-01-13 16:26   ` Eric Blake
2020-01-15  2:04     ` fengzhimin
2020-01-15 18:09   ` Dr. David Alan Gilbert
2020-01-16 13:18     ` Juan Quintela
2020-01-17  1:30       ` fengzhimin
2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
2020-01-15 18:57   ` Dr. David Alan Gilbert
2020-01-16 13:19   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
2020-01-13 15:34   ` Markus Armbruster
2020-01-15  1:57     ` fengzhimin
2020-01-16 13:20   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads Zhimin Feng
2020-01-16 13:25   ` Juan Quintela
2020-01-17  1:32     ` fengzhimin
2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
2020-01-15 19:54   ` Dr. David Alan Gilbert
2020-01-16 13:30   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 06/12] migration/rdma: Transmit initial package Zhimin Feng
2020-01-15 18:36   ` Dr. David Alan Gilbert
2020-01-09  4:59 ` [PATCH RFC 07/12] migration/rdma: Be sure all channels are created Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 08/12] migration/rdma: register memory for multiRDMA channels Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 09/12] migration/rdma: Wait for all multiRDMA to complete registration Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 10/12] migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 11/12] migration/rdma: use multiRDMA to send RAM block for NOT " Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA Zhimin Feng
2020-01-17 18:52   ` Dr. David Alan Gilbert
2020-01-19  1:44     ` fengzhimin
2020-01-20  9:05       ` Dr. David Alan Gilbert
2020-01-21  1:30         ` fengzhimin
2020-01-09 10:38 ` [PATCH RFC 00/12] *** mulitple RDMA channels for migration *** no-reply
2020-01-15 19:57 ` Dr. David Alan Gilbert
2020-01-16  1:37   ` fengzhimin

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git