qemu-devel.nongnu.org archive mirror
* [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
@ 2020-06-21  2:10 Derek Su
  2020-06-21  2:10 ` [PATCH v1 1/1] migration/colo.c: " Derek Su
  2020-07-31  7:52 ` [PATCH v1 0/1] COLO: " Lukas Straub
  0 siblings, 2 replies; 9+ messages in thread
From: Derek Su @ 2020-06-21  2:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhang.zhanghailiang, chyang, quintela, Derek Su, dgilbert,
	ctcheng, jwsu1986

This series reduces the guest's downtime during COLO checkpoints by
migrating as many dirty RAM pages as possible before the COLO checkpoint.

If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or the
pending RAM size drops below 'x-colo-migrate-ram-threshold', stop the
RAM migration and do the COLO checkpoint.
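
For reference, below is a minimal, standalone sketch of the termination
logic described above. It uses a stubbed pending-size function
(fake_pending_size) and the series' default 100 MiB threshold instead of
QEMU's real migration APIs, so it only illustrates the loop's exit
conditions:

```
/* Standalone sketch only -- stubbed values, not QEMU's real APIs. */
#include <inttypes.h>
#include <stdio.h>

#define COLO_RAM_MIGRATE_ITERATION_MAX 10

/* Stub: pretend each pass drains 150 MiB of an 800 MiB dirty working set. */
static uint64_t fake_pending_size(int iteration)
{
    uint64_t initial = 800ULL * 1024 * 1024;
    uint64_t drained = (uint64_t)iteration * 150ULL * 1024 * 1024;

    return drained >= initial ? 0 : initial - drained;
}

int main(void)
{
    uint64_t threshold = 100ULL * 1024 * 1024;   /* default threshold */
    int i;

    for (i = 0; i < COLO_RAM_MIGRATE_ITERATION_MAX; i++) {
        uint64_t pending = fake_pending_size(i);

        if (pending < threshold) {
            printf("iteration %d: pending %" PRIu64 " below threshold, "
                   "stop migrating and do the COLO checkpoint\n", i, pending);
            return 0;
        }
        printf("iteration %d: pending %" PRIu64 ", migrate one more round\n",
               i, pending);
    }
    printf("reached COLO_RAM_MIGRATE_ITERATION_MAX, do the COLO checkpoint\n");
    return 0;
}
```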

Test environment:
Both the primary VM and the secondary VM have 1 GiB of RAM and a 10GbE NIC
for FT traffic.
One fio buffered-write job runs on the guest.
The result shows that the total primary VM downtime is decreased by ~40%.

Please help review it; suggestions are welcome.
Thanks.

Derek Su (1):
  migration/colo.c: migrate dirty ram pages before colo checkpoint

 migration/colo.c       | 79 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c  | 20 +++++++++++
 migration/trace-events |  2 ++
 monitor/hmp-cmds.c     |  8 +++++
 qapi/migration.json    | 18 ++++++++--
 5 files changed, 125 insertions(+), 2 deletions(-)

-- 
2.17.1




* [PATCH v1 1/1] migration/colo.c: migrate dirty ram pages before colo checkpoint
  2020-06-21  2:10 [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint Derek Su
@ 2020-06-21  2:10 ` Derek Su
  2020-06-22 17:23   ` Eric Blake
  2020-07-13  9:03   ` Derek Su
  2020-07-31  7:52 ` [PATCH v1 0/1] COLO: " Lukas Straub
  1 sibling, 2 replies; 9+ messages in thread
From: Derek Su @ 2020-06-21  2:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhang.zhanghailiang, chyang, quintela, Derek Su, dgilbert,
	ctcheng, jwsu1986

To reduce the guest's downtime during checkpoints, migrate as many
dirty RAM pages as possible before the COLO checkpoint.

If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or the
pending RAM size drops below 'x-colo-migrate-ram-threshold', stop the
RAM migration and do the COLO checkpoint.

Signed-off-by: Derek Su <dereksu@qnap.com>
---
 migration/colo.c       | 79 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c  | 20 +++++++++++
 migration/trace-events |  2 ++
 monitor/hmp-cmds.c     |  8 +++++
 qapi/migration.json    | 18 ++++++++--
 5 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index ea7d1e9d4e..a0c71d7c56 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,13 @@ static Notifier packets_compare_notifier;
 static COLOMode last_colo_mode;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+#define COLO_RAM_MIGRATE_ITERATION_MAX 10
+
+typedef enum {
+    COLO_MIG_ITERATE_RESUME = 0,  /* Resume migration iteration */
+    COLO_MIG_ITERATE_BREAK  = 1,  /* Break migration loop */
+} COLOMigIterateState;
+
 
 bool migration_in_colo_state(void)
 {
@@ -398,6 +405,68 @@ static uint64_t colo_receive_message_value(QEMUFile *f, uint32_t expect_msg,
     return value;
 }
 
+static int colo_migrate_ram_iteration(MigrationState *s, Error **errp)
+{
+    int64_t threshold_size = s->parameters.x_colo_migrate_ram_threshold;
+    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    Error *local_err = NULL;
+    int ret;
+
+    if (threshold_size == 0) {
+        return COLO_MIG_ITERATE_BREAK;
+    }
+
+    qemu_savevm_state_pending(s->to_dst_file, threshold_size,
+                              &pend_pre, &pend_compat, &pend_post);
+    pending_size = pend_pre + pend_compat + pend_post;
+
+    trace_colo_migrate_pending(pending_size, threshold_size,
+                               pend_pre, pend_compat, pend_post);
+
+    /* Still a significant amount to transfer */
+    if (pending_size && pending_size >= threshold_size) {
+        colo_send_message(s->to_dst_file, COLO_MESSAGE_MIGRATE_RAM, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return COLO_MIG_ITERATE_BREAK;
+        }
+
+        qemu_savevm_state_iterate(s->to_dst_file, false);
+        qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
+
+        ret = qemu_file_get_error(s->to_dst_file);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to send dirty pages");
+            return COLO_MIG_ITERATE_BREAK;
+        }
+
+        return COLO_MIG_ITERATE_RESUME;
+    }
+
+    trace_colo_migration_low_pending(pending_size);
+
+    return COLO_MIG_ITERATE_BREAK;
+}
+
+static void colo_migrate_ram_before_checkpoint(MigrationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int iterate_count = 0;
+
+    while (iterate_count++ < COLO_RAM_MIGRATE_ITERATION_MAX) {
+        COLOMigIterateState state;
+
+        state = colo_migrate_ram_iteration(s, &local_err);
+        if (state == COLO_MIG_ITERATE_BREAK) {
+            if (local_err) {
+                error_propagate(errp, local_err);
+            }
+
+            return;
+        }
+    };
+}
+
 static int colo_do_checkpoint_transaction(MigrationState *s,
                                           QIOChannelBuffer *bioc,
                                           QEMUFile *fb)
@@ -405,6 +474,11 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     Error *local_err = NULL;
     int ret = -1;
 
+    colo_migrate_ram_before_checkpoint(s, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     colo_send_message(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
                       &local_err);
     if (local_err) {
@@ -819,6 +893,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
         colo_incoming_process_checkpoint(mis, fb, bioc, errp);
         break;
+    case COLO_MESSAGE_MIGRATE_RAM:
+        if (qemu_loadvm_state_main(mis->from_src_file, mis) < 0) {
+            error_setg(errp, "Load ram failed");
+        }
+        break;
     default:
         error_setg(errp, "Got unknown COLO message: %d", msg);
         break;
diff --git a/migration/migration.c b/migration/migration.c
index 481a590f72..390937ae5d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -88,6 +88,7 @@
 
 /* The delay time (in ms) between two COLO checkpoints */
 #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
+#define DEFAULT_COLO_MIGRATE_RAM_THRESHOLD (100 * 1024 * 1024UL)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
 #define DEFAULT_MIGRATE_MULTIFD_COMPRESSION MULTIFD_COMPRESSION_NONE
 /* 0: means nocompress, 1: best speed, ... 9: best compress ratio */
@@ -800,6 +801,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->downtime_limit = s->parameters.downtime_limit;
     params->has_x_checkpoint_delay = true;
     params->x_checkpoint_delay = s->parameters.x_checkpoint_delay;
+    params->has_x_colo_migrate_ram_threshold = true;
+    params->x_colo_migrate_ram_threshold =
+            s->parameters.x_colo_migrate_ram_threshold;
     params->has_block_incremental = true;
     params->block_incremental = s->parameters.block_incremental;
     params->has_multifd_channels = true;
@@ -1223,6 +1227,8 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
 
     /* x_checkpoint_delay is now always positive */
 
+    /* x_colo_migrate_ram_threshold is zero or positive */
+
     if (params->has_multifd_channels && (params->multifd_channels < 1)) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
                    "multifd_channels",
@@ -1356,6 +1362,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->x_checkpoint_delay = params->x_checkpoint_delay;
     }
 
+    if (params->has_x_colo_migrate_ram_threshold) {
+        dest->x_colo_migrate_ram_threshold =
+            params->x_colo_migrate_ram_threshold;
+    }
+
     if (params->has_block_incremental) {
         dest->block_incremental = params->block_incremental;
     }
@@ -1463,6 +1474,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         }
     }
 
+    if (params->has_x_colo_migrate_ram_threshold) {
+        s->parameters.x_colo_migrate_ram_threshold =
+                params->x_colo_migrate_ram_threshold;
+    }
+
     if (params->has_block_incremental) {
         s->parameters.block_incremental = params->block_incremental;
     }
@@ -3622,6 +3638,9 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT32("x-checkpoint-delay", MigrationState,
                       parameters.x_checkpoint_delay,
                       DEFAULT_MIGRATE_X_CHECKPOINT_DELAY),
+    DEFINE_PROP_UINT64("x-colo-migrate-ram-threshold", MigrationState,
+                      parameters.x_colo_migrate_ram_threshold,
+                      DEFAULT_COLO_MIGRATE_RAM_THRESHOLD),
     DEFINE_PROP_UINT8("multifd-channels", MigrationState,
                       parameters.multifd_channels,
                       DEFAULT_MIGRATE_MULTIFD_CHANNELS),
@@ -3724,6 +3743,7 @@ static void migration_instance_init(Object *obj)
     params->has_max_bandwidth = true;
     params->has_downtime_limit = true;
     params->has_x_checkpoint_delay = true;
+    params->has_x_colo_migrate_ram_threshold = true;
     params->has_block_incremental = true;
     params->has_multifd_channels = true;
     params->has_multifd_compression = true;
diff --git a/migration/trace-events b/migration/trace-events
index 4ab0a503d2..32bf42cdb3 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -295,6 +295,8 @@ migration_tls_incoming_handshake_complete(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_send_message(const char *msg) "Send '%s' message"
 colo_receive_message(const char *msg) "Receive '%s' message"
+colo_migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+colo_migration_low_pending(uint64_t pending) "%" PRIu64
 
 # colo-failover.c
 colo_failover_set_state(const char *new_state) "new state %s"
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index e03adf0d4d..ebca533960 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -450,6 +450,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %u\n",
             MigrationParameter_str(MIGRATION_PARAMETER_X_CHECKPOINT_DELAY),
             params->x_checkpoint_delay);
+        assert(params->has_x_colo_migrate_ram_threshold);
+        monitor_printf(mon, "%s: %" PRIu64 "\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_X_COLO_MIGRATE_RAM_THRESHOLD),
+            params->x_colo_migrate_ram_threshold);
         assert(params->has_block_incremental);
         monitor_printf(mon, "%s: %s\n",
             MigrationParameter_str(MIGRATION_PARAMETER_BLOCK_INCREMENTAL),
@@ -1330,6 +1334,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_x_checkpoint_delay = true;
         visit_type_int(v, param, &p->x_checkpoint_delay, &err);
         break;
+    case MIGRATION_PARAMETER_X_COLO_MIGRATE_RAM_THRESHOLD:
+        p->has_x_colo_migrate_ram_threshold = true;
+        visit_type_int(v, param, &p->x_colo_migrate_ram_threshold, &err);
+        break;
     case MIGRATION_PARAMETER_BLOCK_INCREMENTAL:
         p->has_block_incremental = true;
         visit_type_bool(v, param, &p->block_incremental, &err);
diff --git a/qapi/migration.json b/qapi/migration.json
index d5000558c6..30576c038c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -600,6 +600,9 @@
 # @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
 #                      periodic mode. (Since 2.8)
 #
+# @x-colo-migrate-ram-threshold: the threshold (in bytes) of the COLO ram migration's
+#                                pending size before COLO checkpoint. (Since 5.0)
+#
 # @block-incremental: Affects how much storage is migrated when the
 #                     block migration capability is enabled.  When false, the entire
 #                     storage backing chain is migrated into a flattened image at
@@ -651,7 +654,8 @@
            'cpu-throttle-initial', 'cpu-throttle-increment',
            'cpu-throttle-tailslow',
            'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-           'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
+           'downtime-limit', 'x-checkpoint-delay',
+           'x-colo-migrate-ram-threshold', 'block-incremental',
            'multifd-channels',
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
@@ -740,6 +744,9 @@
 #
 # @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.8)
 #
+# @x-colo-migrate-ram-threshold: the threshold in bytes of the COLO ram migration's
+#                                pending size before COLO checkpoint. (Since 5.0)
+#
 # @block-incremental: Affects how much storage is migrated when the
 #                     block migration capability is enabled.  When false, the entire
 #                     storage backing chain is migrated into a flattened image at
@@ -804,6 +811,7 @@
             '*max-bandwidth': 'int',
             '*downtime-limit': 'int',
             '*x-checkpoint-delay': 'int',
+            '*x-colo-migrate-ram-threshold': 'int',
             '*block-incremental': 'bool',
             '*multifd-channels': 'int',
             '*xbzrle-cache-size': 'size',
@@ -915,6 +923,9 @@
 #
 # @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.8)
 #
+# @x-colo-migrate-ram-threshold: the threshold in bytes of the COLO ram migration's
+#                                pending size before COLO checkpoint. (Since 5.0)
+#
 # @block-incremental: Affects how much storage is migrated when the
 #                     block migration capability is enabled.  When false, the entire
 #                     storage backing chain is migrated into a flattened image at
@@ -978,6 +989,7 @@
             '*max-bandwidth': 'size',
             '*downtime-limit': 'uint64',
             '*x-checkpoint-delay': 'uint32',
+            '*x-colo-migrate-ram-threshold': 'uint64',
             '*block-incremental': 'bool' ,
             '*multifd-channels': 'uint8',
             '*xbzrle-cache-size': 'size',
@@ -1116,12 +1128,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @migrate-ram: Send dirty pages as many as possible before COLO checkpoint.
+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size', 'vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'migrate-ram' ] }
 
 ##
 # @COLOMode:
-- 
2.17.1




* Re: [PATCH v1 1/1] migration/colo.c: migrate dirty ram pages before colo checkpoint
  2020-06-21  2:10 ` [PATCH v1 1/1] migration/colo.c: " Derek Su
@ 2020-06-22 17:23   ` Eric Blake
  2020-07-13  9:03   ` Derek Su
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Blake @ 2020-06-22 17:23 UTC (permalink / raw)
  To: Derek Su, qemu-devel
  Cc: zhang.zhanghailiang, chyang, quintela, dgilbert, ctcheng, jwsu1986

On 6/20/20 9:10 PM, Derek Su wrote:
> To reduce the guest's downtime during checkpoint, migrate dirty
> ram pages as many as possible before colo checkpoint.
> 
> If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> ram pending size is lower than 'x-colo-migrate-ram-threshold',
> stop the ram migration and colo checkpoint.
> 
> Signed-off-by: Derek Su <dereksu@qnap.com>
> ---
>   migration/colo.c       | 79 ++++++++++++++++++++++++++++++++++++++++++
>   migration/migration.c  | 20 +++++++++++
>   migration/trace-events |  2 ++
>   monitor/hmp-cmds.c     |  8 +++++
>   qapi/migration.json    | 18 ++++++++--
>   5 files changed, 125 insertions(+), 2 deletions(-)

Focusing just on UI,


> +++ b/qapi/migration.json
> @@ -600,6 +600,9 @@
>   # @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
>   #                      periodic mode. (Since 2.8)
>   #
> +# @x-colo-migrate-ram-threshold: the threshold (in bytes) of the COLO ram migration's

Long line, please wrap prior to 80 columns.

> +#                                pending size before COLO checkpoint. (Since 5.0)

s/5.0/5.1/

> +#
>   # @block-incremental: Affects how much storage is migrated when the
>   #                     block migration capability is enabled.  When false, the entire
>   #                     storage backing chain is migrated into a flattened image at
> @@ -651,7 +654,8 @@
>              'cpu-throttle-initial', 'cpu-throttle-increment',
>              'cpu-throttle-tailslow',
>              'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> -           'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
> +           'downtime-limit', 'x-checkpoint-delay',
> +           'x-colo-migrate-ram-threshold', 'block-incremental',
>              'multifd-channels',
>              'xbzrle-cache-size', 'max-postcopy-bandwidth',
>              'max-cpu-throttle', 'multifd-compression',
> @@ -740,6 +744,9 @@
>   #
>   # @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.8)
>   #
> +# @x-colo-migrate-ram-threshold: the threshold in bytes of the COLO ram migration's
> +#                                pending size before COLO checkpoint. (Since 5.0)
> +#

Ditto.


> @@ -1116,12 +1128,14 @@
>   #
>   # @vmstate-loaded: VM's state has been loaded by SVM.
>   #
> +# @migrate-ram: Send dirty pages as many as possible before COLO checkpoint.
> +#

Missing a notation that migrate-ram is since 5.1.

>   # Since: 2.8
>   ##
>   { 'enum': 'COLOMessage',
>     'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>               'vmstate-send', 'vmstate-size', 'vmstate-received',
> -            'vmstate-loaded' ] }
> +            'vmstate-loaded', 'migrate-ram' ] }
>   
>   ##
>   # @COLOMode:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v1 1/1] migration/colo.c: migrate dirty ram pages before colo checkpoint
  2020-06-21  2:10 ` [PATCH v1 1/1] migration/colo.c: " Derek Su
  2020-06-22 17:23   ` Eric Blake
@ 2020-07-13  9:03   ` Derek Su
  1 sibling, 0 replies; 9+ messages in thread
From: Derek Su @ 2020-07-13  9:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhang.zhanghailiang, dgilbert, quintela

Hello,

Ping...

Does anyone have comments about this patch?
To reduce the downtime during checkpoints, the patch tries to migrate
as many memory pages as possible just before entering the COLO checkpoint state.

Thanks.

Regards,
Derek

On 2020/6/21 10:10 AM, Derek Su wrote:
> To reduce the guest's downtime during checkpoint, migrate dirty
> ram pages as many as possible before colo checkpoint.
> 
> If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> ram pending size is lower than 'x-colo-migrate-ram-threshold',
> stop the ram migration and colo checkpoint.
> 
> Signed-off-by: Derek Su <dereksu@qnap.com>
> ---
>   migration/colo.c       | 79 ++++++++++++++++++++++++++++++++++++++++++
>   migration/migration.c  | 20 +++++++++++
>   migration/trace-events |  2 ++
>   monitor/hmp-cmds.c     |  8 +++++
>   qapi/migration.json    | 18 ++++++++--
>   5 files changed, 125 insertions(+), 2 deletions(-)
> 




* Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
  2020-06-21  2:10 [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint Derek Su
  2020-06-21  2:10 ` [PATCH v1 1/1] migration/colo.c: " Derek Su
@ 2020-07-31  7:52 ` Lukas Straub
  2020-07-31  9:00   ` Zhanghailiang
  2020-08-13 10:27   ` Derek Su
  1 sibling, 2 replies; 9+ messages in thread
From: Lukas Straub @ 2020-07-31  7:52 UTC (permalink / raw)
  To: Derek Su
  Cc: zhang.zhanghailiang, chyang, quintela, qemu-devel, dgilbert,
	ctcheng, jwsu1986


On Sun, 21 Jun 2020 10:10:03 +0800
Derek Su <dereksu@qnap.com> wrote:

> This series is to reduce the guest's downtime during colo checkpoint
> by migrating dirty ram pages as many as possible before colo checkpoint.
> 
> If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> ram pending size is lower than 'x-colo-migrate-ram-threshold',
> stop the ram migration and do colo checkpoint.
> 
> Test environment:
> The both primary VM and secondary VM has 1GiB ram and 10GbE NIC
> for FT traffic.
> One fio buffer write job runs on the guest.                                                                                                                     
> The result shows the total primary VM downtime is decreased by ~40%.
> 
> Please help to review it and suggestions are welcomed.
> Thanks.

Hello Derek,
Sorry for the late reply.
I think this is not a good idea, because it unnecessarily introduces a delay between the checkpoint request and the checkpoint itself and thus impairs network-bound workloads due to increased network latency. Workloads that are independent of the network don't cause many checkpoints anyway, so it doesn't help there either.

Hailiang did have a patch to migrate RAM between checkpoints, which should help all workloads, but it wasn't merged back then. I think you can pick it up again, rebase it, and address David's and Eric's comments:
https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhanghailiang@huawei.com/T/#u

Hailiang, are you OK with that?

Regards,
Lukas Straub



* RE: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
  2020-07-31  7:52 ` [PATCH v1 0/1] COLO: " Lukas Straub
@ 2020-07-31  9:00   ` Zhanghailiang
  2020-08-13 10:27   ` Derek Su
  1 sibling, 0 replies; 9+ messages in thread
From: Zhanghailiang @ 2020-07-31  9:00 UTC (permalink / raw)
  To: Lukas Straub, Derek Su
  Cc: chyang, quintela, qemu-devel, dgilbert, ctcheng, jwsu1986

Hi Lukas Straub & Derek,

Sorry for the late reply, too busy these days ;)

> -----Original Message-----
> From: Lukas Straub [mailto:lukasstraub2@web.de]
> Sent: Friday, July 31, 2020 3:52 PM
> To: Derek Su <dereksu@qnap.com>
> Cc: qemu-devel@nongnu.org; Zhanghailiang
> <zhang.zhanghailiang@huawei.com>; chyang@qnap.com;
> quintela@redhat.com; dgilbert@redhat.com; ctcheng@qnap.com;
> jwsu1986@gmail.com
> Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> checkpoint
> 
> On Sun, 21 Jun 2020 10:10:03 +0800
> Derek Su <dereksu@qnap.com> wrote:
> 
> > This series is to reduce the guest's downtime during colo checkpoint
> > by migrating dirty ram pages as many as possible before colo checkpoint.
> >
> > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> ram
> > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > ram migration and do colo checkpoint.
> >
> > Test environment:
> > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for
> FT
> > traffic.
> > One fio buffer write job runs on the guest.
> > The result shows the total primary VM downtime is decreased by ~40%.
> >
> > Please help to review it and suggestions are welcomed.
> > Thanks.
> 
> Hello Derek,
> Sorry for the late reply.
> I think this is not a good idea, because it unnecessarily introduces a delay
> between checkpoint request and the checkpoint itself and thus impairs
> network bound workloads due to increased network latency. Workloads that
> are independent from network don't cause many checkpoints anyway, so it
> doesn't help there either.
> 

Agreed. Though it seems to reduce the VM's downtime during the checkpoint,
it doesn't help to reduce network latency, because the network packets that
differ between the SVM and PVM are what caused this checkpoint request, and
they will be blocked until the checkpoint process finishes.


> Hailang did have a patch to migrate ram between checkpoints, which should
> help all workloads, but it wasn't merged back then. I think you can pick it up
> again, rebase and address David's and Eric's comments:
> https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhang
> hailiang@huawei.com/T/#u
>  

The second one was not merged; it can help reduce the downtime.

> Hailang, are you ok with that?
> 

Yes. @Derek, please feel free to pick it up if you would like to ;)


Thanks,
Hailiang

> Regards,
> Lukas Straub



* Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
  2020-07-31  7:52 ` [PATCH v1 0/1] COLO: " Lukas Straub
  2020-07-31  9:00   ` Zhanghailiang
@ 2020-08-13 10:27   ` Derek Su
  2020-08-15  1:41     ` Zhanghailiang
  1 sibling, 1 reply; 9+ messages in thread
From: Derek Su @ 2020-08-13 10:27 UTC (permalink / raw)
  To: Lukas Straub
  Cc: zhang.zhanghailiang, chyang, quintela, Derek Su, dgilbert,
	qemu-devel, ctcheng

On Fri, Jul 31, 2020 at 3:52 PM Lukas Straub <lukasstraub2@web.de> wrote:
>
> On Sun, 21 Jun 2020 10:10:03 +0800
> Derek Su <dereksu@qnap.com> wrote:
>
> > This series is to reduce the guest's downtime during colo checkpoint
> > by migrating dirty ram pages as many as possible before colo checkpoint.
> >
> > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> > ram pending size is lower than 'x-colo-migrate-ram-threshold',
> > stop the ram migration and do colo checkpoint.
> >
> > Test environment:
> > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC
> > for FT traffic.
> > One fio buffer write job runs on the guest.
> > The result shows the total primary VM downtime is decreased by ~40%.
> >
> > Please help to review it and suggestions are welcomed.
> > Thanks.
>
> Hello Derek,
> Sorry for the late reply.
> I think this is not a good idea, because it unnecessarily introduces a delay between checkpoint request and the checkpoint itself and thus impairs network bound workloads due to increased network latency. Workloads that are independent from network don't cause many checkpoints anyway, so it doesn't help there either.
>

Hello, Lukas & Zhanghailiang

Thanks for your opinions.
I went through my patch, and I feel a little confused and would like
to dig into it more.

In this patch, colo_migrate_ram_before_checkpoint() runs before
COLO_MESSAGE_CHECKPOINT_REQUEST,
so the SVM and PVM should not enter the paused state.

In the meantime, the packets to the PVM/SVM can still be compared, and an
inconsistency is still notified if they mismatch, right?
Is it really possible for this to introduce extra network latency?

In my test (randwrite to disk by fio with direct=0),
the ping times from another client to the PVM using generic COLO and COLO
with this patch are shown below.
The network latency does not increase, in line with my expectation.

generic COLO
```
64 bytes from 192.168.80.18: icmp_seq=87 ttl=64 time=28.109 ms
64 bytes from 192.168.80.18: icmp_seq=88 ttl=64 time=16.747 ms
64 bytes from 192.168.80.18: icmp_seq=89 ttl=64 time=2388.779 ms
<----checkpoint start
64 bytes from 192.168.80.18: icmp_seq=90 ttl=64 time=1385.792 ms
64 bytes from 192.168.80.18: icmp_seq=91 ttl=64 time=384.896 ms
<----checkpoint end
64 bytes from 192.168.80.18: icmp_seq=92 ttl=64 time=3.895 ms
64 bytes from 192.168.80.18: icmp_seq=93 ttl=64 time=1.020 ms
64 bytes from 192.168.80.18: icmp_seq=94 ttl=64 time=0.865 ms
64 bytes from 192.168.80.18: icmp_seq=95 ttl=64 time=0.854 ms
64 bytes from 192.168.80.18: icmp_seq=96 ttl=64 time=28.359 ms
64 bytes from 192.168.80.18: icmp_seq=97 ttl=64 time=12.309 ms
64 bytes from 192.168.80.18: icmp_seq=98 ttl=64 time=0.870 ms
64 bytes from 192.168.80.18: icmp_seq=99 ttl=64 time=2371.733 ms
64 bytes from 192.168.80.18: icmp_seq=100 ttl=64 time=1371.440 ms
64 bytes from 192.168.80.18: icmp_seq=101 ttl=64 time=366.414 ms
64 bytes from 192.168.80.18: icmp_seq=102 ttl=64 time=0.818 ms
64 bytes from 192.168.80.18: icmp_seq=103 ttl=64 time=0.997 ms
```

COLO with this patch
```
64 bytes from 192.168.80.18: icmp_seq=72 ttl=64 time=1.417 ms
64 bytes from 192.168.80.18: icmp_seq=73 ttl=64 time=0.931 ms
64 bytes from 192.168.80.18: icmp_seq=74 ttl=64 time=0.876 ms
64 bytes from 192.168.80.18: icmp_seq=75 ttl=64 time=1184.034 ms
<----checkpoint start
64 bytes from 192.168.80.18: icmp_seq=76 ttl=64 time=181.297 ms
<----checkpoint end
64 bytes from 192.168.80.18: icmp_seq=77 ttl=64 time=0.865 ms
64 bytes from 192.168.80.18: icmp_seq=78 ttl=64 time=0.858 ms
64 bytes from 192.168.80.18: icmp_seq=79 ttl=64 time=1.247 ms
64 bytes from 192.168.80.18: icmp_seq=80 ttl=64 time=0.946 ms
64 bytes from 192.168.80.18: icmp_seq=81 ttl=64 time=0.855 ms
64 bytes from 192.168.80.18: icmp_seq=82 ttl=64 time=0.868 ms
64 bytes from 192.168.80.18: icmp_seq=83 ttl=64 time=0.749 ms
64 bytes from 192.168.80.18: icmp_seq=84 ttl=64 time=2.154 ms
64 bytes from 192.168.80.18: icmp_seq=85 ttl=64 time=1499.186 ms
64 bytes from 192.168.80.18: icmp_seq=86 ttl=64 time=496.173 ms
64 bytes from 192.168.80.18: icmp_seq=87 ttl=64 time=0.854 ms
64 bytes from 192.168.80.18: icmp_seq=88 ttl=64 time=0.774 ms
```

Thank you.

Regards,
Derek

> Hailang did have a patch to migrate ram between checkpoints, which should help all workloads, but it wasn't merged back then. I think you can pick it up again, rebase and address David's and Eric's comments:
> https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhanghailiang@huawei.com/T/#u
>
> Hailang, are you ok with that?
>
> Regards,
> Lukas Straub



* RE: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
  2020-08-13 10:27   ` Derek Su
@ 2020-08-15  1:41     ` Zhanghailiang
  2020-08-15  2:45       ` Derek Su
  0 siblings, 1 reply; 9+ messages in thread
From: Zhanghailiang @ 2020-08-15  1:41 UTC (permalink / raw)
  To: Derek Su, Lukas Straub
  Cc: qemu-devel, chyang, quintela, Derek Su, dgilbert, ctcheng

> -----Original Message-----
> From: Derek Su [mailto:jwsu1986@gmail.com]
> Sent: Thursday, August 13, 2020 6:28 PM
> To: Lukas Straub <lukasstraub2@web.de>
> Cc: Derek Su <dereksu@qnap.com>; qemu-devel@nongnu.org; Zhanghailiang
> <zhang.zhanghailiang@huawei.com>; chyang@qnap.com; quintela@redhat.com;
> dgilbert@redhat.com; ctcheng@qnap.com
> Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> checkpoint
> 
> On Fri, Jul 31, 2020 at 3:52 PM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > On Sun, 21 Jun 2020 10:10:03 +0800
> > Derek Su <dereksu@qnap.com> wrote:
> >
> > > This series is to reduce the guest's downtime during colo checkpoint
> > > by migrating dirty ram pages as many as possible before colo checkpoint.
> > >
> > > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or ram
> > > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > > ram migration and do colo checkpoint.
> > >
> > > Test environment:
> > > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for
> > > FT traffic.
> > > One fio buffer write job runs on the guest.
> > > The result shows the total primary VM downtime is decreased by ~40%.
> > >
> > > Please help to review it and suggestions are welcomed.
> > > Thanks.
> >
> > Hello Derek,
> > Sorry for the late reply.
> > I think this is not a good idea, because it unnecessarily introduces a delay
> between checkpoint request and the checkpoint itself and thus impairs network
> bound workloads due to increased network latency. Workloads that are
> independent from network don't cause many checkpoints anyway, so it doesn't
> help there either.
> >
> 

Hi Derek,

Actually, there is a quite interesting question we should think about:
what will happen if the VMs continue to run after a mismatched state is detected
between the PVM and SVM? According to the rules of COLO, we should stop the VMs
immediately to sync the state between the PVM and SVM. But here, you choose to let
them continue to run for a while; then more of the client's network packets may
come in and may dirty more memory pages. Another side effect is that the new
network packets will most likely not be sent out, because their replies should
differ while the state of the PVM and SVM is different.

So, IMHO, it makes no sense to let the VMs continue to run after detecting that
they are in different states.
Besides, I don't think it is easy to construct this case in tests.


Thanks,
Hailiang



* Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
  2020-08-15  1:41     ` Zhanghailiang
@ 2020-08-15  2:45       ` Derek Su
  0 siblings, 0 replies; 9+ messages in thread
From: Derek Su @ 2020-08-15  2:45 UTC (permalink / raw)
  To: Zhanghailiang
  Cc: Lukas Straub, chyang, quintela, Derek Su, dgilbert, qemu-devel, ctcheng

On Sat, Aug 15, 2020 at 9:42 AM Zhanghailiang
<zhang.zhanghailiang@huawei.com> wrote:
>
> > -----Original Message-----
> > From: Derek Su [mailto:jwsu1986@gmail.com]
> > Sent: Thursday, August 13, 2020 6:28 PM
> > To: Lukas Straub <lukasstraub2@web.de>
> > Cc: Derek Su <dereksu@qnap.com>; qemu-devel@nongnu.org; Zhanghailiang
> > <zhang.zhanghailiang@huawei.com>; chyang@qnap.com; quintela@redhat.com;
> > dgilbert@redhat.com; ctcheng@qnap.com
> > Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> > checkpoint
> >
> > On Fri, Jul 31, 2020 at 3:52 PM Lukas Straub <lukasstraub2@web.de> wrote:
> > >
> > > On Sun, 21 Jun 2020 10:10:03 +0800
> > > Derek Su <dereksu@qnap.com> wrote:
> > >
> > > > This series is to reduce the guest's downtime during colo checkpoint
> > > > by migrating dirty ram pages as many as possible before colo checkpoint.
> > > >
> > > > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or ram
> > > > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > > > ram migration and do colo checkpoint.
> > > >
> > > > Test environment:
> > > > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for
> > > > FT traffic.
> > > > One fio buffer write job runs on the guest.
> > > > The result shows the total primary VM downtime is decreased by ~40%.
> > > >
> > > > Please help to review it and suggestions are welcomed.
> > > > Thanks.
> > >
> > > Hello Derek,
> > > Sorry for the late reply.
> > > I think this is not a good idea, because it unnecessarily introduces a delay
> > between checkpoint request and the checkpoint itself and thus impairs network
> > bound workloads due to increased network latency. Workloads that are
> > independent from network don't cause many checkpoints anyway, so it doesn't
> > help there either.
> > >
> >
>
> Hi Derek,
>
> Actually, There is a quit interesting question we should think:
> What will happen if VM continues to run after detected a mismatched state between PVM and SVM,
> According to the rules of COLO, we should stop VMs immediately to sync the state between PVM and SVM,
> But here, you choose them to continue to run for a while, then there may be more client's network packages
> Coming, and may cause more memory pages dirty, another side effect is the new network packages will not
> Be sent out with high probability, because their replies should be different since the state between PVM and SVM is different.
>
> So, IMHO, it makes non-sense to let VMs to continue to run after detected them in different state.
> Besides, I don't think it is easy to construct this case in tests.
>
>
> Thanks,
> Hailiang
>

Hello, Hailiang

Thanks. I got your point.
In my tests, the mismatch between packets does not happen, so the
network latency does not increase.

By the way, I've tried your commit addressing this issue.
It is useful for workloads with little dirty memory and a low dirty rate.

But under a heavy "buffered I/O read/write" workload, it causes the PVM
to resend a massive number of the same dirty RAM pages in every cycle
triggered by the DEFAULT_RAM_PENDING_CHECK timer (default 1 second), which
hurts the I/O performance without improving the downtime.
Do you have any thoughts about this?

Is it possible to separate the checkpoints invoked by the periodic
timer from those invoked by a packet mismatch, and to use a different
strategy to cope with the long-downtime issue?
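
To make the question concrete, here is a rough, hypothetical sketch of what
such a separation could look like. The COLOCheckpointReason enum, the
function, and the printf placeholders are illustrative only and are not part
of any posted patch:

```
/* Hypothetical sketch only -- not from any posted patch. */
#include <stdio.h>

typedef enum {
    COLO_CHECKPOINT_PERIODIC,   /* x-checkpoint-delay timer expired    */
    COLO_CHECKPOINT_MISMATCH,   /* colo-compare saw divergent packets  */
} COLOCheckpointReason;

static void colo_checkpoint(COLOCheckpointReason reason)
{
    if (reason == COLO_CHECKPOINT_PERIODIC) {
        /* Nothing is being held back yet, so draining dirty RAM first
         * only shortens the VM pause. */
        printf("pre-migrate dirty RAM, then checkpoint\n");
    } else {
        /* Divergent packets stay blocked until PVM and SVM are back in
         * sync, so checkpoint immediately. */
        printf("checkpoint immediately\n");
    }
}

int main(void)
{
    colo_checkpoint(COLO_CHECKPOINT_PERIODIC);
    colo_checkpoint(COLO_CHECKPOINT_MISMATCH);
    return 0;
}
```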

Thanks.

Regards,
Derek



