qemu-devel.nongnu.org archive mirror
* [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO
@ 2020-02-17  1:20 Hailiang Zhang
  2020-02-17  1:20 ` [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper Hailiang Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Hailiang Zhang @ 2020-02-17  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: danielcho, chen.zhang, Hailiang Zhang, dgilbert, quintela

Hi,

This is an untested series that tries to reduce the VM's pause time
while doing a checkpoint in COLO state.

The second patch tries to reduce the total number of dirty pages that
must be sent while the VM is paused for a checkpoint: instead of
sending all dirty pages with the VM paused, it sends part of the dirty
pages during the gap between two checkpoints, while the SVM and PVM
are still running.
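
For clarity, a condensed sketch of the control flow that patch 2 adds
to the colo_process_checkpoint() loop (error handling and the failover
check are omitted; see the patch for the real code):

    /* Sketch only: each wakeup of colo_checkpoint_sem is either a real
     * checkpoint request or a periodic pending-RAM check. */
    while (s->state == MIGRATION_STATUS_COLO) {
        qemu_sem_wait(&s->colo_checkpoint_sem);

        if (atomic_xchg(&checkpoint_request, 0)) {
            /* Checkpoint: the VM is paused, remaining dirty RAM is sent. */
            colo_do_checkpoint_transaction(s, bioc, fb);
        } else if (colo_need_migrate_ram_background(s)) {
            /* Between checkpoints: PVM and SVM keep running while some
             * dirty pages are sent in the background. */
            colo_send_message(s->to_dst_file,
                              COLO_MESSAGE_MIGRATE_RAM_BACKGROUND, &local_err);
            qemu_savevm_state_iterate(s->to_dst_file, false);
            qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
        }
    }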

The third patch tries to reduce the pause time spent backing up RAM
into the cache on the secondary side.
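
The core of the third patch is moving that copy out of the stop-VM
window: instead of one big memcpy() of every RAMBlock in
colo_init_ram_cache() with the VM stopped, each incoming page is
mirrored into the COLO cache right after it is loaded during the live
migration (sketch only, see patch 3 for the real code):

    /* old: done once in colo_init_ram_cache(), with the VM stopped */
    memcpy(block->colo_cache, block->host, block->used_length);

    /* new: done per page in ram_load_precopy(), while migration is live */
    if (!ret && host_bak && host) {
        memcpy(host_bak, host, TARGET_PAGE_SIZE);
    }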


Hailiang Zhang (3):
  migration/colo: wrap incoming checkpoint process into new helper
  COLO: Migrate dirty pages during the gap of checkpointing
  COLO: Optimize memory back-up process

 migration/colo.c       | 332 +++++++++++++++++++++++++----------------
 migration/migration.h  |   1 +
 migration/ram.c        |  35 ++++-
 migration/ram.h        |   1 +
 migration/trace-events |   1 +
 qapi/migration.json    |   4 +-
 6 files changed, 234 insertions(+), 140 deletions(-)

-- 
2.21.0




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper
  2020-02-17  1:20 [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Hailiang Zhang
@ 2020-02-17  1:20 ` Hailiang Zhang
  2020-02-19 18:24   ` Dr. David Alan Gilbert
  2020-02-17  1:20 ` [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing Hailiang Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Hailiang Zhang @ 2020-02-17  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: danielcho, chen.zhang, Hailiang Zhang, dgilbert, quintela

Split checkpoint incoming process into a helper.

Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 260 ++++++++++++++++++++++++-----------------------
 1 file changed, 133 insertions(+), 127 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2c88aa57a2..93c5a452fb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
-static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
-                                     Error **errp)
+static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
+                      QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
+{
+    uint64_t total_size;
+    uint64_t value;
+    Error *local_err = NULL;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    trace_colo_vm_state_change("run", "stop");
+    qemu_mutex_unlock_iothread();
+
+    /* FIXME: This is unnecessary for periodic checkpoint mode */
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+                 &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    colo_receive_check_message(mis->from_src_file,
+                       COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    cpu_synchronize_all_pre_loadvm();
+    ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+    qemu_mutex_unlock_iothread();
+
+    if (ret < 0) {
+        error_setg(errp, "Load VM's live state (ram) error");
+        return;
+    }
+
+    value = colo_receive_message_value(mis->from_src_file,
+                             COLO_MESSAGE_VMSTATE_SIZE, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * Read VM device state data into channel buffer,
+     * It's better to re-use the memory allocated.
+     * Here we need to handle the channel buffer directly.
+     */
+    if (value > bioc->capacity) {
+        bioc->capacity = value;
+        bioc->data = g_realloc(bioc->data, bioc->capacity);
+    }
+    total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
+    if (total_size != value) {
+        error_setg(errp, "Got %" PRIu64 " VMState data, less than expected"
+                    " %" PRIu64, total_size, value);
+        return;
+    }
+    bioc->usage = total_size;
+    qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+                 &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_mutex_lock_iothread();
+    vmstate_loading = true;
+    ret = qemu_load_device_state(fb);
+    if (ret < 0) {
+        error_setg(errp, "COLO: load device state failed");
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+#ifdef CONFIG_REPLICATION
+    replication_get_error_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    /* discard colo disk buffer */
+    replication_do_checkpoint_all(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+#else
+    abort();
+#endif
+    /* Notify all filters of all NIC to do checkpoint */
+    colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_mutex_unlock_iothread();
+        return;
+    }
+
+    vmstate_loading = false;
+    vm_start();
+    trace_colo_vm_state_change("stop", "run");
+    qemu_mutex_unlock_iothread();
+
+    if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+        failover_set_state(FAILOVER_STATUS_RELAUNCH,
+                        FAILOVER_STATUS_NONE);
+        failover_request_active(NULL);
+        return;
+    }
+
+    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+                 &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+static void colo_wait_handle_message(MigrationIncomingState *mis,
+                QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
 {
     COLOMessage msg;
     Error *local_err = NULL;
 
-    msg = colo_receive_message(f, &local_err);
+    msg = colo_receive_message(mis->from_src_file, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
 
     switch (msg) {
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
-        *checkpoint_request = 1;
+        colo_incoming_process_checkpoint(mis, fb, bioc, errp);
         break;
     default:
-        *checkpoint_request = 0;
         error_setg(errp, "Got unknown COLO message: %d", msg);
         break;
     }
@@ -692,10 +816,7 @@ void *colo_process_incoming_thread(void *opaque)
     MigrationIncomingState *mis = opaque;
     QEMUFile *fb = NULL;
     QIOChannelBuffer *bioc = NULL; /* Cache incoming device state */
-    uint64_t total_size;
-    uint64_t value;
     Error *local_err = NULL;
-    int ret;
 
     rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
@@ -749,134 +870,19 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
     while (mis->state == MIGRATION_STATUS_COLO) {
-        int request = 0;
-
-        colo_wait_handle_message(mis->from_src_file, &request, &local_err);
+        colo_wait_handle_message(mis, fb, bioc, &local_err);
         if (local_err) {
-            goto out;
+            error_report_err(local_err);
+            break;
         }
-        assert(request);
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
             error_report("failover request");
-            goto out;
-        }
-
-        qemu_mutex_lock_iothread();
-        vm_stop_force_state(RUN_STATE_COLO);
-        trace_colo_vm_state_change("run", "stop");
-        qemu_mutex_unlock_iothread();
-
-        /* FIXME: This is unnecessary for periodic checkpoint mode */
-        colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
-                     &local_err);
-        if (local_err) {
-            goto out;
-        }
-
-        colo_receive_check_message(mis->from_src_file,
-                           COLO_MESSAGE_VMSTATE_SEND, &local_err);
-        if (local_err) {
-            goto out;
-        }
-
-        qemu_mutex_lock_iothread();
-        cpu_synchronize_all_pre_loadvm();
-        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
-        qemu_mutex_unlock_iothread();
-
-        if (ret < 0) {
-            error_report("Load VM's live state (ram) error");
-            goto out;
-        }
-
-        value = colo_receive_message_value(mis->from_src_file,
-                                 COLO_MESSAGE_VMSTATE_SIZE, &local_err);
-        if (local_err) {
-            goto out;
-        }
-
-        /*
-         * Read VM device state data into channel buffer,
-         * It's better to re-use the memory allocated.
-         * Here we need to handle the channel buffer directly.
-         */
-        if (value > bioc->capacity) {
-            bioc->capacity = value;
-            bioc->data = g_realloc(bioc->data, bioc->capacity);
-        }
-        total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
-        if (total_size != value) {
-            error_report("Got %" PRIu64 " VMState data, less than expected"
-                        " %" PRIu64, total_size, value);
-            goto out;
-        }
-        bioc->usage = total_size;
-        qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
-
-        colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
-                     &local_err);
-        if (local_err) {
-            goto out;
-        }
-
-        qemu_mutex_lock_iothread();
-        vmstate_loading = true;
-        ret = qemu_load_device_state(fb);
-        if (ret < 0) {
-            error_report("COLO: load device state failed");
-            qemu_mutex_unlock_iothread();
-            goto out;
-        }
-
-#ifdef CONFIG_REPLICATION
-        replication_get_error_all(&local_err);
-        if (local_err) {
-            qemu_mutex_unlock_iothread();
-            goto out;
-        }
-
-        /* discard colo disk buffer */
-        replication_do_checkpoint_all(&local_err);
-        if (local_err) {
-            qemu_mutex_unlock_iothread();
-            goto out;
-        }
-#else
-        abort();
-#endif
-        /* Notify all filters of all NIC to do checkpoint */
-        colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
-
-        if (local_err) {
-            qemu_mutex_unlock_iothread();
-            goto out;
-        }
-
-        vmstate_loading = false;
-        vm_start();
-        trace_colo_vm_state_change("stop", "run");
-        qemu_mutex_unlock_iothread();
-
-        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
-            failover_set_state(FAILOVER_STATUS_RELAUNCH,
-                            FAILOVER_STATUS_NONE);
-            failover_request_active(NULL);
-            goto out;
-        }
-
-        colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
-                     &local_err);
-        if (local_err) {
-            goto out;
+            break;
         }
     }
 
 out:
     vmstate_loading = false;
-    /* Throw the unreported error message after exited from loop */
-    if (local_err) {
-        error_report_err(local_err);
-    }
 
     /*
      * There are only two reasons we can get here, some error happened
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
  2020-02-17  1:20 [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Hailiang Zhang
  2020-02-17  1:20 ` [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper Hailiang Zhang
@ 2020-02-17  1:20 ` Hailiang Zhang
  2020-02-17 11:45   ` Eric Blake
  2020-02-19 18:51   ` Dr. David Alan Gilbert
  2020-02-17  1:20 ` [PATCH 3/3] COLO: Optimize memory back-up process Hailiang Zhang
  2020-02-20 18:27 ` [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Dr. David Alan Gilbert
  3 siblings, 2 replies; 11+ messages in thread
From: Hailiang Zhang @ 2020-02-17  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: danielcho, chen.zhang, Hailiang Zhang, dgilbert, quintela

We can migrate some dirty pages during the gap between checkpoints;
this way, we can reduce the amount of RAM migrated during checkpointing.

Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c       | 69 +++++++++++++++++++++++++++++++++++++++---
 migration/migration.h  |  1 +
 migration/trace-events |  1 +
 qapi/migration.json    |  4 ++-
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 93c5a452fb..d30c6bc4ad 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,13 @@ static COLOMode last_colo_mode;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
+#define DEFAULT_RAM_PENDING_CHECK 1000
+
+/* should be calculated by bandwidth and max downtime ? */
+#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
+
+static int checkpoint_request;
+
 bool migration_in_colo_state(void)
 {
     MigrationState *s = migrate_get_current();
@@ -516,6 +523,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
     colo_checkpoint_notify(data);
 }
 
+static bool colo_need_migrate_ram_background(MigrationState *s)
+{
+    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    int64_t max_size = THRESHOLD_PENDING_SIZE;
+
+    qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
+                              &pend_compat, &pend_post);
+    pending_size = pend_pre + pend_compat + pend_post;
+
+    trace_colo_need_migrate_ram_background(pending_size);
+    return (pending_size >= max_size);
+}
+
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
@@ -571,6 +592,8 @@ static void colo_process_checkpoint(MigrationState *s)
 
     timer_mod(s->colo_delay_timer,
             current_time + s->parameters.x_checkpoint_delay);
+    timer_mod(s->pending_ram_check_timer,
+        current_time + DEFAULT_RAM_PENDING_CHECK);
 
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
@@ -583,10 +606,25 @@ static void colo_process_checkpoint(MigrationState *s)
         if (s->state != MIGRATION_STATUS_COLO) {
             goto out;
         }
-        ret = colo_do_checkpoint_transaction(s, bioc, fb);
-        if (ret < 0) {
-            goto out;
-        }
+        if (atomic_xchg(&checkpoint_request, 0)) {
+            /* start a colo checkpoint */
+            ret = colo_do_checkpoint_transaction(s, bioc, fb);
+            if (ret < 0) {
+                goto out;
+            }
+        } else {
+            if (colo_need_migrate_ram_background(s)) {
+                colo_send_message(s->to_dst_file,
+                                  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
+                                  &local_err);
+                if (local_err) {
+                    goto out;
+                }
+
+                qemu_savevm_state_iterate(s->to_dst_file, false);
+                qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
+            }
+         }
     }
 
 out:
@@ -626,6 +664,8 @@ out:
     colo_compare_unregister_notifier(&packets_compare_notifier);
     timer_del(s->colo_delay_timer);
     timer_free(s->colo_delay_timer);
+    timer_del(s->pending_ram_check_timer);
+    timer_free(s->pending_ram_check_timer);
     qemu_sem_destroy(&s->colo_checkpoint_sem);
 
     /*
@@ -643,6 +683,7 @@ void colo_checkpoint_notify(void *opaque)
     MigrationState *s = opaque;
     int64_t next_notify_time;
 
+    atomic_inc(&checkpoint_request);
     qemu_sem_post(&s->colo_checkpoint_sem);
     s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     next_notify_time = s->colo_checkpoint_time +
@@ -650,6 +691,19 @@ void colo_checkpoint_notify(void *opaque)
     timer_mod(s->colo_delay_timer, next_notify_time);
 }
 
+static void colo_pending_ram_check_notify(void *opaque)
+{
+    int64_t next_notify_time;
+    MigrationState *s = opaque;
+
+    if (migration_in_colo_state()) {
+        next_notify_time = DEFAULT_RAM_PENDING_CHECK +
+                           qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        timer_mod(s->pending_ram_check_timer, next_notify_time);
+        qemu_sem_post(&s->colo_checkpoint_sem);
+    }
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
@@ -657,6 +711,8 @@ void migrate_start_colo_process(MigrationState *s)
     s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
                                 colo_checkpoint_notify, s);
 
+    s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
+                                colo_pending_ram_check_notify, s);
     qemu_sem_init(&s->colo_exit_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -805,6 +861,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
         colo_incoming_process_checkpoint(mis, fb, bioc, errp);
         break;
+    case COLO_MESSAGE_MIGRATE_RAM_BACKGROUND:
+        if (qemu_loadvm_state_main(mis->from_src_file, mis) < 0) {
+            error_setg(errp, "Load ram background failed");
+        }
+        break;
     default:
         error_setg(errp, "Got unknown COLO message: %d", msg);
         break;
diff --git a/migration/migration.h b/migration/migration.h
index 8473ddfc88..5355259789 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,7 @@ struct MigrationState
     QemuSemaphore colo_checkpoint_sem;
     int64_t colo_checkpoint_time;
     QEMUTimer *colo_delay_timer;
+    QEMUTimer *pending_ram_check_timer;
 
     /* The first error that has occurred.
        We used the mutex to be able to return the 1st error message */
diff --git a/migration/trace-events b/migration/trace-events
index 4ab0a503d2..f2ed0c8645 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -295,6 +295,7 @@ migration_tls_incoming_handshake_complete(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_send_message(const char *msg) "Send '%s' message"
 colo_receive_message(const char *msg) "Receive '%s' message"
+colo_need_migrate_ram_background(uint64_t pending_size) "Pending 0x%" PRIx64 " dirty ram"
 
 # colo-failover.c
 colo_failover_set_state(const char *new_state) "new state %s"
diff --git a/qapi/migration.json b/qapi/migration.json
index b7348d0c8b..ff7a4f18b0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -977,12 +977,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint
+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size', 'vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'migrate-ram-background' ] }
 
 ##
 # @COLOMode:
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] COLO: Optimize memory back-up process
  2020-02-17  1:20 [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Hailiang Zhang
  2020-02-17  1:20 ` [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper Hailiang Zhang
  2020-02-17  1:20 ` [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing Hailiang Zhang
@ 2020-02-17  1:20 ` Hailiang Zhang
  2020-02-20 18:24   ` Dr. David Alan Gilbert
  2020-02-20 18:27 ` [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Dr. David Alan Gilbert
  3 siblings, 1 reply; 11+ messages in thread
From: Hailiang Zhang @ 2020-02-17  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: danielcho, chen.zhang, Hailiang Zhang, dgilbert, quintela

This patch reduces the VM's downtime in the initial COLO setup stage.
Previously, we copied all of the SVM's memory in the COLO preparing
stage, with the VM stopped, which is a time-consuming process.
Here we optimize it with a trick: back up every page during the
migration process while COLO is enabled. Although this slows down the
migration, it clearly reduces the downtime of backing up all of the
SVM's memory in the COLO preparing stage.

Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c |  3 +++
 migration/ram.c  | 35 +++++++++++++++++++++++++++--------
 migration/ram.h  |  1 +
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d30c6bc4ad..febf010571 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "migration/failover.h"
+#include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
 #include "replication.h"
 #endif
@@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void *opaque)
      */
     qemu_file_set_blocking(mis->from_src_file, true);
 
+    colo_incoming_start_dirty_log();
+
     bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
     fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
     object_unref(OBJECT(bioc));
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..24a8aa3527 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
                 }
                 return -errno;
             }
-            memcpy(block->colo_cache, block->host, block->used_length);
         }
     }
 
@@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
             bitmap_set(block->bmap, 0, pages);
         }
     }
+
+    return 0;
+}
+
+void colo_incoming_start_dirty_log(void)
+{
     ram_state = g_new0(RAMState, 1);
     ram_state->migration_dirty_pages = 0;
     qemu_mutex_init(&ram_state->bitmap_mutex);
     memory_global_dirty_log_start();
-
-    return 0;
 }
 
 /* It is need to hold the global lock to call this helper */
@@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host = NULL;
+        void *host = NULL, *host_bak = NULL;
         uint8_t ch;
 
         /*
@@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
-
             /*
-             * After going into COLO, we should load the Page into colo_cache.
+             * After going into COLO, we should load the Page into colo_cache
+             * NOTE: We need to keep a copy of the SVM's RAM in colo_cache.
+             * Previously, we copied all of this memory in the COLO preparing
+             * stage, with the VM stopped, which is a time-consuming process.
+             * Here we optimize it with a trick: back up every page during
+             * the migration process while COLO is enabled. Although this
+             * slows down the migration, it clearly reduces the downtime of
+             * backing up all of the SVM's memory in the COLO preparing stage.
              */
-            if (migration_incoming_in_colo_state()) {
+            if (migration_incoming_colo_enabled()) {
                 host = colo_cache_from_block_offset(block, addr);
-            } else {
+                /*
+                 * After going into COLO, load the Page into colo_cache.
+                 */
+                if (!migration_incoming_in_colo_state()) {
+                    host_bak = host;
+                }
+            }
+            if (!migration_incoming_in_colo_state()) {
                 host = host_from_ram_block_offset(block, addr);
             }
             if (!host) {
@@ -3506,6 +3522,9 @@ static int ram_load_precopy(QEMUFile *f)
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
+        if (!ret && host_bak && host) {
+            memcpy(host_bak, host, TARGET_PAGE_SIZE);
+        }
     }
 
     ret |= wait_for_decompress_done();
diff --git a/migration/ram.h b/migration/ram.h
index a553d40751..5ceaff7cb4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_incoming_start_dirty_log(void);
 
 #endif
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
  2020-02-17  1:20 ` [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing Hailiang Zhang
@ 2020-02-17 11:45   ` Eric Blake
  2020-02-19 18:51   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Blake @ 2020-02-17 11:45 UTC (permalink / raw)
  To: Hailiang Zhang, qemu-devel; +Cc: danielcho, chen.zhang, dgilbert, quintela

On 2/16/20 7:20 PM, Hailiang Zhang wrote:
> We can migrate some dirty pages during the gap between checkpoints;
> this way, we can reduce the amount of RAM migrated during checkpointing.
> 
> Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> ---

> +++ b/qapi/migration.json
> @@ -977,12 +977,14 @@
>   #
>   # @vmstate-loaded: VM's state has been loaded by SVM.
>   #
> +# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint
> +#

Missing a '(since 5.0)' marker.

>   # Since: 2.8
>   ##
>   { 'enum': 'COLOMessage',
>     'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>               'vmstate-send', 'vmstate-size', 'vmstate-received',
> -            'vmstate-loaded' ] }
> +            'vmstate-loaded', 'migrate-ram-background' ] }
>   
>   ##
>   # @COLOMode:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper
  2020-02-17  1:20 ` [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper Hailiang Zhang
@ 2020-02-19 18:24   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-19 18:24 UTC (permalink / raw)
  To: Hailiang Zhang; +Cc: danielcho, chen.zhang, qemu-devel, quintela

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> Split checkpoint incoming process into a helper.
> 
> Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c | 260 ++++++++++++++++++++++++-----------------------
>  1 file changed, 133 insertions(+), 127 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 2c88aa57a2..93c5a452fb 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> -static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
> -                                     Error **errp)
> +static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
> +                      QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
> +{
> +    uint64_t total_size;
> +    uint64_t value;
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    qemu_mutex_lock_iothread();
> +    vm_stop_force_state(RUN_STATE_COLO);
> +    trace_colo_vm_state_change("run", "stop");
> +    qemu_mutex_unlock_iothread();
> +
> +    /* FIXME: This is unnecessary for periodic checkpoint mode */
> +    colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
> +                 &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    colo_receive_check_message(mis->from_src_file,
> +                       COLO_MESSAGE_VMSTATE_SEND, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    qemu_mutex_lock_iothread();
> +    cpu_synchronize_all_pre_loadvm();
> +    ret = qemu_loadvm_state_main(mis->from_src_file, mis);
> +    qemu_mutex_unlock_iothread();
> +
> +    if (ret < 0) {
> +        error_setg(errp, "Load VM's live state (ram) error");
> +        return;
> +    }
> +
> +    value = colo_receive_message_value(mis->from_src_file,
> +                             COLO_MESSAGE_VMSTATE_SIZE, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * Read VM device state data into channel buffer,
> +     * It's better to re-use the memory allocated.
> +     * Here we need to handle the channel buffer directly.
> +     */
> +    if (value > bioc->capacity) {
> +        bioc->capacity = value;
> +        bioc->data = g_realloc(bioc->data, bioc->capacity);
> +    }
> +    total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
> +    if (total_size != value) {
> +        error_setg(errp, "Got %" PRIu64 " VMState data, less than expected"
> +                    " %" PRIu64, total_size, value);
> +        return;
> +    }
> +    bioc->usage = total_size;
> +    qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
> +
> +    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
> +                 &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    qemu_mutex_lock_iothread();
> +    vmstate_loading = true;
> +    ret = qemu_load_device_state(fb);
> +    if (ret < 0) {
> +        error_setg(errp, "COLO: load device state failed");
> +        qemu_mutex_unlock_iothread();
> +        return;
> +    }
> +
> +#ifdef CONFIG_REPLICATION
> +    replication_get_error_all(&local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        qemu_mutex_unlock_iothread();
> +        return;
> +    }
> +
> +    /* discard colo disk buffer */
> +    replication_do_checkpoint_all(&local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        qemu_mutex_unlock_iothread();
> +        return;
> +    }
> +#else
> +    abort();
> +#endif
> +    /* Notify all filters of all NIC to do checkpoint */
> +    colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
> +
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        qemu_mutex_unlock_iothread();
> +        return;
> +    }
> +
> +    vmstate_loading = false;
> +    vm_start();
> +    trace_colo_vm_state_change("stop", "run");
> +    qemu_mutex_unlock_iothread();
> +
> +    if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
> +        failover_set_state(FAILOVER_STATUS_RELAUNCH,
> +                        FAILOVER_STATUS_NONE);
> +        failover_request_active(NULL);
> +        return;
> +    }
> +
> +    colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
> +                 &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }
> +}
> +
> +static void colo_wait_handle_message(MigrationIncomingState *mis,
> +                QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
>  {
>      COLOMessage msg;
>      Error *local_err = NULL;
>  
> -    msg = colo_receive_message(f, &local_err);
> +    msg = colo_receive_message(mis->from_src_file, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> @@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
>  
>      switch (msg) {
>      case COLO_MESSAGE_CHECKPOINT_REQUEST:
> -        *checkpoint_request = 1;
> +        colo_incoming_process_checkpoint(mis, fb, bioc, errp);
>          break;
>      default:
> -        *checkpoint_request = 0;
>          error_setg(errp, "Got unknown COLO message: %d", msg);
>          break;
>      }
> @@ -692,10 +816,7 @@ void *colo_process_incoming_thread(void *opaque)
>      MigrationIncomingState *mis = opaque;
>      QEMUFile *fb = NULL;
>      QIOChannelBuffer *bioc = NULL; /* Cache incoming device state */
> -    uint64_t total_size;
> -    uint64_t value;
>      Error *local_err = NULL;
> -    int ret;
>  
>      rcu_register_thread();
>      qemu_sem_init(&mis->colo_incoming_sem, 0);
> @@ -749,134 +870,19 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>      while (mis->state == MIGRATION_STATUS_COLO) {
> -        int request = 0;
> -
> -        colo_wait_handle_message(mis->from_src_file, &request, &local_err);
> +        colo_wait_handle_message(mis, fb, bioc, &local_err);
>          if (local_err) {
> -            goto out;
> +            error_report_err(local_err);
> +            break;
>          }
> -        assert(request);
>          if (failover_get_state() != FAILOVER_STATUS_NONE) {
>              error_report("failover request");
> -            goto out;
> -        }
> -
> -        qemu_mutex_lock_iothread();
> -        vm_stop_force_state(RUN_STATE_COLO);
> -        trace_colo_vm_state_change("run", "stop");
> -        qemu_mutex_unlock_iothread();
> -
> -        /* FIXME: This is unnecessary for periodic checkpoint mode */
> -        colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
> -                     &local_err);
> -        if (local_err) {
> -            goto out;
> -        }
> -
> -        colo_receive_check_message(mis->from_src_file,
> -                           COLO_MESSAGE_VMSTATE_SEND, &local_err);
> -        if (local_err) {
> -            goto out;
> -        }
> -
> -        qemu_mutex_lock_iothread();
> -        cpu_synchronize_all_pre_loadvm();
> -        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
> -        qemu_mutex_unlock_iothread();
> -
> -        if (ret < 0) {
> -            error_report("Load VM's live state (ram) error");
> -            goto out;
> -        }
> -
> -        value = colo_receive_message_value(mis->from_src_file,
> -                                 COLO_MESSAGE_VMSTATE_SIZE, &local_err);
> -        if (local_err) {
> -            goto out;
> -        }
> -
> -        /*
> -         * Read VM device state data into channel buffer,
> -         * It's better to re-use the memory allocated.
> -         * Here we need to handle the channel buffer directly.
> -         */
> -        if (value > bioc->capacity) {
> -            bioc->capacity = value;
> -            bioc->data = g_realloc(bioc->data, bioc->capacity);
> -        }
> -        total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
> -        if (total_size != value) {
> -            error_report("Got %" PRIu64 " VMState data, less than expected"
> -                        " %" PRIu64, total_size, value);
> -            goto out;
> -        }
> -        bioc->usage = total_size;
> -        qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
> -
> -        colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
> -                     &local_err);
> -        if (local_err) {
> -            goto out;
> -        }
> -
> -        qemu_mutex_lock_iothread();
> -        vmstate_loading = true;
> -        ret = qemu_load_device_state(fb);
> -        if (ret < 0) {
> -            error_report("COLO: load device state failed");
> -            qemu_mutex_unlock_iothread();
> -            goto out;
> -        }
> -
> -#ifdef CONFIG_REPLICATION
> -        replication_get_error_all(&local_err);
> -        if (local_err) {
> -            qemu_mutex_unlock_iothread();
> -            goto out;
> -        }
> -
> -        /* discard colo disk buffer */
> -        replication_do_checkpoint_all(&local_err);
> -        if (local_err) {
> -            qemu_mutex_unlock_iothread();
> -            goto out;
> -        }
> -#else
> -        abort();
> -#endif
> -        /* Notify all filters of all NIC to do checkpoint */
> -        colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
> -
> -        if (local_err) {
> -            qemu_mutex_unlock_iothread();
> -            goto out;
> -        }
> -
> -        vmstate_loading = false;
> -        vm_start();
> -        trace_colo_vm_state_change("stop", "run");
> -        qemu_mutex_unlock_iothread();
> -
> -        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
> -            failover_set_state(FAILOVER_STATUS_RELAUNCH,
> -                            FAILOVER_STATUS_NONE);
> -            failover_request_active(NULL);
> -            goto out;
> -        }
> -
> -        colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
> -                     &local_err);
> -        if (local_err) {
> -            goto out;
> +            break;
>          }
>      }
>  
>  out:
>      vmstate_loading = false;
> -    /* Throw the unreported error message after exited from loop */
> -    if (local_err) {
> -        error_report_err(local_err);
> -    }
>  
>      /*
>       * There are only two reasons we can get here, some error happened
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
  2020-02-17  1:20 ` [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing Hailiang Zhang
  2020-02-17 11:45   ` Eric Blake
@ 2020-02-19 18:51   ` Dr. David Alan Gilbert
  2020-02-24  4:06     ` Zhanghailiang
  1 sibling, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-19 18:51 UTC (permalink / raw)
  To: Hailiang Zhang; +Cc: danielcho, chen.zhang, qemu-devel, quintela

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> We can migrate some dirty pages during the gap between checkpoints;
> this way, we can reduce the amount of RAM migrated during checkpointing.
> 
> Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c       | 69 +++++++++++++++++++++++++++++++++++++++---
>  migration/migration.h  |  1 +
>  migration/trace-events |  1 +
>  qapi/migration.json    |  4 ++-
>  4 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 93c5a452fb..d30c6bc4ad 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -46,6 +46,13 @@ static COLOMode last_colo_mode;
>  
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
> +#define DEFAULT_RAM_PENDING_CHECK 1000
> +
> +/* should be calculated by bandwidth and max downtime ? */
> +#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)

Turn both of these magic constants into parameters.
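
Something along these lines, following the existing x_checkpoint_delay
pattern (untested sketch; the parameter name below is only
illustrative, it doesn't exist yet):

    static bool colo_need_migrate_ram_background(MigrationState *s)
    {
        uint64_t pending_size, pend_pre, pend_compat, pend_post;
        /* hypothetical migration parameter instead of THRESHOLD_PENDING_SIZE */
        int64_t max_size = s->parameters.x_colo_pending_ram_threshold;

        qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
                                  &pend_compat, &pend_post);
        pending_size = pend_pre + pend_compat + pend_post;
        return pending_size >= max_size;
    }

and similarly a parameter for the DEFAULT_RAM_PENDING_CHECK interval.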

> +static int checkpoint_request;
> +
>  bool migration_in_colo_state(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -516,6 +523,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
>      colo_checkpoint_notify(data);
>  }
>  
> +static bool colo_need_migrate_ram_background(MigrationState *s)
> +{
> +    uint64_t pending_size, pend_pre, pend_compat, pend_post;
> +    int64_t max_size = THRESHOLD_PENDING_SIZE;
> +
> +    qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
> +                              &pend_compat, &pend_post);
> +    pending_size = pend_pre + pend_compat + pend_post;
> +
> +    trace_colo_need_migrate_ram_background(pending_size);
> +    return (pending_size >= max_size);
> +}
> +
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      QIOChannelBuffer *bioc;
> @@ -571,6 +592,8 @@ static void colo_process_checkpoint(MigrationState *s)
>  
>      timer_mod(s->colo_delay_timer,
>              current_time + s->parameters.x_checkpoint_delay);
> +    timer_mod(s->pending_ram_check_timer,
> +        current_time + DEFAULT_RAM_PENDING_CHECK);

What happens if the iterate takes a while and this triggers in the
middle of the iterate?

>      while (s->state == MIGRATION_STATUS_COLO) {
>          if (failover_get_state() != FAILOVER_STATUS_NONE) {
> @@ -583,10 +606,25 @@ static void colo_process_checkpoint(MigrationState *s)
>          if (s->state != MIGRATION_STATUS_COLO) {
>              goto out;
>          }
> -        ret = colo_do_checkpoint_transaction(s, bioc, fb);
> -        if (ret < 0) {
> -            goto out;
> -        }
> +        if (atomic_xchg(&checkpoint_request, 0)) {
> +            /* start a colo checkpoint */
> +            ret = colo_do_checkpoint_transaction(s, bioc, fb);
> +            if (ret < 0) {
> +                goto out;
> +            }
> +        } else {
> +            if (colo_need_migrate_ram_background(s)) {
> +                colo_send_message(s->to_dst_file,
> +                                  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
> +                                  &local_err);
> +                if (local_err) {
> +                    goto out;
> +                }
> +
> +                qemu_savevm_state_iterate(s->to_dst_file, false);
> +                qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);

Maybe you should do a qemu_file_get_error(..) at this point to check
it's OK.
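
i.e. something like (sketch only):

        qemu_savevm_state_iterate(s->to_dst_file, false);
        qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);

        /* check the stream for errors before going round the loop again */
        ret = qemu_file_get_error(s->to_dst_file);
        if (ret < 0) {
            error_report("COLO: background ram send stream error: %d", ret);
            goto out;
        }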

> +            }
> +         }
>      }
>  
>  out:
> @@ -626,6 +664,8 @@ out:
>      colo_compare_unregister_notifier(&packets_compare_notifier);
>      timer_del(s->colo_delay_timer);
>      timer_free(s->colo_delay_timer);
> +    timer_del(s->pending_ram_check_timer);
> +    timer_free(s->pending_ram_check_timer);
>      qemu_sem_destroy(&s->colo_checkpoint_sem);
>  
>      /*
> @@ -643,6 +683,7 @@ void colo_checkpoint_notify(void *opaque)
>      MigrationState *s = opaque;
>      int64_t next_notify_time;
>  
> +    atomic_inc(&checkpoint_request);

Can you explain what you've changed about this atomic in this patch,
I don't quite see what you're doing.

>      qemu_sem_post(&s->colo_checkpoint_sem);
>      s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      next_notify_time = s->colo_checkpoint_time +
> @@ -650,6 +691,19 @@ void colo_checkpoint_notify(void *opaque)
>      timer_mod(s->colo_delay_timer, next_notify_time);
>  }
>  
> +static void colo_pending_ram_check_notify(void *opaque)
> +{
> +    int64_t next_notify_time;
> +    MigrationState *s = opaque;
> +
> +    if (migration_in_colo_state()) {
> +        next_notify_time = DEFAULT_RAM_PENDING_CHECK +
> +                           qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        timer_mod(s->pending_ram_check_timer, next_notify_time);
> +        qemu_sem_post(&s->colo_checkpoint_sem);
> +    }
> +}
> +
>  void migrate_start_colo_process(MigrationState *s)
>  {
>      qemu_mutex_unlock_iothread();
> @@ -657,6 +711,8 @@ void migrate_start_colo_process(MigrationState *s)
>      s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>                                  colo_checkpoint_notify, s);
>  
> +    s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
> +                                colo_pending_ram_check_notify, s);
>      qemu_sem_init(&s->colo_exit_sem, 0);
>      migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
> @@ -805,6 +861,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
>      case COLO_MESSAGE_CHECKPOINT_REQUEST:
>          colo_incoming_process_checkpoint(mis, fb, bioc, errp);
>          break;
> +    case COLO_MESSAGE_MIGRATE_RAM_BACKGROUND:
> +        if (qemu_loadvm_state_main(mis->from_src_file, mis) < 0) {
> +            error_setg(errp, "Load ram background failed");
> +        }

OK, that's a little dangerous, in that it relies on the source being good
and only sending iterative data; stuff would break badly if it actually
sent you devices.

> +        break;
>      default:
>          error_setg(errp, "Got unknown COLO message: %d", msg);
>          break;
> diff --git a/migration/migration.h b/migration/migration.h
> index 8473ddfc88..5355259789 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -219,6 +219,7 @@ struct MigrationState
>      QemuSemaphore colo_checkpoint_sem;
>      int64_t colo_checkpoint_time;
>      QEMUTimer *colo_delay_timer;
> +    QEMUTimer *pending_ram_check_timer;
>  
>      /* The first error that has occurred.
>         We used the mutex to be able to return the 1st error message */
> diff --git a/migration/trace-events b/migration/trace-events
> index 4ab0a503d2..f2ed0c8645 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -295,6 +295,7 @@ migration_tls_incoming_handshake_complete(void) ""
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>  colo_send_message(const char *msg) "Send '%s' message"
>  colo_receive_message(const char *msg) "Receive '%s' message"
> +colo_need_migrate_ram_background(uint64_t pending_size) "Pending 0x%" PRIx64 " dirty ram"
>  
>  # colo-failover.c
>  colo_failover_set_state(const char *new_state) "new state %s"
> diff --git a/qapi/migration.json b/qapi/migration.json
> index b7348d0c8b..ff7a4f18b0 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -977,12 +977,14 @@
>  #
>  # @vmstate-loaded: VM's state has been loaded by SVM.
>  #
> +# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint
> +#
>  # Since: 2.8
>  ##
>  { 'enum': 'COLOMessage',
>    'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>              'vmstate-send', 'vmstate-size', 'vmstate-received',
> -            'vmstate-loaded' ] }
> +            'vmstate-loaded', 'migrate-ram-background' ] }
>  
>  ##
>  # @COLOMode:
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] COLO: Optimize memory back-up process
  2020-02-17  1:20 ` [PATCH 3/3] COLO: Optimize memory back-up process Hailiang Zhang
@ 2020-02-20 18:24   ` Dr. David Alan Gilbert
  2020-02-24  4:10     ` Zhanghailiang
  0 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-20 18:24 UTC (permalink / raw)
  To: Hailiang Zhang; +Cc: danielcho, chen.zhang, qemu-devel, quintela

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> This patch reduces the VM's downtime in the initial COLO setup stage.
> Previously, we copied all of the SVM's memory in the COLO preparing
> stage, with the VM stopped, which is a time-consuming process.
> Here we optimize it with a trick: back up every page during the
> migration process while COLO is enabled. Although this slows down the
> migration, it clearly reduces the downtime of backing up all of the
> SVM's memory in the COLO preparing stage.
> 
> Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>

OK, I think this is right, but it took me quite a while to understand;
I think one of the comments below might not be right:

> ---
>  migration/colo.c |  3 +++
>  migration/ram.c  | 35 +++++++++++++++++++++++++++--------
>  migration/ram.h  |  1 +
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d30c6bc4ad..febf010571 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -26,6 +26,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "migration/failover.h"
> +#include "migration/ram.h"
>  #ifdef CONFIG_REPLICATION
>  #include "replication.h"
>  #endif
> @@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void *opaque)
>       */
>      qemu_file_set_blocking(mis->from_src_file, true);
>  
> +    colo_incoming_start_dirty_log();
> +
>      bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
>      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
>      object_unref(OBJECT(bioc));
> diff --git a/migration/ram.c b/migration/ram.c
> index ed23ed1c7c..24a8aa3527 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
>                  }
>                  return -errno;
>              }
> -            memcpy(block->colo_cache, block->host, block->used_length);
>          }
>      }
>  
> @@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
>              bitmap_set(block->bmap, 0, pages);
>          }
>      }
> +
> +    return 0;
> +}
> +
> +void colo_incoming_start_dirty_log(void)
> +{
>      ram_state = g_new0(RAMState, 1);
>      ram_state->migration_dirty_pages = 0;
>      qemu_mutex_init(&ram_state->bitmap_mutex);
>      memory_global_dirty_log_start();
> -
> -    return 0;
>  }
>  
>  /* It is need to hold the global lock to call this helper */
> @@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr, total_ram_bytes;
> -        void *host = NULL;
> +        void *host = NULL, *host_bak = NULL;
>          uint8_t ch;
>  
>          /*
> @@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>              RAMBlock *block = ram_block_from_stream(f, flags);
> -
>              /*
> -             * After going into COLO, we should load the Page into colo_cache.
> +             * After going into COLO, we should load the Page into colo_cache
> +             * NOTE: We need to keep a copy of the SVM's RAM in colo_cache.
> +             * Previously, we copied all of this memory in the COLO preparing
> +             * stage, with the VM stopped, which is a time-consuming process.
> +             * Here we optimize it with a trick: back up every page during
> +             * the migration process while COLO is enabled. Although this
> +             * slows down the migration, it clearly reduces the downtime of
> +             * backing up all of the SVM's memory in the COLO preparing stage.
>               */
> -            if (migration_incoming_in_colo_state()) {
> +            if (migration_incoming_colo_enabled()) {
>                  host = colo_cache_from_block_offset(block, addr);
> -            } else {
> +                /*
> +                 * After going into COLO, load the Page into colo_cache.
> +                 */
> +                if (!migration_incoming_in_colo_state()) {
> +                    host_bak = host;
> +                }
> +            }
> +            if (!migration_incoming_in_colo_state()) {
>                  host = host_from_ram_block_offset(block, addr);

So this works out as quite complicated:
   a) In normal migration we do the last one and just set:
         host = host_from_ram_block_offset(block, addr);
         host_bak = NULL

   b) At the start, when colo_enabled, but !in_colo_state
         host = colo_cache
         host_bak = host
         host = host_from_ram_block_offset

   c) in_colo_state
         host = colo_cache
         host_bak = NULL


(b) is pretty confusing, setting host twice; can't we tidy that up?
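
e.g. something like this sketch (same behaviour as far as I can see,
untested):

    if (!migration_incoming_colo_enabled()) {
        /* Normal migration: load straight into guest RAM */
        host = host_from_ram_block_offset(block, addr);
    } else if (!migration_incoming_in_colo_state()) {
        /* COLO enabled but still migrating: load into guest RAM and
         * keep a copy in the colo cache */
        host = host_from_ram_block_offset(block, addr);
        host_bak = colo_cache_from_block_offset(block, addr);
    } else {
        /* In COLO state: load into the colo cache only */
        host = colo_cache_from_block_offset(block, addr);
    }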

Also, that last comment 'After going into COLO' I think is really
  'Before COLO state, copy from ram into cache'

Dave

>              }
>              if (!host) {
> @@ -3506,6 +3522,9 @@ static int ram_load_precopy(QEMUFile *f)
>          if (!ret) {
>              ret = qemu_file_get_error(f);
>          }
> +        if (!ret && host_bak && host) {
> +            memcpy(host_bak, host, TARGET_PAGE_SIZE);
> +        }
>      }
>  
>      ret |= wait_for_decompress_done();
> diff --git a/migration/ram.h b/migration/ram.h
> index a553d40751..5ceaff7cb4 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  /* ram cache */
>  int colo_init_ram_cache(void);
>  void colo_release_ram_cache(void);
> +void colo_incoming_start_dirty_log(void);
>  
>  #endif
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO
  2020-02-17  1:20 [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Hailiang Zhang
                   ` (2 preceding siblings ...)
  2020-02-17  1:20 ` [PATCH 3/3] COLO: Optimize memory back-up process Hailiang Zhang
@ 2020-02-20 18:27 ` Dr. David Alan Gilbert
  3 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-20 18:27 UTC (permalink / raw)
  To: Hailiang Zhang; +Cc: danielcho, chen.zhang, qemu-devel, quintela

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> Hi,
> 
> This is an untested series that tries to reduce the VM's pause time
> while doing a checkpoint in COLO state.
> 
> The second patch tries to reduce the total number of dirty pages that
> must be sent while the VM is paused for a checkpoint: instead of
> sending all dirty pages with the VM paused, it sends part of the dirty
> pages during the gap between two checkpoints, while the SVM and PVM
> are still running.
> 
> The third patch tries to reduce the pause time spent backing up RAM
> into the cache on the secondary side.

This is quite nice; I've asked for a clarification on the last one, but
it's only a tidy up.

I guess this really helps continuous-COLO as well, because it means
that restarting the sync to the new secondary is a lot simpler?

Dave

> 
> Hailiang Zhang (3):
>   migration/colo: wrap incoming checkpoint process into new helper
>   COLO: Migrate dirty pages during the gap of checkpointing
>   COLO: Optimize memory back-up process
> 
>  migration/colo.c       | 332 +++++++++++++++++++++++++----------------
>  migration/migration.h  |   1 +
>  migration/ram.c        |  35 ++++-
>  migration/ram.h        |   1 +
>  migration/trace-events |   1 +
>  qapi/migration.json    |   4 +-
>  6 files changed, 234 insertions(+), 140 deletions(-)
> 
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing
  2020-02-19 18:51   ` Dr. David Alan Gilbert
@ 2020-02-24  4:06     ` Zhanghailiang
  0 siblings, 0 replies; 11+ messages in thread
From: Zhanghailiang @ 2020-02-24  4:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: danielcho, chen.zhang, qemu-devel, quintela

> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Thursday, February 20, 2020 2:51 AM
> To: Zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; chen.zhang@intel.com;
> danielcho@qnap.com
> Subject: Re: [PATCH 2/3] COLO: Migrate dirty pages during the gap of
> checkpointing
> 
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> > We can migrate some dirty pages during the gap between checkpoints;
> > this way, we can reduce the amount of RAM migrated during checkpointing.
> >
> > Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> > ---
> >  migration/colo.c       | 69 +++++++++++++++++++++++++++++++++++++++---
> >  migration/migration.h  |  1 +
> >  migration/trace-events |  1 +
> >  qapi/migration.json    |  4 ++-
> >  4 files changed, 70 insertions(+), 5 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 93c5a452fb..d30c6bc4ad 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -46,6 +46,13 @@ static COLOMode last_colo_mode;
> >
> >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> >
> > +#define DEFAULT_RAM_PENDING_CHECK 1000
> > +
> > +/* should be calculated by bandwidth and max downtime ? */
> > +#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
> 
> Turn both of these magic constants into parameters.
> 

Good idea, will do this in later patches.

> > +static int checkpoint_request;
> > +
> >  bool migration_in_colo_state(void)
> >  {
> >      MigrationState *s = migrate_get_current();
> > @@ -516,6 +523,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
> >      colo_checkpoint_notify(data);
> >  }
> >
> > +static bool colo_need_migrate_ram_background(MigrationState *s)
> > +{
> > +    uint64_t pending_size, pend_pre, pend_compat, pend_post;
> > +    int64_t max_size = THRESHOLD_PENDING_SIZE;
> > +
> > +    qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
> > +                              &pend_compat, &pend_post);
> > +    pending_size = pend_pre + pend_compat + pend_post;
> > +
> > +    trace_colo_need_migrate_ram_background(pending_size);
> > +    return (pending_size >= max_size);
> > +}
> > +
> > +
> >  static void colo_process_checkpoint(MigrationState *s)
> >  {
> >      QIOChannelBuffer *bioc;
> > @@ -571,6 +592,8 @@ static void colo_process_checkpoint(MigrationState *s)
> >
> >      timer_mod(s->colo_delay_timer,
> >              current_time + s->parameters.x_checkpoint_delay);
> > +    timer_mod(s->pending_ram_check_timer,
> > +        current_time + DEFAULT_RAM_PENDING_CHECK);
> 
> What happens if the iterate takes a while and this triggers in the
> middle of the iterate?
> 

It will trigger another iteration after the current one has finished.
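
A standalone sketch of why that is safe: the wake-up goes through a counting
semaphore, so a timer that fires while the worker is busy only leaves a post
queued for the next pass (plain POSIX threads here, not the QEMU primitives;
the names are illustrative):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <unistd.h>

    static sem_t wakeup;                /* stands in for colo_checkpoint_sem */

    /* Stands in for the timer callback: it only posts the semaphore. */
    static void *timer_thread(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 3; i++) {
            usleep(100 * 1000);         /* may fire while the worker is busy */
            sem_post(&wakeup);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        sem_init(&wakeup, 0, 0);
        pthread_create(&t, NULL, timer_thread, NULL);

        for (int pass = 0; pass < 3; pass++) {
            sem_wait(&wakeup);          /* checked between passes only */
            printf("pass %d: running one iterate\n", pass);
            usleep(250 * 1000);         /* a long iterate; extra posts queue up */
        }

        pthread_join(t, NULL);
        sem_destroy(&wakeup);
        return 0;
    }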

> >      while (s->state == MIGRATION_STATUS_COLO) {
> >          if (failover_get_state() != FAILOVER_STATUS_NONE) {
> > @@ -583,10 +606,25 @@ static void colo_process_checkpoint(MigrationState *s)
> >          if (s->state != MIGRATION_STATUS_COLO) {
> >              goto out;
> >          }
> > -        ret = colo_do_checkpoint_transaction(s, bioc, fb);
> > -        if (ret < 0) {
> > -            goto out;
> > -        }
> > +        if (atomic_xchg(&checkpoint_request, 0)) {
> > +            /* start a colo checkpoint */
> > +            ret = colo_do_checkpoint_transaction(s, bioc, fb);
> > +            if (ret < 0) {
> > +                goto out;
> > +            }
> > +        } else {
> > +            if (colo_need_migrate_ram_background(s)) {
> > +                colo_send_message(s->to_dst_file,
> > +                                  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
> > +                                  &local_err);
> > +                if (local_err) {
> > +                    goto out;
> > +                }
> > +
> > +                qemu_savevm_state_iterate(s->to_dst_file, false);
> > +                qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
> 
> Maybe you should do a qemu_file_get_error(..) at this point to check
> it's OK.

Agreed, we should check it.

> 
> > +            }
> > +         }
> >      }
> >
> >  out:
> > @@ -626,6 +664,8 @@ out:
> >      colo_compare_unregister_notifier(&packets_compare_notifier);
> >      timer_del(s->colo_delay_timer);
> >      timer_free(s->colo_delay_timer);
> > +    timer_del(s->pending_ram_check_timer);
> > +    timer_free(s->pending_ram_check_timer);
> >      qemu_sem_destroy(&s->colo_checkpoint_sem);
> >
> >      /*
> > @@ -643,6 +683,7 @@ void colo_checkpoint_notify(void *opaque)
> >      MigrationState *s = opaque;
> >      int64_t next_notify_time;
> >
> > +    atomic_inc(&checkpoint_request);
> 
> Can you explain what you've changed about this atomic in this patch,
> I don't quite see what you're doing.
> 

We use this to check what woke the loop from waiting on colo_checkpoint_sem:
a background migration request or a checkpoint request. If the value is zero,
it was woken by a background migration request; otherwise it was woken by a
checkpoint request.
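
A minimal standalone model of that wake-source check, using plain C11 atomics
and a POSIX semaphore instead of the QEMU wrappers (the names mirror the patch,
but the program itself is only an illustration):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static sem_t checkpoint_sem;            /* stands in for colo_checkpoint_sem */
    static atomic_int checkpoint_request;   /* the counter added by the patch */

    /* Like colo_checkpoint_notify(): mark a checkpoint request, then wake. */
    static void checkpoint_notify(void)
    {
        atomic_fetch_add(&checkpoint_request, 1);
        sem_post(&checkpoint_sem);
    }

    /* Like colo_pending_ram_check_notify(): wake without touching the counter. */
    static void pending_ram_check_notify(void)
    {
        sem_post(&checkpoint_sem);
    }

    static void *colo_loop(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 2; i++) {
            sem_wait(&checkpoint_sem);
            if (atomic_exchange(&checkpoint_request, 0)) {
                printf("woken for a checkpoint\n");
            } else {
                printf("woken for background RAM migration\n");
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        sem_init(&checkpoint_sem, 0, 0);
        pthread_create(&t, NULL, colo_loop, NULL);

        pending_ram_check_notify();     /* counter stays zero: background wake */
        checkpoint_notify();            /* counter non-zero: checkpoint wake */

        pthread_join(t, NULL);
        sem_destroy(&checkpoint_sem);
        return 0;
    }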


> >      qemu_sem_post(&s->colo_checkpoint_sem);
> >      s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      next_notify_time = s->colo_checkpoint_time +
> > @@ -650,6 +691,19 @@ void colo_checkpoint_notify(void *opaque)
> >      timer_mod(s->colo_delay_timer, next_notify_time);
> >  }
> >
> > +static void colo_pending_ram_check_notify(void *opaque)
> > +{
> > +    int64_t next_notify_time;
> > +    MigrationState *s = opaque;
> > +
> > +    if (migration_in_colo_state()) {
> > +        next_notify_time = DEFAULT_RAM_PENDING_CHECK +
> > +                           qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > +        timer_mod(s->pending_ram_check_timer, next_notify_time);
> > +        qemu_sem_post(&s->colo_checkpoint_sem);
> > +    }
> > +}
> > +
> >  void migrate_start_colo_process(MigrationState *s)
> >  {
> >      qemu_mutex_unlock_iothread();
> > @@ -657,6 +711,8 @@ void migrate_start_colo_process(MigrationState *s)
> >      s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
> >                                  colo_checkpoint_notify, s);
> >
> > +    s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
> > +                                colo_pending_ram_check_notify, s);
> >      qemu_sem_init(&s->colo_exit_sem, 0);
> >      migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> >                        MIGRATION_STATUS_COLO);
> > @@ -805,6 +861,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis,
> >      case COLO_MESSAGE_CHECKPOINT_REQUEST:
> >          colo_incoming_process_checkpoint(mis, fb, bioc, errp);
> >          break;
> > +    case COLO_MESSAGE_MIGRATE_RAM_BACKGROUND:
> > +        if (qemu_loadvm_state_main(mis->from_src_file, mis) < 0) {
> > +            error_setg(errp, "Load ram background failed");
> > +        }
> 
> OK, that's a little dangerou, in that it relies on the source being good
> and only sending iterative data;  stuff would break badly if it actually
> sent you devices.
> 

You are right, we should check it; besides, we should have a timeout mechanism
to avoid getting stuck here.
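
One possible shape for such a timeout is a simple deadline check around the
background-load loop; a standalone sketch under that assumption (the chunk
loader, the loop bound and the 10-second budget are invented for illustration,
not the real migration code):

    #define _POSIX_C_SOURCE 199309L
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static int64_t now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
    }

    /* Invented stub: pretend one chunk of background RAM arrived and was applied. */
    static bool load_one_chunk(void)
    {
        return true;
    }

    int main(void)
    {
        const int64_t budget_ms = 10 * 1000;    /* example budget, not a real value */
        int64_t deadline = now_ms() + budget_ms;
        int chunks = 0;

        while (chunks < 1000) {                 /* bounded so the sketch terminates */
            if (now_ms() > deadline) {
                fprintf(stderr, "background RAM load timed out\n");
                return 1;
            }
            if (!load_one_chunk()) {
                break;
            }
            chunks++;
        }
        printf("loaded %d chunks within the budget\n", chunks);
        return 0;
    }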

> > +        break;
> >      default:
> >          error_setg(errp, "Got unknown COLO message: %d", msg);
> >          break;
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 8473ddfc88..5355259789 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -219,6 +219,7 @@ struct MigrationState
> >      QemuSemaphore colo_checkpoint_sem;
> >      int64_t colo_checkpoint_time;
> >      QEMUTimer *colo_delay_timer;
> > +    QEMUTimer *pending_ram_check_timer;
> >
> >      /* The first error that has occurred.
> >         We used the mutex to be able to return the 1st error message */
> > diff --git a/migration/trace-events b/migration/trace-events
> > index 4ab0a503d2..f2ed0c8645 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -295,6 +295,7 @@ migration_tls_incoming_handshake_complete(void) ""
> >  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> >  colo_send_message(const char *msg) "Send '%s' message"
> >  colo_receive_message(const char *msg) "Receive '%s' message"
> > +colo_need_migrate_ram_background(uint64_t pending_size) "Pending 0x%" PRIx64 " dirty ram"
> >
> >  # colo-failover.c
> >  colo_failover_set_state(const char *new_state) "new state %s"
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index b7348d0c8b..ff7a4f18b0 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -977,12 +977,14 @@
> >  #
> >  # @vmstate-loaded: VM's state has been loaded by SVM.
> >  #
> > +# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint
> > +#
> >  # Since: 2.8
> >  ##
> >  { 'enum': 'COLOMessage',
> >    'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> >              'vmstate-send', 'vmstate-size', 'vmstate-received',
> > -            'vmstate-loaded' ] }
> > +            'vmstate-loaded', 'migrate-ram-background' ] }
> >
> >  ##
> >  # @COLOMode:
> > --
> > 2.21.0
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 3/3] COLO: Optimize memory back-up process
  2020-02-20 18:24   ` Dr. David Alan Gilbert
@ 2020-02-24  4:10     ` Zhanghailiang
  0 siblings, 0 replies; 11+ messages in thread
From: Zhanghailiang @ 2020-02-24  4:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: danielcho, chen.zhang, qemu-devel, quintela

Hi Dave,

> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Friday, February 21, 2020 2:25 AM
> To: Zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; chen.zhang@intel.com;
> danielcho@qnap.com
> Subject: Re: [PATCH 3/3] COLO: Optimize memory back-up process
> 
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> > This patch will reduce the downtime of VM for the initial process,
> > Privously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> 
> OK, I think this is right, but it took me quite a while to understand,
> I think one of the comments below might not be right:
> 

> > ---
> >  migration/colo.c |  3 +++
> >  migration/ram.c  | 35 +++++++++++++++++++++++++++--------
> >  migration/ram.h  |  1 +
> >  3 files changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index d30c6bc4ad..febf010571 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -26,6 +26,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "migration/failover.h"
> > +#include "migration/ram.h"
> >  #ifdef CONFIG_REPLICATION
> >  #include "replication.h"
> >  #endif
> > @@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void *opaque)
> >       */
> >      qemu_file_set_blocking(mis->from_src_file, true);
> >
> > +    colo_incoming_start_dirty_log();
> > +
> >      bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
> >      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
> >      object_unref(OBJECT(bioc));
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ed23ed1c7c..24a8aa3527 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
> >                  }
> >                  return -errno;
> >              }
> > -            memcpy(block->colo_cache, block->host, block->used_length);
> >          }
> >      }
> >
> > @@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
> >              bitmap_set(block->bmap, 0, pages);
> >          }
> >      }
> > +
> > +    return 0;
> > +}
> > +
> > +void colo_incoming_start_dirty_log(void)
> > +{
> >      ram_state = g_new0(RAMState, 1);
> >      ram_state->migration_dirty_pages = 0;
> >      qemu_mutex_init(&ram_state->bitmap_mutex);
> >      memory_global_dirty_log_start();
> > -
> > -    return 0;
> >  }
> >
> >  /* It is need to hold the global lock to call this helper */
> > @@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)
> >
> >      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >          ram_addr_t addr, total_ram_bytes;
> > -        void *host = NULL;
> > +        void *host = NULL, *host_bak = NULL;
> >          uint8_t ch;
> >
> >          /*
> > @@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
> >          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
> >                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> >              RAMBlock *block = ram_block_from_stream(f, flags);
> > -
> >              /*
> > -             * After going into COLO, we should load the Page into colo_cache.
> > +             * After going into COLO, we should load the Page into colo_cache
> > +             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
> > +             * Privously, we copied all these memory in preparing stage of COLO
> > +             * while we need to stop VM, which is a time-consuming process.
> > +             * Here we optimize it by a trick, back-up every page while in
> > +             * migration process while COLO is enabled, though it affects the
> > +             * speed of the migration, but it obviously reduce the downtime of
> > +             * back-up all SVM'S memory in COLO preparing stage.
> >               */
> > -            if (migration_incoming_in_colo_state()) {
> > +            if (migration_incoming_colo_enabled()) {
> >                  host = colo_cache_from_block_offset(block, addr);
> > -            } else {
> > +                /*
> > +                 * After going into COLO, load the Page into colo_cache.
> > +                 */
> > +                if (!migration_incoming_in_colo_state()) {
> > +                    host_bak = host;
> > +                }
> > +            }
> > +            if (!migration_incoming_in_colo_state()) {
> >                  host = host_from_ram_block_offset(block, addr);
> 
> So this works out as quite complicated:
>    a) In normal migration we do the last one and just set:
>          host = host_from_ram_block_offset(block, addr);
>          host_bak = NULL
> 
>    b) At the start, when colo_enabled, but !in_colo_state
>          host = colo_cache
>          host_bak = host
>          host = host_from_ram_block_offset
> 
>    c) in_colo_state
>          host = colo_cache
>          host_bak = NULL
> 
> 
> (b) is pretty confusing, setting host twice; can't we tidy that up?
> 
> Also, that last comment 'After going into COLO' I think is really
>   'Before COLO state, copy from ram into cache'
> 

The code logic here is not good; I have fixed it in V2 like this :)

+            host = host_from_ram_block_offset(block, addr);
             /*
-             * After going into COLO, we should load the Page into colo_cache.
+             * After entering the COLO stage, we should not load the page
+             * into SVM's memory directly; we put it into colo_cache first.
+             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
+             * Previously, we copied all this memory in the COLO preparing
+             * stage while the VM had to be stopped, which is time-consuming.
+             * Here we optimize it by backing up every page during the
+             * migration process while COLO is enabled; it slows the
+             * migration somewhat, but clearly reduces the downtime of
+             * backing up all of the SVM's memory in the COLO preparing stage.
              */
-            if (migration_incoming_in_colo_state()) {
-                host = colo_cache_from_block_offset(block, addr);
-            } else {
-                host = host_from_ram_block_offset(block, addr);
+            if (migration_incoming_colo_enabled()) {
+                if (migration_incoming_in_colo_state()) {
+                    /* In COLO stage, put all pages into cache temporarily */
+                    host = colo_cache_from_block_offset(block, addr);
+                } else {
+                   /*
+                    * In the migration stage but before the COLO stage,
+                    * put all pages into both the cache and SVM's memory.
+                    */
+                    host_bak = colo_cache_from_block_offset(block, addr);
+                }
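
For clarity, the three cases Dave listed reduce to the following selection; a
standalone sketch with stub lookups that mirrors the V2 logic above (the stubs
and the page size are placeholders, not the real migration/ram.c helpers):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    static char svm_ram[PAGE_SIZE];     /* stands in for the SVM's memory */
    static char colo_cache[PAGE_SIZE];  /* stands in for the COLO cache */

    /* Placeholder lookups, not the real migration/ram.c helpers. */
    static void *host_from_ram_block_offset(void)   { return svm_ram; }
    static void *colo_cache_from_block_offset(void) { return colo_cache; }

    static void load_one_page(bool colo_enabled, bool in_colo_state)
    {
        void *host = NULL, *host_bak = NULL;

        if (!colo_enabled) {
            /* (a) normal migration: load straight into SVM memory */
            host = host_from_ram_block_offset();
        } else if (in_colo_state) {
            /* (c) COLO running: load only into the cache */
            host = colo_cache_from_block_offset();
        } else {
            /* (b) migrating, COLO not started yet: load into SVM memory,
             * then back the page up into the cache afterwards */
            host = host_from_ram_block_offset();
            host_bak = colo_cache_from_block_offset();
        }

        memset(host, 0xaa, PAGE_SIZE);              /* "receive" the page */
        if (host_bak) {
            memcpy(host_bak, host, PAGE_SIZE);      /* the backup copy in V2 */
        }
    }

    int main(void)
    {
        load_one_page(false, false);    /* case (a) */
        load_one_page(true,  false);    /* case (b) */
        load_one_page(true,  true);     /* case (c) */
        printf("done\n");
        return 0;
    }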


Thanks,
Hailiang

> Dave
> 
> >              }
> >              if (!host) {
> > @@ -3506,6 +3522,9 @@ static int ram_load_precopy(QEMUFile *f)
> >          if (!ret) {
> >              ret = qemu_file_get_error(f);
> >          }
> > +        if (!ret && host_bak && host) {
> > +            memcpy(host_bak, host, TARGET_PAGE_SIZE);
> > +        }
> >      }
> >
> >      ret |= wait_for_decompress_done();
> > diff --git a/migration/ram.h b/migration/ram.h
> > index a553d40751..5ceaff7cb4 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
> >  /* ram cache */
> >  int colo_init_ram_cache(void);
> >  void colo_release_ram_cache(void);
> > +void colo_incoming_start_dirty_log(void);
> >
> >  #endif
> > --
> > 2.21.0
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-02-24  4:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17  1:20 [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Hailiang Zhang
2020-02-17  1:20 ` [PATCH 1/3] migration/colo: wrap incoming checkpoint process into new helper Hailiang Zhang
2020-02-19 18:24   ` Dr. David Alan Gilbert
2020-02-17  1:20 ` [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing Hailiang Zhang
2020-02-17 11:45   ` Eric Blake
2020-02-19 18:51   ` Dr. David Alan Gilbert
2020-02-24  4:06     ` Zhanghailiang
2020-02-17  1:20 ` [PATCH 3/3] COLO: Optimize memory back-up process Hailiang Zhang
2020-02-20 18:24   ` Dr. David Alan Gilbert
2020-02-24  4:10     ` Zhanghailiang
2020-02-20 18:27 ` [PATCH 0/3] Optimize VM's downtime while do checkpoint in COLO Dr. David Alan Gilbert
