* [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration
@ 2015-11-02  7:36 Liang Li
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup Liang Li
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Liang Li @ 2015-11-02  7:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, Liang Li, yong.y.wang, stefanha, amit.shah, pbonzini

KVM commit 3ea3b7fa9af067982f34b introduces a mechanism that lazily
collapses small sptes into large sptes, which is intended to solve the
performance drop seen when a live migration fails or is canceled. When
dirty logging is stopped, the rmap is scanned in the
KVM_SET_USER_MEMORY_REGION ioctl context to drop the small sptes.
Scanning the rmap and dropping the small sptes is a time-consuming
operation that takes dozens of milliseconds; the actual time depends on
the VM's memory size. For a VM with 8GB RAM, it takes about 30ms.

The current QEMU code stops dirty logging during the pause-and-copy
stage by calling the migration_end() function. migration_end() is now a
time-consuming operation because it calls
memory_global_dirty_log_stop(), which triggers the rmap scan and the
dropping of small sptes. So calling migration_end() before all the
vmstate data has been transferred to the destination prolongs VM
downtime.

migration_end() should be deferred until after all the data has been
transferred to the destination. blk_mig_cleanup() can be deferred too.

Effect of this patch
====================
For a VM with 8G RAM, this patch can reduce the VM downtime by about 30 ms.

You can follow these steps to see the effect of this patch.

1. Start a VM with the command:
  ./qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 -monitor stdio\
      /share/rhel6u5.qcow
   on the source host, and
  ./qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 -monitor stdio\
      /share/rhel6u5.qcow -incoming tcp:0:4444
   on the destination host.
2. In the source-side QEMU monitor:
  (qemu) migrate_set_speed 0
  (qemu) migrate_set_downtime 0.01
  (qemu) migrate -d tcp:($DST_HOST_IP):4444
  (qemu) info migrate

The actual VM downtime in my environment:
========================================
| without this patch | with this patch |
|--------------------|-----------------|
|        35ms        |       4ms       |
========================================


Changes:
  * Remove qemu_savevm_state_cancel() from migrate_fd_cleanup().
  * Add 2 more patches for code cleanup.
  * Add more details in the commit message.

Liang Li (4):
  migration: defer migration_end & blk_mig_cleanup
  migration: rename qemu_savevm_state_cancel
  migration: rename cancel to cleanup in SaveVMHandles
  migration: code clean up

 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  2 +-
 migration/block.c           | 10 ++--------
 migration/migration.c       | 13 ++++++-------
 migration/ram.c             | 10 ++--------
 migration/savevm.c          | 10 +++++-----
 trace-events                |  2 +-
 7 files changed, 18 insertions(+), 31 deletions(-)

-- 
1.9.1


* [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
@ 2015-11-02  7:37 ` Liang Li
  2015-11-03 12:28   ` Juan Quintela
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 2/4] migration: rename qemu_savevm_state_cancel Liang Li
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Liang Li @ 2015-11-02  7:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, Liang Li, yong.y.wang, stefanha, amit.shah, pbonzini

Because of KVM commit 3ea3b7fa9af067982f34b, which introduces a
mechanism that lazily collapses small sptes into large sptes,
migration_end() is now a time-consuming operation: it calls
memory_global_dirty_log_stop(), which triggers the dropping of small
sptes and takes dozens of milliseconds. Calling migration_end() before
all the vmstate data has been transferred to the destination therefore
prolongs VM downtime. This operation should be deferred until after
all the data has been transferred to the destination.

blk_mig_cleanup() can be deferred too.

For a VM with 8G RAM, this patch can reduce the VM downtime by about 30 ms.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 migration/block.c     |  1 -
 migration/migration.c | 13 ++++++-------
 migration/ram.c       |  1 -
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index f7bb1e0..8401597 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -750,7 +750,6 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 
     qemu_put_be64(f, BLK_MIG_FLAG_EOS);
 
-    blk_mig_cleanup();
     return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index b092f38..d5a7304 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -613,12 +613,9 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIGRATION_STATUS_ACTIVE);
 
-    if (s->state != MIGRATION_STATUS_COMPLETED) {
-        qemu_savevm_state_cancel();
-        if (s->state == MIGRATION_STATUS_CANCELLING) {
-            migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
-                              MIGRATION_STATUS_CANCELLED);
-        }
+    if (s->state == MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
+                          MIGRATION_STATUS_CANCELLED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
@@ -1028,6 +1025,7 @@ static void *migration_thread(void *opaque)
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
+    int64_t end_time;
     bool old_vm_running = false;
 
     rcu_register_thread();
@@ -1089,10 +1087,11 @@ static void *migration_thread(void *opaque)
 
     /* If we enabled cpu throttling for auto-converge, turn it off. */
     cpu_throttle_stop();
+    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
+    qemu_savevm_state_cancel();
     if (s->state == MIGRATION_STATUS_COMPLETED) {
-        int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
         s->downtime = end_time - start_time;
diff --git a/migration/ram.c b/migration/ram.c
index a25bcc7..25e9eeb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1344,7 +1344,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
     rcu_read_unlock();
 
-    migration_end();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     return 0;
-- 
1.9.1


* [Qemu-devel] [v2 RESEND 2/4] migration: rename qemu_savevm_state_cancel
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup Liang Li
@ 2015-11-02  7:37 ` Liang Li
  2015-11-03 12:29   ` Juan Quintela
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 3/4] migration: rename cancel to cleanup in SaveVMHandles Liang Li
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Liang Li @ 2015-11-02  7:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, Liang Li, yong.y.wang, stefanha, amit.shah, pbonzini

The function qemu_savevm_state_cancel() is called after the migration
completes in migration_thread(), and it seems strange to 'cancel'
something after completion; renaming it to qemu_savevm_state_cleanup()
reads better.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 include/sysemu/sysemu.h | 2 +-
 migration/migration.c   | 2 +-
 migration/savevm.c      | 6 +++---
 trace-events            | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c439975..5cb0f05 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -89,7 +89,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
-void qemu_savevm_state_cancel(void);
+void qemu_savevm_state_cleanup(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
 
diff --git a/migration/migration.c b/migration/migration.c
index d5a7304..f99d3ea 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1090,7 +1090,7 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_cancel();
+    qemu_savevm_state_cleanup();
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
diff --git a/migration/savevm.c b/migration/savevm.c
index dbcc39a..ae8fdda 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -902,11 +902,11 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
     return ret;
 }
 
-void qemu_savevm_state_cancel(void)
+void qemu_savevm_state_cleanup(void)
 {
     SaveStateEntry *se;
 
-    trace_savevm_state_cancel();
+    trace_savevm_state_cleanup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (se->ops && se->ops->cancel) {
             se->ops->cancel(se->opaque);
@@ -943,7 +943,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         ret = qemu_file_get_error(f);
     }
     if (ret != 0) {
-        qemu_savevm_state_cancel();
+        qemu_savevm_state_cleanup();
         error_setg_errno(errp, -ret, "Error while writing VM state");
     }
     return ret;
diff --git a/trace-events b/trace-events
index 72136b9..17fbddf 100644
--- a/trace-events
+++ b/trace-events
@@ -1211,7 +1211,7 @@ savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_complete(void) ""
-savevm_state_cancel(void) ""
+savevm_state_cleanup(void) ""
 vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
 qemu_announce_self_iter(const char *mac) "%s"
-- 
1.9.1


* [Qemu-devel] [v2 RESEND 3/4] migration: rename cancel to cleanup in SaveVMHandles
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup Liang Li
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 2/4] migration: rename qemu_savevm_state_cancel Liang Li
@ 2015-11-02  7:37 ` Liang Li
  2015-11-03 12:30   ` Juan Quintela
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 4/4] migration: code clean up Liang Li
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Liang Li @ 2015-11-02  7:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, Liang Li, yong.y.wang, stefanha, amit.shah, pbonzini

'cleanup' seems more appropriate than 'cancel', since the hook now runs
at the end of every migration, not only on cancellation.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 include/migration/vmstate.h | 2 +-
 migration/block.c           | 2 +-
 migration/ram.c             | 2 +-
 migration/savevm.c          | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a65522..d173b56 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -39,7 +39,7 @@ typedef struct SaveVMHandlers {
     void (*set_params)(const MigrationParams *params, void * opaque);
     SaveStateHandler *save_state;
 
-    void (*cancel)(void *opaque);
+    void (*cleanup)(void *opaque);
     int (*save_live_complete)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the iothread lock.  */
diff --git a/migration/block.c b/migration/block.c
index 8401597..ecfe005 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -884,7 +884,7 @@ static SaveVMHandlers savevm_block_handlers = {
     .save_live_complete = block_save_complete,
     .save_live_pending = block_save_pending,
     .load_state = block_load,
-    .cancel = block_migration_cancel,
+    .cleanup = block_migration_cancel,
     .is_active = block_is_active,
 };
 
diff --git a/migration/ram.c b/migration/ram.c
index 25e9eeb..0a51473 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1685,7 +1685,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_complete = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
-    .cancel = ram_migration_cancel,
+    .cleanup = ram_migration_cancel,
 };
 
 void ram_mig_init(void)
diff --git a/migration/savevm.c b/migration/savevm.c
index ae8fdda..e05158d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -908,8 +908,8 @@ void qemu_savevm_state_cleanup(void)
 
     trace_savevm_state_cleanup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (se->ops && se->ops->cancel) {
-            se->ops->cancel(se->opaque);
+        if (se->ops && se->ops->cleanup) {
+            se->ops->cleanup(se->opaque);
         }
     }
 }
-- 
1.9.1


* [Qemu-devel] [v2 RESEND 4/4] migration: code clean up
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
                   ` (2 preceding siblings ...)
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 3/4] migration: rename cancel to cleanup in SaveVMHandles Liang Li
@ 2015-11-02  7:37 ` Liang Li
  2015-11-03 12:30   ` Juan Quintela
  2015-11-02 11:54 ` [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Paolo Bonzini
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Liang Li @ 2015-11-02  7:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, Liang Li, yong.y.wang, stefanha, amit.shah, pbonzini

Just clean up code, no behavior change.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 migration/block.c | 9 ++-------
 migration/ram.c   | 9 ++-------
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index ecfe005..cf9d9f8 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -591,7 +591,7 @@ static int64_t get_remaining_dirty(void)
 
 /* Called with iothread lock taken.  */
 
-static void blk_mig_cleanup(void)
+static void block_migration_cleanup(void *opaque)
 {
     BlkMigDevState *bmds;
     BlkMigBlock *blk;
@@ -618,11 +618,6 @@ static void blk_mig_cleanup(void)
     blk_mig_unlock();
 }
 
-static void block_migration_cancel(void *opaque)
-{
-    blk_mig_cleanup();
-}
-
 static int block_save_setup(QEMUFile *f, void *opaque)
 {
     int ret;
@@ -884,7 +879,7 @@ static SaveVMHandlers savevm_block_handlers = {
     .save_live_complete = block_save_complete,
     .save_live_pending = block_save_pending,
     .load_state = block_load,
-    .cleanup = block_migration_cancel,
+    .cleanup = block_migration_cleanup,
     .is_active = block_is_active,
 };
 
diff --git a/migration/ram.c b/migration/ram.c
index 0a51473..df3df9e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1100,7 +1100,7 @@ static void migration_bitmap_free(struct BitmapRcu *bmap)
     g_free(bmap);
 }
 
-static void migration_end(void)
+static void ram_migration_cleanup(void *opaque)
 {
     /* caller have hold iothread lock or is in a bh, so there is
      * no writing race against this migration_bitmap
@@ -1124,11 +1124,6 @@ static void migration_end(void)
     XBZRLE_cache_unlock();
 }
 
-static void ram_migration_cancel(void *opaque)
-{
-    migration_end();
-}
-
 static void reset_ram_globals(void)
 {
     last_seen_block = NULL;
@@ -1685,7 +1680,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_complete = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
-    .cleanup = ram_migration_cancel,
+    .cleanup = ram_migration_cleanup,
 };
 
 void ram_mig_init(void)
-- 
1.9.1


* Re: [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
                   ` (3 preceding siblings ...)
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 4/4] migration: code clean up Liang Li
@ 2015-11-02 11:54 ` Paolo Bonzini
  2015-11-03 12:36   ` Juan Quintela
  2015-11-02 14:40 ` Stefan Hajnoczi
  2015-11-03 12:23 ` Amit Shah
  6 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2015-11-02 11:54 UTC (permalink / raw)
  To: Liang Li, qemu-devel
  Cc: amit.shah, Wanpeng Li, yong.y.wang, stefanha, quintela



On 02/11/2015 08:36, Liang Li wrote:
> KVM commit 3ea3b7fa9af067982f34b introduces a mechanism that lazily
> collapses small sptes into large sptes, which is intended to solve the
> performance drop seen when a live migration fails or is canceled. When
> dirty logging is stopped, the rmap is scanned in the
> KVM_SET_USER_MEMORY_REGION ioctl context to drop the small sptes.
> Scanning the rmap and dropping the small sptes is a time-consuming
> operation that takes dozens of milliseconds; the actual time depends on
> the VM's memory size. For a VM with 8GB RAM, it takes about 30ms.

I'm okay with these patches.  Juan, can they be included in 2.5?

However, the KVM patch is a regression too.  Wanpeng, can you look into
doing the collapsing from a work item?

Thanks,

Paolo


* Re: [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
                   ` (4 preceding siblings ...)
  2015-11-02 11:54 ` [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Paolo Bonzini
@ 2015-11-02 14:40 ` Stefan Hajnoczi
  2015-11-03 12:23 ` Amit Shah
  6 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2015-11-02 14:40 UTC (permalink / raw)
  To: Liang Li; +Cc: amit.shah, pbonzini, yong.y.wang, qemu-devel, quintela

On Mon, Nov 02, 2015 at 03:36:59PM +0800, Liang Li wrote:
> KVM commit 3ea3b7fa9af067982f34b introduces a mechanism that lazily
> collapses small sptes into large sptes, which is intended to solve the
> performance drop seen when a live migration fails or is canceled. When
> dirty logging is stopped, the rmap is scanned in the
> KVM_SET_USER_MEMORY_REGION ioctl context to drop the small sptes.
> Scanning the rmap and dropping the small sptes is a time-consuming
> operation that takes dozens of milliseconds; the actual time depends on
> the VM's memory size. For a VM with 8GB RAM, it takes about 30ms.
> 
> The current QEMU code stops dirty logging during the pause-and-copy
> stage by calling the migration_end() function. migration_end() is now a
> time-consuming operation because it calls
> memory_global_dirty_log_stop(), which triggers the rmap scan and the
> dropping of small sptes. So calling migration_end() before all the
> vmstate data has been transferred to the destination prolongs VM
> downtime.
> 
> migration_end() should be deferred until after all the data has been
> transferred to the destination. blk_mig_cleanup() can be deferred too.
> 
> Effect of this patch
> ====================
> For a VM with 8G RAM, this patch can reduce the VM downtime by about 30 ms.
> 
> You can follow these steps to see the effect of this patch.
> 
> 1. Start a VM with the command:
>   ./qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 -monitor stdio\
>       /share/rhel6u5.qcow
>    in the source host and 
>   ./qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 -monitor stdio\
>       /share/rhel6u5.qcow -incoming tcp:0:4444
>    in the destination host.
> 2. In the source side qemu monitor:
>   (qemu) migrate_set_speed 0
>   (qemu) migrate_set_downtime 0.01
>   (qemu) migrate -d tcp:($DST_HOST_IP):4444
>   (qemu) info migrate
> 
> The actual VM downtime in my environment:
> =====================================
> |without this patch| with this patch|
> |-----------------------------------|
> |      35ms        |     4ms        |
> =====================================
> 
> 
> Changes:
>   * Remove qemu_savevm_state_cancel() from migrate_fd_cleanup().
>   * Add 2 more patches for code cleanup.
>   * Add more details in the commit message.
> 
> Liang Li (4):
>   migration: defer migration_end & blk_mig_cleanup
>   migration: rename qemu_savevm_state_cancel
>   migration: rename cancel to cleanup in SaveVMHandles
>   migration: code clean up
> 
>  include/migration/vmstate.h |  2 +-
>  include/sysemu/sysemu.h     |  2 +-
>  migration/block.c           | 10 ++--------
>  migration/migration.c       | 13 ++++++-------
>  migration/ram.c             | 10 ++--------
>  migration/savevm.c          | 10 +++++-----
>  trace-events                |  2 +-
>  7 files changed, 18 insertions(+), 31 deletions(-)

I haven't reviewed in detail, but if migration folks are happy then it's
fine by me:

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration
  2015-11-02  7:36 [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Liang Li
                   ` (5 preceding siblings ...)
  2015-11-02 14:40 ` Stefan Hajnoczi
@ 2015-11-03 12:23 ` Amit Shah
  6 siblings, 0 replies; 13+ messages in thread
From: Amit Shah @ 2015-11-03 12:23 UTC (permalink / raw)
  To: Liang Li; +Cc: yong.y.wang, pbonzini, qemu-devel, stefanha, quintela

On (Mon) 02 Nov 2015 [15:36:59], Liang Li wrote:
> KVM commit 3ea3b7fa9af067982f34b introduces a mechanism that lazily
> collapses small sptes into large sptes, which is intended to solve the
> performance drop seen when a live migration fails or is canceled. When
> dirty logging is stopped, the rmap is scanned in the
> KVM_SET_USER_MEMORY_REGION ioctl context to drop the small sptes.
> Scanning the rmap and dropping the small sptes is a time-consuming
> operation that takes dozens of milliseconds; the actual time depends on
> the VM's memory size. For a VM with 8GB RAM, it takes about 30ms.
> 
> The current QEMU code stops dirty logging during the pause-and-copy
> stage by calling the migration_end() function. migration_end() is now a
> time-consuming operation because it calls
> memory_global_dirty_log_stop(), which triggers the rmap scan and the
> dropping of small sptes. So calling migration_end() before all the
> vmstate data has been transferred to the destination prolongs VM
> downtime.
> 
> migration_end() should be deferred until after all the data has been
> transferred to the destination. blk_mig_cleanup() can be deferred too.

Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks for adding to the commit message, that helped.


		Amit


* Re: [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 1/4] migration: defer migration_end & blk_mig_cleanup Liang Li
@ 2015-11-03 12:28   ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2015-11-03 12:28 UTC (permalink / raw)
  To: Liang Li; +Cc: amit.shah, pbonzini, qemu-devel, stefanha, yong.y.wang

Liang Li <liang.z.li@intel.com> wrote:
> Because of KVM commit 3ea3b7fa9af067982f34b, which introduces a
> mechanism that lazily collapses small sptes into large sptes,
> migration_end() is now a time-consuming operation: it calls
> memory_global_dirty_log_stop(), which triggers the dropping of small
> sptes and takes dozens of milliseconds. Calling migration_end() before
> all the vmstate data has been transferred to the destination therefore
> prolongs VM downtime. This operation should be deferred until after
> all the data has been transferred to the destination.
>
> blk_mig_cleanup() can be deferred too.
>
> For a VM with 8G RAM, this patch can reduce the VM downtime by about 30 ms.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

And the naming makes more sense even, thanks


* Re: [Qemu-devel] [v2 RESEND 2/4] migration: rename qemu_savevm_state_cancel
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 2/4] migration: rename qemu_savevm_state_cancel Liang Li
@ 2015-11-03 12:29   ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2015-11-03 12:29 UTC (permalink / raw)
  To: Liang Li; +Cc: amit.shah, pbonzini, qemu-devel, stefanha, yong.y.wang

Liang Li <liang.z.li@intel.com> wrote:
> The function qemu_savevm_state_cancel() is called after the migration
> completes in migration_thread(), and it seems strange to 'cancel'
> something after completion; renaming it to qemu_savevm_state_cleanup()
> reads better.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [v2 RESEND 3/4] migration: rename cancel to cleanup in SaveVMHandles
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 3/4] migration: rename cancel to cleanup in SaveVMHandles Liang Li
@ 2015-11-03 12:30   ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2015-11-03 12:30 UTC (permalink / raw)
  To: Liang Li; +Cc: amit.shah, pbonzini, qemu-devel, stefanha, yong.y.wang

Liang Li <liang.z.li@intel.com> wrote:
> 'cleanup' seems more appropriate than 'cancel'.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [v2 RESEND 4/4] migration: code clean up
  2015-11-02  7:37 ` [Qemu-devel] [v2 RESEND 4/4] migration: code clean up Liang Li
@ 2015-11-03 12:30   ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2015-11-03 12:30 UTC (permalink / raw)
  To: Liang Li; +Cc: amit.shah, pbonzini, qemu-devel, stefanha, yong.y.wang

Liang Li <liang.z.li@intel.com> wrote:
> Just clean up code, no behavior change.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


Applied the whole series


* Re: [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration
  2015-11-02 11:54 ` [Qemu-devel] [v2 RESEND 0/4] Fix long vm downtime during live migration Paolo Bonzini
@ 2015-11-03 12:36   ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2015-11-03 12:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Liang Li, qemu-devel, yong.y.wang, stefanha, amit.shah

Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/11/2015 08:36, Liang Li wrote:
>> KVM commit 3ea3b7fa9af067982f34b introduces a mechanism that lazily
>> collapses small sptes into large sptes, which is intended to solve the
>> performance drop seen when a live migration fails or is canceled. When
>> dirty logging is stopped, the rmap is scanned in the
>> KVM_SET_USER_MEMORY_REGION ioctl context to drop the small sptes.
>> Scanning the rmap and dropping the small sptes is a time-consuming
>> operation that takes dozens of milliseconds; the actual time depends on
>> the VM's memory size. For a VM with 8GB RAM, it takes about 30ms.
>
> I'm okay with these patches.  Juan, can they be included in 2.5?

Applied, thanks.

>
> However, the KVM patch is a regression too.  Wanpeng, can you look into
> doing the collapsing from a work item?
>
> Thanks,
>
> Paolo

