* [Qemu-devel] [RFC PATCH v6 0/3] Throttle-down guest to help with live migration convergence
@ 2013-06-14 13:58 Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 1/3] Introduce async_run_on_cpu() Chegu Vinod
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chegu Vinod @ 2013-06-14 13:58 UTC (permalink / raw)
  To: eblake, anthony, quintela, owasserm, pbonzini, qemu-devel; +Cc: chegu_vinod

Busy enterprise workloads hosted on large VMs tend to dirty memory
faster than the transfer rate achieved via live guest migration.
Despite some good recent improvements (and the use of dedicated 10Gig
NICs between hosts) the live migration does NOT converge.

If a user chooses to force convergence of their migration via the new
"auto-converge" migration capability, then this change auto-detects the
lack-of-convergence scenario and triggers a slowdown of the workload by
explicitly disallowing the VCPUs from spending much time in the VM
context.

The migration thread tries to catch up, and this eventually leads to
convergence in some "deterministic" amount of time. Yes, it does impact
the performance of all the VCPUs, but in our observation that lasts
only for a short duration, i.e. the migration ends up entering stage 3
(the downtime phase) soon after that. No external monitoring/triggers
are required.
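
For reference, a minimal usage sketch at the HMP monitor (the
capability name matches patch 2/3; the destination URI and the
speed/downtime values are only placeholders):

 (qemu) migrate_set_capability auto-converge on
 (qemu) migrate_set_speed 20G
 (qemu) migrate_set_downtime 4
 (qemu) migrate -d tcp:dst-host:4444
 (qemu) info migrate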

Thanks to Juan and Paolo for their useful suggestions.

---
Changes from v5:
- incorporated feedback from Paolo & Igor.
- rebased to latest qemu.git

Changes from v4:
- incorporated feedback from Paolo.
- split into 3 patches.

Changes from v3:
- incorporated feedback from Paolo and Eric
- rebased to latest qemu.git

Changes from v2:
- incorporated feedback from Orit, Juan and Eric
- stop the throttling thread at the start of stage 3
- rebased to latest qemu.git

Changes from v1:
- rebased to latest qemu.git
- added auto-converge capability (default off) - suggested by Anthony Liguori &
                                                Eric Blake.

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
---

Chegu Vinod (3):
  Introduce async_run_on_cpu()
  Add 'auto-converge' migration capability
  Force auto-convergence of live migration

 arch_init.c                   |   85 +++++++++++++++++++++++++++++++++++++++++
 cpus.c                        |   29 ++++++++++++++
 include/migration/migration.h |    2 +
 include/qemu-common.h         |    1 +
 include/qom/cpu.h             |   10 +++++
 migration.c                   |    9 ++++
 qapi-schema.json              |    5 ++-
 7 files changed, 140 insertions(+), 1 deletions(-)


* [Qemu-devel] [RFC PATCH v6 1/3] Introduce async_run_on_cpu()
  2013-06-14 13:58 [Qemu-devel] [RFC PATCH v6 0/3] Throttle-down guest to help with live migration convergence Chegu Vinod
@ 2013-06-14 13:58 ` Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 2/3] Add 'auto-converge' migration capability Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration Chegu Vinod
  2 siblings, 0 replies; 6+ messages in thread
From: Chegu Vinod @ 2013-06-14 13:58 UTC (permalink / raw)
  To: eblake, anthony, quintela, owasserm, pbonzini, qemu-devel; +Cc: chegu_vinod

Introduce an asynchronous version of run_on_cpu(), i.e. the caller
does not have to block until the callback routine finishes execution
on the target vCPU.
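
A minimal usage sketch (hypothetical caller; 'cpu' and 'done' are
assumed to be in scope, and the callback is illustrative only):

    static void mark_done(void *data)
    {
        /* Runs later, in the context of the target vCPU thread. */
        *(bool *)data = true;
    }

    /* Queues the work item and returns immediately, without waiting
     * for the callback to complete on the target vCPU. */
    async_run_on_cpu(cpu, mark_done, &done);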

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c                |   29 +++++++++++++++++++++++++++++
 include/qemu-common.h |    1 +
 include/qom/cpu.h     |   10 ++++++++++
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index c232265..8cd4eab 100644
--- a/cpus.c
+++ b/cpus.c
@@ -653,6 +653,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
 
     wi.func = func;
     wi.data = data;
+    wi.free = false;
     if (cpu->queued_work_first == NULL) {
         cpu->queued_work_first = &wi;
     } else {
@@ -671,6 +672,31 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     }
 }
 
+void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+{
+    struct qemu_work_item *wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(data);
+        return;
+    }
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+
+    qemu_cpu_kick(cpu);
+}
+
 static void flush_queued_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
@@ -683,6 +709,9 @@ static void flush_queued_work(CPUState *cpu)
         cpu->queued_work_first = wi->next;
         wi->func(wi->data);
         wi->done = true;
+        if (wi->free) {
+            g_free(wi);
+        }
     }
     cpu->queued_work_last = NULL;
     qemu_cond_broadcast(&qemu_work_cond);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ed8b6e2..ac0ed38 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -302,6 +302,7 @@ struct qemu_work_item {
     void (*func)(void *data);
     void *data;
     int done;
+    bool free;
 };
 
 #ifdef CONFIG_USER_ONLY
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 7cd9442..46465e9 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -265,6 +265,16 @@ bool cpu_is_stopped(CPUState *cpu);
 void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
 
 /**
+ * async_run_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ *
+ * Schedules the function @func for execution on the vCPU @cpu asynchronously.
+ */
+void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+
+/**
  * qemu_for_each_cpu:
  * @func: The function to be executed.
  * @data: Data to pass to the function.
-- 
1.7.1


* [Qemu-devel] [RFC PATCH v6 2/3] Add 'auto-converge' migration capability
  2013-06-14 13:58 [Qemu-devel] [RFC PATCH v6 0/3] Throttle-down guest to help with live migration convergence Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 1/3] Introduce async_run_on_cpu() Chegu Vinod
@ 2013-06-14 13:58 ` Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration Chegu Vinod
  2 siblings, 0 replies; 6+ messages in thread
From: Chegu Vinod @ 2013-06-14 13:58 UTC (permalink / raw)
  To: eblake, anthony, quintela, owasserm, pbonzini, qemu-devel; +Cc: chegu_vinod

The auto-converge migration capability allows the user to specify
whether the live migration sequence should automatically detect a lack
of convergence and force it.
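
A usage sketch over QMP, reusing the existing migrate-set-capabilities
command (the arrows follow the usual QMP documentation convention):

 -> { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
          { "capability": "auto-converge", "state": true } ] } }
 <- { "return": {} }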

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/migration.h |    2 ++
 migration.c                   |    9 +++++++++
 qapi-schema.json              |    5 ++++-
 3 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e2acec6..ace91b0 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -127,4 +127,6 @@ int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
+
+bool migrate_auto_converge(void);
 #endif
diff --git a/migration.c b/migration.c
index 058f9e6..d0759c1 100644
--- a/migration.c
+++ b/migration.c
@@ -473,6 +473,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
     max_downtime = (uint64_t)value;
 }
 
+bool migrate_auto_converge(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE];
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 5ad6894..882a7fd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -605,10 +605,13 @@
 #          This feature allows us to minimize migration traffic for certain work
 #          loads, by sending compressed difference of the pages
 #
+# @auto-converge: If enabled, QEMU will automatically throttle down the guest
+#          to speed up convergence of RAM migration. (since 1.6)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle'] }
+  'data': ['xbzrle', 'auto-converge'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.7.1


* [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration
  2013-06-14 13:58 [Qemu-devel] [RFC PATCH v6 0/3] Throttle-down guest to help with live migration convergence Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 1/3] Introduce async_run_on_cpu() Chegu Vinod
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 2/3] Add 'auto-converge' migration capability Chegu Vinod
@ 2013-06-14 13:58 ` Chegu Vinod
  2013-06-20 12:54   ` Paolo Bonzini
  2 siblings, 1 reply; 6+ messages in thread
From: Chegu Vinod @ 2013-06-14 13:58 UTC (permalink / raw)
  To: eblake, anthony, quintela, owasserm, pbonzini, qemu-devel; +Cc: chegu_vinod

If a user chooses to turn on the auto-converge migration capability,
these changes detect the lack of convergence and throttle down the
guest, i.e. force the VCPUs out of the guest for some duration and let
the migration thread catch up and help converge.

Verified the convergence using the following:
 - Java Warehouse workload running on a 20VCPU/256G guest (~80% busy)
 - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)

Sample results with the Java warehouse workload (migrate speed set to
20Gb/s and migrate downtime set to 4 seconds):

 (qemu) info migrate
 capabilities: xbzrle: off auto-converge: off  <----
 Migration status: active
 total time: 1487503 milliseconds
 expected downtime: 519 milliseconds
 transferred ram: 383749347 kbytes
 remaining ram: 2753372 kbytes
 total ram: 268444224 kbytes
 duplicate: 65461532 pages
 skipped: 64901568 pages
 normal: 95750218 pages
 normal bytes: 383000872 kbytes
 dirty pages rate: 67551 pages

 ---

 (qemu) info migrate
 capabilities: xbzrle: off auto-converge: on   <----
 Migration status: completed
 total time: 241161 milliseconds
 downtime: 6373 milliseconds
 transferred ram: 28235307 kbytes
 remaining ram: 0 kbytes
 total ram: 268444224 kbytes
 duplicate: 64946416 pages
 skipped: 64903523 pages
 normal: 7044971 pages
 normal bytes: 28179884 kbytes
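
A rough reading of the first (auto-converge off) sample, assuming
4 KiB target pages:

 dirtied:     67551 pages/s x 4 kbytes/page ~= 270,204 kbytes/s
 transferred: 383749347 kbytes / 1487.5 s   ~= 258,000 kbytes/s

i.e. the guest dirties memory slightly faster than it is being
transferred, so without throttling the migration cannot converge.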

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
---
 arch_init.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 5d32ecf..69c6c8c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -104,6 +104,8 @@ int graphic_depth = 15;
 #endif
 
 const uint32_t arch_type = QEMU_ARCH;
+static bool mig_throttle_on;
+static void throttle_down_guest_to_converge(void);
 
 /***********************************************************/
 /* ram save/restore */
@@ -378,8 +380,15 @@ static void migration_bitmap_sync(void)
     uint64_t num_dirty_pages_init = migration_dirty_pages;
     MigrationState *s = migrate_get_current();
     static int64_t start_time;
+    static int64_t bytes_xfer_prev;
     static int64_t num_dirty_pages_period;
     int64_t end_time;
+    int64_t bytes_xfer_now;
+    static int dirty_rate_high_cnt;
+
+    if (!bytes_xfer_prev) {
+        bytes_xfer_prev = ram_bytes_transferred();
+    }
 
     if (!start_time) {
         start_time = qemu_get_clock_ms(rt_clock);
@@ -404,6 +413,23 @@ static void migration_bitmap_sync(void)
 
     /* more than 1 second = 1000 millisecons */
     if (end_time > start_time + 1000) {
+        if (migrate_auto_converge()) {
+            /* The following detection logic can be refined later. For now:
+               Check to see if the dirtied bytes is 50% more than the approx.
+               amount of bytes that just got transferred since the last time we
+               were in this routine. If that happens >N times (for now N==4)
+               we turn on the throttle down logic */
+            bytes_xfer_now = ram_bytes_transferred();
+            if (s->dirty_pages_rate &&
+                ((num_dirty_pages_period*TARGET_PAGE_SIZE) >
+                ((bytes_xfer_now - bytes_xfer_prev)/2))) {
+                if (dirty_rate_high_cnt++ > 4) {
+                    DPRINTF("Unable to converge. Throtting down guest\n");
+                    mig_throttle_on = true;
+                }
+             }
+             bytes_xfer_prev = bytes_xfer_now;
+        }
         s->dirty_pages_rate = num_dirty_pages_period * 1000
             / (end_time - start_time);
         s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
@@ -628,6 +654,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         }
         total_sent += bytes_sent;
         acct_info.iterations++;
+        throttle_down_guest_to_converge();
         /* we want to check in the 1st loop, just in case it was the 1st time
            and we had to sync the dirty bitmap.
            qemu_get_clock_ns() is a bit expensive, so we only check each some
@@ -1098,3 +1125,61 @@ TargetInfo *qmp_query_target(Error **errp)
 
     return info;
 }
+
+static bool throttling_needed(void)
+{
+    if (!migrate_auto_converge()) {
+        return false;
+    }
+    return mig_throttle_on;
+}
+
+/* Stub function that's gets run on the vcpu when its brought out of the
+   VM to run inside qemu via async_run_on_cpu()*/
+static void mig_sleep_cpu(void *opq)
+{
+    qemu_mutex_unlock_iothread();
+    g_usleep(30*1000);
+    qemu_mutex_lock_iothread();
+}
+
+/* To reduce the dirty rate explicitly disallow the VCPUs from spending
+   much time in the VM. The migration thread will try to catchup.
+   Workload will experience a performance drop.
+*/
+static void mig_throttle_cpu_down(CPUState *cpu, void *data)
+{
+    async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
+}
+
+static void mig_throttle_guest_down(void)
+{
+    if (throttling_needed()) {
+        qemu_mutex_lock_iothread();
+        qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+static void throttle_down_guest_to_converge(void)
+{
+    static int64_t t0;
+    int64_t        t1;
+
+    if (!throttling_needed()) {
+        return;
+    }
+
+    if (!t0)  {
+        t0 = qemu_get_clock_ns(rt_clock);
+        return;
+    }
+
+    t1 = qemu_get_clock_ns(rt_clock);
+
+    /* If it has been more than 40 ms since the last time the guest
+     * was throtled then do it again.
+     */
+    if (((t1-t0)/1000000) > 40) {
+        mig_throttle_guest_down();
+        t0 = t1;
+    }
+}
-- 
1.7.1


* Re: [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration
  2013-06-14 13:58 ` [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration Chegu Vinod
@ 2013-06-20 12:54   ` Paolo Bonzini
  2013-06-24  2:08     ` Chegu Vinod
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2013-06-20 12:54 UTC (permalink / raw)
  To: Chegu Vinod; +Cc: owasserm, qemu-devel, anthony, quintela

On 14/06/2013 15:58, Chegu Vinod wrote:
> If a user chooses to turn on the auto-converge migration capability
> these changes detect the lack of convergence and throttle down the
> guest. i.e. force the VCPUs out of the guest for some duration
> and let the migration thread catchup and help converge.

Hi Vinod,

pretty much the same comments I sent you yesterday on the obsolete
version of the patch still apply.

> Verified the convergence using the following :
>  - Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
>  - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
> 
> Sample results with Java warehouse workload : (migrate speed set to 20Gb and
> migrate downtime set to 4seconds).
> 
>  (qemu) info migrate
>  capabilities: xbzrle: off auto-converge: off  <----
>  Migration status: active
>  total time: 1487503 milliseconds
>  expected downtime: 519 milliseconds
>  transferred ram: 383749347 kbytes
>  remaining ram: 2753372 kbytes
>  total ram: 268444224 kbytes
>  duplicate: 65461532 pages
>  skipped: 64901568 pages
>  normal: 95750218 pages
>  normal bytes: 383000872 kbytes
>  dirty pages rate: 67551 pages
> 
>  ---
> 
>  (qemu) info migrate
>  capabilities: xbzrle: off auto-converge: on   <----
>  Migration status: completed
>  total time: 241161 milliseconds
>  downtime: 6373 milliseconds
>  transferred ram: 28235307 kbytes
>  remaining ram: 0 kbytes
>  total ram: 268444224 kbytes
>  duplicate: 64946416 pages
>  skipped: 64903523 pages
>  normal: 7044971 pages
>  normal bytes: 28179884 kbytes
> 
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
> ---
>  arch_init.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 85 insertions(+), 0 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 5d32ecf..69c6c8c 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -104,6 +104,8 @@ int graphic_depth = 15;
>  #endif
>  
>  const uint32_t arch_type = QEMU_ARCH;
> +static bool mig_throttle_on;
> +static void throttle_down_guest_to_converge(void);
>  
>  /***********************************************************/
>  /* ram save/restore */
> @@ -378,8 +380,15 @@ static void migration_bitmap_sync(void)
>      uint64_t num_dirty_pages_init = migration_dirty_pages;
>      MigrationState *s = migrate_get_current();
>      static int64_t start_time;
> +    static int64_t bytes_xfer_prev;
>      static int64_t num_dirty_pages_period;
>      int64_t end_time;
> +    int64_t bytes_xfer_now;
> +    static int dirty_rate_high_cnt;
> +
> +    if (!bytes_xfer_prev) {
> +        bytes_xfer_prev = ram_bytes_transferred();
> +    }
>  
>      if (!start_time) {
>          start_time = qemu_get_clock_ms(rt_clock);
> @@ -404,6 +413,23 @@ static void migration_bitmap_sync(void)
>  
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
> +        if (migrate_auto_converge()) {
> +            /* The following detection logic can be refined later. For now:
> +               Check to see if the dirtied bytes is 50% more than the approx.
> +               amount of bytes that just got transferred since the last time we
> +               were in this routine. If that happens >N times (for now N==4)
> +               we turn on the throttle down logic */
> +            bytes_xfer_now = ram_bytes_transferred();
> +            if (s->dirty_pages_rate &&
> +                ((num_dirty_pages_period*TARGET_PAGE_SIZE) >
> +                ((bytes_xfer_now - bytes_xfer_prev)/2))) {
> +                if (dirty_rate_high_cnt++ > 4) {

Too many parentheses, and please remove the nested if.

> +                    DPRINTF("Unable to converge. Throtting down guest\n");

Please use tracepoint instead.

> +                    mig_throttle_on = true;

Need to reset dirty_rate_high_cnt here, and both
dirty_rate_high_cnt/mig_throttle_on if you see !migrate_auto_converge().
 This ensures that throttling does not kick in automatically if you
disable and re-enable the feature.  It also lets you remove a bunch of
migrate_auto_converge() checks.

You also need to reset dirty_rate_high_cnt/mig_throttle_on in the setup
phase of migration.

> +                }
> +             }
> +             bytes_xfer_prev = bytes_xfer_now;
> +        }
>          s->dirty_pages_rate = num_dirty_pages_period * 1000
>              / (end_time - start_time);
>          s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
> @@ -628,6 +654,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          }
>          total_sent += bytes_sent;
>          acct_info.iterations++;
> +        throttle_down_guest_to_converge();

You can use a shorter name, like check_cpu_throttling().

>          /* we want to check in the 1st loop, just in case it was the 1st time
>             and we had to sync the dirty bitmap.
>             qemu_get_clock_ns() is a bit expensive, so we only check each some
> @@ -1098,3 +1125,61 @@ TargetInfo *qmp_query_target(Error **errp)
>  
>      return info;
>  }
> +
> +static bool throttling_needed(void)
> +{
> +    if (!migrate_auto_converge()) {
> +        return false;
> +    }
> +    return mig_throttle_on;
> +}
> +
> +/* Stub function that's gets run on the vcpu when its brought out of the
> +   VM to run inside qemu via async_run_on_cpu()*/
> +static void mig_sleep_cpu(void *opq)
> +{
> +    qemu_mutex_unlock_iothread();
> +    g_usleep(30*1000);
> +    qemu_mutex_lock_iothread();

Letting the user specify the length of the pause would be nice, so that
management can ramp it up.  It can be done as a follow-up by adding a
'*value': 'int' field to MigrationCapabilityStatus (between 0 and 100,
default 30 as above).
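
A sketch of what such a follow-up might look like in qapi-schema.json
(hypothetical, not part of this series; only the optional field would
be new, the rest is the existing type):

  { 'type': 'MigrationCapabilityStatus',
    'data': { 'capability' : 'MigrationCapability',
              'state' : 'bool',
              '*value' : 'int' } }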

Paolo

> +}
> +
> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
> +   much time in the VM. The migration thread will try to catchup.
> +   Workload will experience a performance drop.
> +*/
> +static void mig_throttle_cpu_down(CPUState *cpu, void *data)
> +{
> +    async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
> +}
> +
> +static void mig_throttle_guest_down(void)
> +{
> +    if (throttling_needed()) {

No need for this "if", it is done already in the caller.

> +        qemu_mutex_lock_iothread();
> +        qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
> +        qemu_mutex_unlock_iothread();
> +    }
> +}
> +
> +static void throttle_down_guest_to_converge(void)
> +{
> +    static int64_t t0;
> +    int64_t        t1;
> +
> +    if (!throttling_needed()) {

With the above suggested changes, this can simply check mig_throttle_on.

> +        return;
> +    }
> +
> +    if (!t0)  {
> +        t0 = qemu_get_clock_ns(rt_clock);
> +        return;
> +    }
> +
> +    t1 = qemu_get_clock_ns(rt_clock);
> +
> +    /* If it has been more than 40 ms since the last time the guest
> +     * was throtled then do it again.
> +     */

throttled

> +    if (((t1-t0)/1000000) > 40) {

I prefer moving the multiplication to the right so you don't need
parentheses, but this is _really_ a nit...

Paolo

> +        mig_throttle_guest_down();
> +        t0 = t1;
> +    }
> +}
> 


* Re: [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration
  2013-06-20 12:54   ` Paolo Bonzini
@ 2013-06-24  2:08     ` Chegu Vinod
  0 siblings, 0 replies; 6+ messages in thread
From: Chegu Vinod @ 2013-06-24  2:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: owasserm, qemu-devel, anthony, quintela

On 6/20/2013 5:54 AM, Paolo Bonzini wrote:
> On 14/06/2013 15:58, Chegu Vinod wrote:
>> If a user chooses to turn on the auto-converge migration capability
>> these changes detect the lack of convergence and throttle down the
>> guest. i.e. force the VCPUs out of the guest for some duration
>> and let the migration thread catchup and help converge.
> Hi Vinod,
>
> pretty much the same comments I sent you yesterday on the obsolete
> version of the patch still apply.
>
>> Verified the convergence using the following :
>>   - Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
>>   - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
>>
>> Sample results with Java warehouse workload : (migrate speed set to 20Gb and
>> migrate downtime set to 4seconds).
>>
>>   (qemu) info migrate
>>   capabilities: xbzrle: off auto-converge: off  <----
>>   Migration status: active
>>   total time: 1487503 milliseconds
>>   expected downtime: 519 milliseconds
>>   transferred ram: 383749347 kbytes
>>   remaining ram: 2753372 kbytes
>>   total ram: 268444224 kbytes
>>   duplicate: 65461532 pages
>>   skipped: 64901568 pages
>>   normal: 95750218 pages
>>   normal bytes: 383000872 kbytes
>>   dirty pages rate: 67551 pages
>>
>>   ---
>>
>>   (qemu) info migrate
>>   capabilities: xbzrle: off auto-converge: on   <----
>>   Migration status: completed
>>   total time: 241161 milliseconds
>>   downtime: 6373 milliseconds
>>   transferred ram: 28235307 kbytes
>>   remaining ram: 0 kbytes
>>   total ram: 268444224 kbytes
>>   duplicate: 64946416 pages
>>   skipped: 64903523 pages
>>   normal: 7044971 pages
>>   normal bytes: 28179884 kbytes
>>
>> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
>> ---
>>   arch_init.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 files changed, 85 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 5d32ecf..69c6c8c 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -104,6 +104,8 @@ int graphic_depth = 15;
>>   #endif
>>   
>>   const uint32_t arch_type = QEMU_ARCH;
>> +static bool mig_throttle_on;
>> +static void throttle_down_guest_to_converge(void);
>>   
>>   /***********************************************************/
>>   /* ram save/restore */
>> @@ -378,8 +380,15 @@ static void migration_bitmap_sync(void)
>>       uint64_t num_dirty_pages_init = migration_dirty_pages;
>>       MigrationState *s = migrate_get_current();
>>       static int64_t start_time;
>> +    static int64_t bytes_xfer_prev;
>>       static int64_t num_dirty_pages_period;
>>       int64_t end_time;
>> +    int64_t bytes_xfer_now;
>> +    static int dirty_rate_high_cnt;
>> +
>> +    if (!bytes_xfer_prev) {
>> +        bytes_xfer_prev = ram_bytes_transferred();
>> +    }
>>   
>>       if (!start_time) {
>>           start_time = qemu_get_clock_ms(rt_clock);
>> @@ -404,6 +413,23 @@ static void migration_bitmap_sync(void)
>>   
>>       /* more than 1 second = 1000 millisecons */
>>       if (end_time > start_time + 1000) {
>> +        if (migrate_auto_converge()) {
>> +            /* The following detection logic can be refined later. For now:
>> +               Check to see if the dirtied bytes is 50% more than the approx.
>> +               amount of bytes that just got transferred since the last time we
>> +               were in this routine. If that happens >N times (for now N==4)
>> +               we turn on the throttle down logic */
>> +            bytes_xfer_now = ram_bytes_transferred();
>> +            if (s->dirty_pages_rate &&
>> +                ((num_dirty_pages_period*TARGET_PAGE_SIZE) >
>> +                ((bytes_xfer_now - bytes_xfer_prev)/2))) {
>> +                if (dirty_rate_high_cnt++ > 4) {
> Too many parentheses, and please remove the nested if.
>
>> +                    DPRINTF("Unable to converge. Throtting down guest\n");
> Please use tracepoint instead.
>
>> +                    mig_throttle_on = true;
> Need to reset dirty_rate_high_cnt here, and both
> dirty_rate_high_cnt/mig_throttle_on if you see !migrate_auto_converge().
>   This ensures that throttling does not kick in automatically if you
> disable and re-enable the feature.  It also lets you remove a bunch of
> migrate_auto_converge() checks.
>
> You also need to reset dirty_rate_high_cnt/mig_throttle_on in the setup
> phase of migration.
>
>> +                }
>> +             }
>> +             bytes_xfer_prev = bytes_xfer_now;
>> +        }
>>           s->dirty_pages_rate = num_dirty_pages_period * 1000
>>               / (end_time - start_time);
>>           s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
>> @@ -628,6 +654,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>           }
>>           total_sent += bytes_sent;
>>           acct_info.iterations++;
>> +        throttle_down_guest_to_converge();
> You can use a shorter name, like check_cpu_throttling().
>
>>           /* we want to check in the 1st loop, just in case it was the 1st time
>>              and we had to sync the dirty bitmap.
>>              qemu_get_clock_ns() is a bit expensive, so we only check each some
>> @@ -1098,3 +1125,61 @@ TargetInfo *qmp_query_target(Error **errp)
>>   
>>       return info;
>>   }
>> +
>> +static bool throttling_needed(void)
>> +{
>> +    if (!migrate_auto_converge()) {
>> +        return false;
>> +    }
>> +    return mig_throttle_on;
>> +}
>> +
>> +/* Stub function that's gets run on the vcpu when its brought out of the
>> +   VM to run inside qemu via async_run_on_cpu()*/
>> +static void mig_sleep_cpu(void *opq)
>> +{
>> +    qemu_mutex_unlock_iothread();
>> +    g_usleep(30*1000);
>> +    qemu_mutex_lock_iothread();
> Letting the user specify the length of the pause would be nice, so that
> management can ramp it up.  It can be done as a follow-up by adding a
> '*value': 'int' field to MigrationCapabilityStatus (between 0 and 100,
> default 30 as above).

Thanks Paolo.  With the exception of the above, which can be pursued
as a follow-up, I have incorporated all your suggested changes,
re-tested, and am sending out a v7 series (without RFC).

Thanks!
Vinod


> Paolo
>
>> +}
>> +
>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>> +   much time in the VM. The migration thread will try to catchup.
>> +   Workload will experience a performance drop.
>> +*/
>> +static void mig_throttle_cpu_down(CPUState *cpu, void *data)
>> +{
>> +    async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
>> +}
>> +
>> +static void mig_throttle_guest_down(void)
>> +{
>> +    if (throttling_needed()) {
> No need for this "if", it is done already in the caller.
>
>> +        qemu_mutex_lock_iothread();
>> +        qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
>> +        qemu_mutex_unlock_iothread();
>> +    }
>> +}
>> +
>> +static void throttle_down_guest_to_converge(void)
>> +{
>> +    static int64_t t0;
>> +    int64_t        t1;
>> +
>> +    if (!throttling_needed()) {
> With the above suggested changes, this can simply check mig_throttle_on.
>
>> +        return;
>> +    }
>> +
>> +    if (!t0)  {
>> +        t0 = qemu_get_clock_ns(rt_clock);
>> +        return;
>> +    }
>> +
>> +    t1 = qemu_get_clock_ns(rt_clock);
>> +
>> +    /* If it has been more than 40 ms since the last time the guest
>> +     * was throtled then do it again.
>> +     */
> throttled
>
>> +    if (((t1-t0)/1000000) > 40) {
> I prefer moving the multiplication to the right so you don't need
> parentheses, but this is _really_ a nit...
>
> Paolo
>
>> +        mig_throttle_guest_down();
>> +        t0 = t1;
>> +    }
>> +}
>>
> .
>


