* [Qemu-devel] [PATCH 0/2] migration: Dynamic cpu throttling for auto-converge
@ 2015-06-01 15:17 Jason J. Herne
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface Jason J. Herne
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
  0 siblings, 2 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-01 15:17 UTC (permalink / raw)
  To: afaerber, amit.shah, dgilbert, borntraeger, quintela, qemu-devel
  Cc: Jason J. Herne

This patch set provides a new method for throttling a vcpu and uses that
method to dynamically increase cpu throttling during an auto-converge
migration until the migration completes. This ensures that all migrations
will eventually converge.

The method used here for throttling vcpus is likely not the best. However, I
believe it is preferable to what auto-converge uses today.
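
As a rough illustration (an editor's sketch, not part of the patches), the
escalation implemented in patch 2 behaves like this, assuming its 0.2
starting ratio and doubling step; migration_still_unproductive() is a
made-up placeholder:

    float ratio = 0.0;                       /* not yet throttling */
    while (migration_still_unproductive()) {
        ratio = (ratio == 0.0f) ? 0.2f : ratio * 2;
        /* Each 10ms of vcpu execution is followed by ratio * 10ms of
         * sleep, so the sleep fraction is ratio / (1 + ratio):
         * 0.2 -> 17%, 0.4 -> 29%, 0.8 -> 44%, 1.6 -> 62%, ...
         * approaching a full pause as the ratio grows. */
    }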

This work is related to the following discussion:
https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg00287.html

Jason J. Herne (2):
  cpu: Provide vcpu throttling interface
  migration: Dynamic cpu throttling for auto-converge

 arch_init.c           | 95 +++++++++++++++++----------------------------------
 cpus.c                | 62 +++++++++++++++++++++++++++++++++
 include/qom/cpu.h     | 46 +++++++++++++++++++++++++
 migration/migration.c |  9 +++++
 4 files changed, 149 insertions(+), 63 deletions(-)

-- 
1.9.1

* [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface
  2015-06-01 15:17 [Qemu-devel] [PATCH 0/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
@ 2015-06-01 15:17 ` Jason J. Herne
  2015-06-01 15:23   ` Andrey Korolyov
  2015-06-03  7:12   ` Juan Quintela
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
  1 sibling, 2 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-01 15:17 UTC (permalink / raw)
  To: afaerber, amit.shah, dgilbert, borntraeger, quintela, qemu-devel
  Cc: Jason J. Herne

Provide a method to throttle guest cpu execution. CPUState is augmented with
timeout controls and throttle start/stop functions. To throttle the guest cpu
the caller simply has to call the throttle start function and provide a ratio of
sleep time to normal execution time.
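
For review purposes, a caller-side sketch (illustrative only; the 0.5 ratio
is an arbitrary example, and the sleep fraction follows from the 10ms
timeslice defined below):

    /* Sleep the vcpu 5ms for every 10ms of execution (ratio 0.5),
     * i.e. a sleep fraction of ratio / (1 + ratio) = 1/3. */
    cpu_throttle_start(cpu, 0.5);
    /* ... later, when throttling is no longer wanted ... */
    if (cpu_throttle_active(cpu)) {
        cpu_throttle_stop(cpu);
    }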

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
---
 cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/cpus.c b/cpus.c
index de6469f..7568357 100644
--- a/cpus.c
+++ b/cpus.c
@@ -64,6 +64,9 @@
 
 #endif /* CONFIG_LINUX */
 
+/* Number of ms between cpu throttle operations */
+#define CPU_THROTTLE_TIMESLICE 10
+
 static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
@@ -919,6 +922,65 @@ static void qemu_kvm_wait_io_event(CPUState *cpu)
     qemu_wait_io_event_common(cpu);
 }
 
+static void cpu_throttle_thread(void *opq)
+{
+    CPUState *cpu = (CPUState *)opq;
+    long sleeptime_ms = (long)(cpu->throttle_ratio * CPU_THROTTLE_TIMESLICE);
+
+    /* Stop the timer if needed */
+    if (cpu->throttle_timer_stop) {
+        timer_del(cpu->throttle_timer);
+        timer_free(cpu->throttle_timer);
+        cpu->throttle_timer = NULL;
+        return;
+    }
+
+    qemu_mutex_unlock_iothread();
+    g_usleep(sleeptime_ms * 1000); /* Convert ms to us for usleep call */
+    qemu_mutex_lock_iothread();
+
+    timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                                   CPU_THROTTLE_TIMESLICE);
+}
+
+static void cpu_throttle_timer_pop(void *opq)
+{
+    CPUState *cpu = (CPUState *)opq;
+
+    async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+}
+
+void cpu_throttle_start(CPUState *cpu, float throttle_ratio)
+{
+    assert(throttle_ratio > 0);
+    cpu->throttle_ratio = throttle_ratio;
+
+    if (!cpu_throttle_active(cpu)) {
+        cpu->throttle_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
+                                           cpu_throttle_timer_pop, cpu);
+        timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                                       CPU_THROTTLE_TIMESLICE);
+        cpu->throttle_timer_stop = false;
+    }
+}
+
+void cpu_throttle_stop(CPUState *cpu)
+{
+    assert(cpu_throttle_active(cpu));
+    cpu->throttle_timer_stop = true;
+}
+
+bool cpu_throttle_active(CPUState *cpu)
+{
+    return (cpu->throttle_timer != NULL);
+}
+
+float cpu_throttle_get_ratio(CPUState *cpu)
+{
+    assert(cpu_throttle_active(cpu));
+    return cpu->throttle_ratio;
+}
+
 static void *qemu_kvm_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 39f0f19..9d16e6a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -310,6 +310,11 @@ struct CPUState {
     uint32_t can_do_io;
     int32_t exception_index; /* used by m68k TCG */
 
+    /* vcpu throttling controls */
+    QEMUTimer *throttle_timer;
+    bool throttle_timer_stop;
+    float throttle_ratio;
+
     /* Note that this is accessed at the start of every TB via a negative
        offset from AREG0.  Leave this field at the end so as to make the
        (absolute value) offset as small as possible.  This reduces code
@@ -553,6 +558,47 @@ CPUState *qemu_get_cpu(int index);
  */
 bool cpu_exists(int64_t id);
 
+/**
+ * cpu_throttle_start:
+ * @cpu: The vcpu to throttle
+ *
+ * Throttles a vcpu by forcing it to sleep. The duration of the sleep is a
+ * ratio of sleep time to running time. A ratio of 1.0 corresponds to a 50%
+ * duty cycle (example: 10ms sleep for every 10ms awake).
+ *
+ * cpu_throttle_start can be called as needed to adjust the throttle ratio.
+ * Once the throttling starts, it will remain in effect until cpu_throttle_stop
+ * is called.
+ */
+void cpu_throttle_start(CPUState *cpu, float throttle_ratio);
+
+/**
+ * cpu_throttle_stop:
+ * @cpu: The vcpu to stop throttling
+ *
+ * Stops the vcpu throttling started by cpu_throttle_start.
+ */
+void cpu_throttle_stop(CPUState *cpu);
+
+/**
+ * cpu_throttle_active:
+ * @cpu: The vcpu to check
+ *
+ * Returns %true if this vcpu is currently being throttled, %false otherwise.
+ */
+bool cpu_throttle_active(CPUState *cpu);
+
+/**
+ * cpu_throttle_get_ratio:
+ * @cpu: The vcpu whose throttle ratio to return.
+ *
+ * Returns the ratio being used to throttle this vcpu. See cpu_throttle_start
+ * for details.
+ *
+ * Returns The ratio being used to throttle this vcpu.
+ */
+float cpu_throttle_get_ratio(CPUState *cpu);
+
 #ifndef CONFIG_USER_ONLY
 
 typedef void (*CPUInterruptHandler)(CPUState *, int);
-- 
1.9.1

* [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-01 15:17 [Qemu-devel] [PATCH 0/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface Jason J. Herne
@ 2015-06-01 15:17 ` Jason J. Herne
  2015-06-01 15:32   ` Dr. David Alan Gilbert
  2015-06-03  7:21   ` Juan Quintela
  1 sibling, 2 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-01 15:17 UTC (permalink / raw)
  To: afaerber, amit.shah, dgilbert, borntraeger, quintela, qemu-devel
  Cc: Jason J. Herne

Remove traditional auto-converge static 30ms throttling code and replace it
with a dynamic throttling algorithm.

Additionally, be more aggressive when deciding when to start throttling.
Previously we waited until four unproductive memory passes. Now we begin
throttling after only two unproductive memory passes. Four seemed quite
arbitrary and only waiting for two passes allows us to complete the migration
faster.
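
A worked example of the new trigger, with illustrative numbers only:

    /* Since the last bitmap sync:
     *   bytes transferred: 512 MB
     *   bytes dirtied:     300 MB
     * 300 MB > 512 MB / 2, so the pass counts as unproductive. After
     * repeated unproductive passes (the dirty_rate_high_cnt check below),
     * mig_throttle_guest_down() starts throttling at ratio 0.2 or doubles
     * the current ratio. */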

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
---
 arch_init.c           | 95 +++++++++++++++++----------------------------------
 migration/migration.c |  9 +++++
 2 files changed, 41 insertions(+), 63 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 23d3feb..73ae494 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -111,9 +111,7 @@ int graphic_depth = 32;
 #endif
 
 const uint32_t arch_type = QEMU_ARCH;
-static bool mig_throttle_on;
 static int dirty_rate_high_cnt;
-static void check_guest_throttling(void);
 
 static uint64_t bitmap_sync_count;
 
@@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
     return size;
 }
 
+/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
+ * If guest dirty memory rate is reduced below the rate at which we can
+ * transfer pages to the destination then we should be able to complete
+ * migration. Some workloads dirty memory way too fast and will not effectively
+ * converge, even with auto-converge. For these workloads we will continue to
+ * increase throttling until the guest is paused long enough to complete the
+ * migration. This essentially becomes a non-live migration.
+ */
+static void mig_throttle_guest_down(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        /* We have not started throttling yet. Let's start it. */
+        if (!cpu_throttle_active(cpu)) {
+            cpu_throttle_start(cpu, 0.2);
+        }
+
+        /* Throttling is already in place. Just increase the throttling rate */
+        else {
+            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
+        }
+    }
+}
+
 /* Update the xbzrle cache to reflect a page that's been sent as all 0.
  * The important thing is that a stale (not-yet-0'd) page be replaced
  * by the new data.
@@ -714,21 +737,21 @@ static void migration_bitmap_sync(void)
             /* The following detection logic can be refined later. For now:
                Check to see if the dirtied bytes is 50% more than the approx.
                amount of bytes that just got transferred since the last time we
-               were in this routine. If that happens >N times (for now N==4)
-               we turn on the throttle down logic */
+               were in this routine. If that happens twice, start or increase
+               throttling */
             bytes_xfer_now = ram_bytes_transferred();
+
             if (s->dirty_pages_rate &&
                (num_dirty_pages_period * TARGET_PAGE_SIZE >
                    (bytes_xfer_now - bytes_xfer_prev)/2) &&
-               (dirty_rate_high_cnt++ > 4)) {
+               (dirty_rate_high_cnt++ >= 2)) {
                     trace_migration_throttle();
-                    mig_throttle_on = true;
                     dirty_rate_high_cnt = 0;
+                    mig_throttle_guest_down();
              }
              bytes_xfer_prev = bytes_xfer_now;
-        } else {
-             mig_throttle_on = false;
         }
+
         if (migrate_use_xbzrle()) {
             if (iterations_prev != acct_info.iterations) {
                 acct_info.xbzrle_cache_miss_rate =
@@ -1197,7 +1220,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
-    mig_throttle_on = false;
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
     migration_bitmap_sync_init();
@@ -1301,12 +1323,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         }
         pages_sent += pages;
         acct_info.iterations++;
-        check_guest_throttling();
-        /* we want to check in the 1st loop, just in case it was the 1st time
-           and we had to sync the dirty bitmap.
-           qemu_get_clock_ns() is a bit expensive, so we only check each some
-           iterations
-        */
+
         if ((i & 63) == 0) {
             uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) / 1000000;
             if (t1 > MAX_WAIT) {
@@ -1913,51 +1930,3 @@ TargetInfo *qmp_query_target(Error **errp)
     return info;
 }
 
-/* Stub function that's gets run on the vcpu when its brought out of the
-   VM to run inside qemu via async_run_on_cpu()*/
-static void mig_sleep_cpu(void *opq)
-{
-    qemu_mutex_unlock_iothread();
-    g_usleep(30*1000);
-    qemu_mutex_lock_iothread();
-}
-
-/* To reduce the dirty rate explicitly disallow the VCPUs from spending
-   much time in the VM. The migration thread will try to catchup.
-   Workload will experience a performance drop.
-*/
-static void mig_throttle_guest_down(void)
-{
-    CPUState *cpu;
-
-    qemu_mutex_lock_iothread();
-    CPU_FOREACH(cpu) {
-        async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
-    }
-    qemu_mutex_unlock_iothread();
-}
-
-static void check_guest_throttling(void)
-{
-    static int64_t t0;
-    int64_t        t1;
-
-    if (!mig_throttle_on) {
-        return;
-    }
-
-    if (!t0)  {
-        t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-        return;
-    }
-
-    t1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-
-    /* If it has been more than 40 ms since the last time the guest
-     * was throttled then do it again.
-     */
-    if (40 < (t1-t0)/1000000) {
-        mig_throttle_guest_down();
-        t0 = t1;
-    }
-}
diff --git a/migration/migration.c b/migration/migration.c
index 732d229..c9545df 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qom/cpu.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
@@ -731,6 +732,7 @@ int64_t migrate_xbzrle_cache_size(void)
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    CPUState *cpu;
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
@@ -814,6 +816,13 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    /* If we enabled cpu throttling for auto-converge, turn it off. */
+    CPU_FOREACH(cpu) {
+        if (cpu_throttle_active(cpu)) {
+            cpu_throttle_stop(cpu);
+        }
+    }
+
     qemu_mutex_lock_iothread();
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
1.9.1

* Re: [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface Jason J. Herne
@ 2015-06-01 15:23   ` Andrey Korolyov
  2015-06-01 17:04     ` Jason J. Herne
  2015-06-03  7:12   ` Juan Quintela
  1 sibling, 1 reply; 15+ messages in thread
From: Andrey Korolyov @ 2015-06-01 15:23 UTC (permalink / raw)
  To: Jason J. Herne
  Cc: quintela, qemu-devel, Dr. David Alan Gilbert, borntraeger,
	Amit Shah, afaerber

On Mon, Jun 1, 2015 at 6:17 PM, Jason J. Herne
<jjherne@linux.vnet.ibm.com> wrote:
> Provide a method to throttle guest cpu execution. CPUState is augmented with
> timeout controls and throttle start/stop functions. To throttle the guest cpu
> the caller simply has to call the throttle start function and provide a ratio of
> sleep time to normal execution time.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> ---
>  cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
>
> diff --git a/cpus.c b/cpus.c
> index de6469f..7568357 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -64,6 +64,9 @@
>
>  #endif /* CONFIG_LINUX */
>
> +/* Number of ms between cpu throttle operations */
> +#define CPU_THROTTLE_TIMESLICE 10
> +
>  static CPUState *next_cpu;
>  int64_t max_delay;
>  int64_t max_advance;
> @@ -919,6 +922,65 @@ static void qemu_kvm_wait_io_event(CPUState *cpu)
>      qemu_wait_io_event_common(cpu);
>  }
>
> +static void cpu_throttle_thread(void *opq)
> +{
> +    CPUState *cpu = (CPUState *)opq;
> +    long sleeptime_ms = (long)(cpu->throttle_ratio * CPU_THROTTLE_TIMESLICE);
> +
> +    /* Stop the timer if needed */
> +    if (cpu->throttle_timer_stop) {
> +        timer_del(cpu->throttle_timer);
> +        timer_free(cpu->throttle_timer);
> +        cpu->throttle_timer = NULL;
> +        return;
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +    g_usleep(sleeptime_ms * 1000); /* Convert ms to us for usleep call */
> +    qemu_mutex_lock_iothread();
> +
> +    timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                                   CPU_THROTTLE_TIMESLICE);
> +}
> +
> +static void cpu_throttle_timer_pop(void *opq)
> +{
> +    CPUState *cpu = (CPUState *)opq;
> +
> +    async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
> +}
> +
> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio)
> +{
> +    assert(throttle_ratio > 0);
> +    cpu->throttle_ratio = throttle_ratio;
> +
> +    if (!cpu_throttle_active(cpu)) {
> +        cpu->throttle_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> +                                           cpu_throttle_timer_pop, cpu);
> +        timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                                       CPU_THROTTLE_TIMESLICE);
> +        cpu->throttle_timer_stop = false;
> +    }
> +}
> +
> +void cpu_throttle_stop(CPUState *cpu)
> +{
> +    assert(cpu_throttle_active(cpu));
> +    cpu->throttle_timer_stop = true;
> +}
> +
> +bool cpu_throttle_active(CPUState *cpu)
> +{
> +    return (cpu->throttle_timer != NULL);
> +}
> +
> +float cpu_throttle_get_ratio(CPUState *cpu)
> +{
> +    assert(cpu_throttle_active(cpu));
> +    return cpu->throttle_ratio;
> +}
> +
>  static void *qemu_kvm_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 39f0f19..9d16e6a 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -310,6 +310,11 @@ struct CPUState {
>      uint32_t can_do_io;
>      int32_t exception_index; /* used by m68k TCG */
>
> +    /* vcpu throttling controls */
> +    QEMUTimer *throttle_timer;
> +    bool throttle_timer_stop;
> +    float throttle_ratio;
> +
>      /* Note that this is accessed at the start of every TB via a negative
>         offset from AREG0.  Leave this field at the end so as to make the
>         (absolute value) offset as small as possible.  This reduces code
> @@ -553,6 +558,47 @@ CPUState *qemu_get_cpu(int index);
>   */
>  bool cpu_exists(int64_t id);
>
> +/**
> + * cpu_throttle_start:
> + * @cpu: The vcpu to throttle
> + *
> + * Throttles a vcpu by forcing it to sleep. The duration of the sleep is a
> + * ratio of sleep time to running time. A ratio of 1.0 corresponds to a 50%
> + * duty cycle (example: 10ms sleep for every 10ms awake).
> + *
> + * cpu_throttle_start can be called as needed to adjust the throttle ratio.
> + * Once the throttling starts, it will remain in effect until cpu_throttle_stop
> + * is called.
> + */
> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio);
> +
> +/**
> + * cpu_throttle_stop:
> + * @cpu: The vcpu to stop throttling
> + *
> + * Stops the vcpu throttling started by cpu_throttle_start.
> + */
> +void cpu_throttle_stop(CPUState *cpu);
> +
> +/**
> + * cpu_throttle_active:
> + * @cpu: The vcpu to check
> + *
> + * Returns %true if this vcpu is currently being throttled, %false otherwise.
> + */
> +bool cpu_throttle_active(CPUState *cpu);
> +
> +/**
> + * cpu_throttle_get_ratio:
> + * @cpu: The vcpu whose throttle ratio to return.
> + *
> + * Returns the ratio being used to throttle this vcpu. See cpu_throttle_start
> + * for details.
> + *
> + * Returns The ratio being used to throttle this vcpu.
> + */
> +float cpu_throttle_get_ratio(CPUState *cpu);
> +
>  #ifndef CONFIG_USER_ONLY
>
>  typedef void (*CPUInterruptHandler)(CPUState *, int);
> --
> 1.9.1
>
>

Thanks Jason, this patch would be quite interesting, as it eliminates the
slight scheduler overhead seen when cgroups are actively used for the same
task (~5% for a per-vcpu cgroup layout similar to libvirt's, on a guest
perf numa bench). Are you planning to add a wakeup-frequency throttler to
the same interface as well?

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
@ 2015-06-01 15:32   ` Dr. David Alan Gilbert
  2015-06-01 17:16     ` Jason J. Herne
  2015-06-03  7:21   ` Juan Quintela
  1 sibling, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-01 15:32 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: amit.shah, borntraeger, qemu-devel, afaerber, quintela

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> Remove traditional auto-converge static 30ms throttling code and replace it
> with a dynamic throttling algorithm.
> 
> Additionally, be more aggressive when deciding when to start throttling.
> Previously we waited until four unproductive memory passes. Now we begin
> throttling after only two unproductive memory passes. Four seemed quite
> arbitrary and only waiting for two passes allows us to complete the migration
> faster.
> 
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> ---
>  arch_init.c           | 95 +++++++++++++++++----------------------------------
>  migration/migration.c |  9 +++++
>  2 files changed, 41 insertions(+), 63 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 23d3feb..73ae494 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>  #endif
>  
>  const uint32_t arch_type = QEMU_ARCH;
> -static bool mig_throttle_on;
>  static int dirty_rate_high_cnt;
> -static void check_guest_throttling(void);
>  
>  static uint64_t bitmap_sync_count;
>  
> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>      return size;
>  }
>  
> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
> + * If guest dirty memory rate is reduced below the rate at which we can
> + * transfer pages to the destination then we should be able to complete
> + * migration. Some workloads dirty memory way too fast and will not effectively
> + * converge, even with auto-converge. For these workloads we will continue to
> + * increase throttling until the guest is paused long enough to complete the
> + * migration. This essentially becomes a non-live migration.
> + */
> +static void mig_throttle_guest_down(void)
> +{
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        /* We have not started throttling yet. Let's start it. */
> +        if (!cpu_throttle_active(cpu)) {
> +            cpu_throttle_start(cpu, 0.2);
> +        }
> +
> +        /* Throttling is already in place. Just increase the throttling rate */
> +        else {
> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
> +        }

Now that migration has migrate_parameters, it would be best to replace
the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
change the starting throttling and increase rate.  It would probably also be
good to make the current throttling rate visible in info somewhere; maybe
info migrate?
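
(For concreteness, with hypothetical parameter names rather than existing
QEMU options, that might look like:

    migrate_set_parameter x-cpu-throttle-initial 0.2
    migrate_set_parameter x-cpu-throttle-increment 2.0

with the current ratio then reported somewhere in "info migrate".)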

> +    }
> +}
> +
>  /* Update the xbzrle cache to reflect a page that's been sent as all 0.
>   * The important thing is that a stale (not-yet-0'd) page be replaced
>   * by the new data.
> @@ -714,21 +737,21 @@ static void migration_bitmap_sync(void)
>              /* The following detection logic can be refined later. For now:
>                 Check to see if the dirtied bytes is 50% more than the approx.
>                 amount of bytes that just got transferred since the last time we
> -               were in this routine. If that happens >N times (for now N==4)
> -               we turn on the throttle down logic */
> +               were in this routine. If that happens twice, start or increase
> +               throttling */
>              bytes_xfer_now = ram_bytes_transferred();
> +
>              if (s->dirty_pages_rate &&
>                 (num_dirty_pages_period * TARGET_PAGE_SIZE >
>                     (bytes_xfer_now - bytes_xfer_prev)/2) &&
> -               (dirty_rate_high_cnt++ > 4)) {
> +               (dirty_rate_high_cnt++ >= 2)) {
>                      trace_migration_throttle();
> -                    mig_throttle_on = true;
>                      dirty_rate_high_cnt = 0;
> +                    mig_throttle_guest_down();
>               }
>               bytes_xfer_prev = bytes_xfer_now;
> -        } else {
> -             mig_throttle_on = false;
>          }
> +
>          if (migrate_use_xbzrle()) {
>              if (iterations_prev != acct_info.iterations) {
>                  acct_info.xbzrle_cache_miss_rate =
> @@ -1197,7 +1220,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      RAMBlock *block;
>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
>  
> -    mig_throttle_on = false;
>      dirty_rate_high_cnt = 0;
>      bitmap_sync_count = 0;
>      migration_bitmap_sync_init();
> @@ -1301,12 +1323,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          }
>          pages_sent += pages;
>          acct_info.iterations++;
> -        check_guest_throttling();
> -        /* we want to check in the 1st loop, just in case it was the 1st time
> -           and we had to sync the dirty bitmap.
> -           qemu_get_clock_ns() is a bit expensive, so we only check each some
> -           iterations
> -        */
> +
>          if ((i & 63) == 0) {
>              uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) / 1000000;
>              if (t1 > MAX_WAIT) {
> @@ -1913,51 +1930,3 @@ TargetInfo *qmp_query_target(Error **errp)
>      return info;
>  }
>  
> -/* Stub function that's gets run on the vcpu when its brought out of the
> -   VM to run inside qemu via async_run_on_cpu()*/
> -static void mig_sleep_cpu(void *opq)
> -{
> -    qemu_mutex_unlock_iothread();
> -    g_usleep(30*1000);
> -    qemu_mutex_lock_iothread();
> -}
> -
> -/* To reduce the dirty rate explicitly disallow the VCPUs from spending
> -   much time in the VM. The migration thread will try to catchup.
> -   Workload will experience a performance drop.
> -*/
> -static void mig_throttle_guest_down(void)
> -{
> -    CPUState *cpu;
> -
> -    qemu_mutex_lock_iothread();
> -    CPU_FOREACH(cpu) {
> -        async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
> -    }
> -    qemu_mutex_unlock_iothread();
> -}
> -
> -static void check_guest_throttling(void)
> -{
> -    static int64_t t0;
> -    int64_t        t1;
> -
> -    if (!mig_throttle_on) {
> -        return;
> -    }
> -
> -    if (!t0)  {
> -        t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> -        return;
> -    }
> -
> -    t1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> -
> -    /* If it has been more than 40 ms since the last time the guest
> -     * was throttled then do it again.
> -     */
> -    if (40 < (t1-t0)/1000000) {
> -        mig_throttle_guest_down();
> -        t0 = t1;
> -    }
> -}

Lots of deleted code; that's got to be good.

> diff --git a/migration/migration.c b/migration/migration.c
> index 732d229..c9545df 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "qom/cpu.h"
>  
>  #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
>  
> @@ -731,6 +732,7 @@ int64_t migrate_xbzrle_cache_size(void)
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    CPUState *cpu;
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
> @@ -814,6 +816,13 @@ static void *migration_thread(void *opaque)
>          }
>      }
>  
> +    /* If we enabled cpu throttling for auto-converge, turn it off. */
> +    CPU_FOREACH(cpu) {
> +        if (cpu_throttle_active(cpu)) {
> +            cpu_throttle_stop(cpu);
> +        }
> +    }
> +
>      qemu_mutex_lock_iothread();
>      if (s->state == MIGRATION_STATUS_COMPLETED) {
>          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -- 
> 1.9.1
> 

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface
  2015-06-01 15:23   ` Andrey Korolyov
@ 2015-06-01 17:04     ` Jason J. Herne
  0 siblings, 0 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-01 17:04 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: quintela, qemu-devel, Dr. David Alan Gilbert, borntraeger,
	Amit Shah, afaerber

On 06/01/2015 11:23 AM, Andrey Korolyov wrote:
> On Mon, Jun 1, 2015 at 6:17 PM, Jason J. Herne
> <jjherne@linux.vnet.ibm.com> wrote:
>> Provide a method to throttle guest cpu execution. CPUState is augmented with
>> timeout controls and throttle start/stop functions. To throttle the guest cpu
>> the caller simply has to call the throttle start function and provide a ratio of
>> sleep time to normal execution time.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>> ---
>>   cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 108 insertions(+)
>>
>> diff --git a/cpus.c b/cpus.c
>> index de6469f..7568357 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -64,6 +64,9 @@
>>
>>   #endif /* CONFIG_LINUX */
>>
>> +/* Number of ms between cpu throttle operations */
>> +#define CPU_THROTTLE_TIMESLICE 10
>> +
>>   static CPUState *next_cpu;
>>   int64_t max_delay;
>>   int64_t max_advance;
>> @@ -919,6 +922,65 @@ static void qemu_kvm_wait_io_event(CPUState *cpu)
>>       qemu_wait_io_event_common(cpu);
>>   }
>>
>> +static void cpu_throttle_thread(void *opq)
>> +{
>> +    CPUState *cpu = (CPUState *)opq;
>> +    long sleeptime_ms = (long)(cpu->throttle_ratio * CPU_THROTTLE_TIMESLICE);
>> +
>> +    /* Stop the timer if needed */
>> +    if (cpu->throttle_timer_stop) {
>> +        timer_del(cpu->throttle_timer);
>> +        timer_free(cpu->throttle_timer);
>> +        cpu->throttle_timer = NULL;
>> +        return;
>> +    }
>> +
>> +    qemu_mutex_unlock_iothread();
>> +    g_usleep(sleeptime_ms * 1000); /* Convert ms to us for usleep call */
>> +    qemu_mutex_lock_iothread();
>> +
>> +    timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>> +                                   CPU_THROTTLE_TIMESLICE);
>> +}
>> +
>> +static void cpu_throttle_timer_pop(void *opq)
>> +{
>> +    CPUState *cpu = (CPUState *)opq;
>> +
>> +    async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
>> +}
>> +
>> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio)
>> +{
>> +    assert(throttle_ratio > 0);
>> +    cpu->throttle_ratio = throttle_ratio;
>> +
>> +    if (!cpu_throttle_active(cpu)) {
>> +        cpu->throttle_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
>> +                                           cpu_throttle_timer_pop, cpu);
>> +        timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>> +                                       CPU_THROTTLE_TIMESLICE);
>> +        cpu->throttle_timer_stop = false;
>> +    }
>> +}
>> +
>> +void cpu_throttle_stop(CPUState *cpu)
>> +{
>> +    assert(cpu_throttle_active(cpu));
>> +    cpu->throttle_timer_stop = true;
>> +}
>> +
>> +bool cpu_throttle_active(CPUState *cpu)
>> +{
>> +    return (cpu->throttle_timer != NULL);
>> +}
>> +
>> +float cpu_throttle_get_ratio(CPUState *cpu)
>> +{
>> +    assert(cpu_throttle_active(cpu));
>> +    return cpu->throttle_ratio;
>> +}
>> +
>>   static void *qemu_kvm_cpu_thread_fn(void *arg)
>>   {
>>       CPUState *cpu = arg;
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 39f0f19..9d16e6a 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -310,6 +310,11 @@ struct CPUState {
>>       uint32_t can_do_io;
>>       int32_t exception_index; /* used by m68k TCG */
>>
>> +    /* vcpu throttling controls */
>> +    QEMUTimer *throttle_timer;
>> +    bool throttle_timer_stop;
>> +    float throttle_ratio;
>> +
>>       /* Note that this is accessed at the start of every TB via a negative
>>          offset from AREG0.  Leave this field at the end so as to make the
>>          (absolute value) offset as small as possible.  This reduces code
>> @@ -553,6 +558,47 @@ CPUState *qemu_get_cpu(int index);
>>    */
>>   bool cpu_exists(int64_t id);
>>
>> +/**
>> + * cpu_throttle_start:
>> + * @cpu: The vcpu to throttle
>> + *
>> + * Throttles a vcpu by forcing it to sleep. The duration of the sleep is a
>> + * ratio of sleep time to running time. A ratio of 1.0 corresponds to a 50%
>> + * duty cycle (example: 10ms sleep for every 10ms awake).
>> + *
>> + * cpu_throttle_start can be called as needed to adjust the throttle ratio.
>> + * Once the throttling starts, it will remain in effect until cpu_throttle_stop
>> + * is called.
>> + */
>> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio);
>> +
>> +/**
>> + * cpu_throttle_stop:
>> + * @cpu: The vcpu to stop throttling
>> + *
>> + * Stops the vcpu throttling started by cpu_throttle_start.
>> + */
>> +void cpu_throttle_stop(CPUState *cpu);
>> +
>> +/**
>> + * cpu_throttle_active:
>> + * @cpu: The vcpu to check
>> + *
>> + * Returns %true if this vcpu is currently being throttled, %false otherwise.
>> + */
>> +bool cpu_throttle_active(CPUState *cpu);
>> +
>> +/**
>> + * cpu_throttle_get_ratio:
>> + * @cpu: The vcpu whose throttle ratio to return.
>> + *
>> + * Returns the ratio being used to throttle this vcpu. See cpu_throttle_start
>> + * for details.
>> + *
>> + * Returns The ratio being used to throttle this vcpu.
>> + */
>> +float cpu_throttle_get_ratio(CPUState *cpu);
>> +
>>   #ifndef CONFIG_USER_ONLY
>>
>>   typedef void (*CPUInterruptHandler)(CPUState *, int);
>> --
>> 1.9.1
>>
>>
>
> Thanks Jason, this patch would be quite interesting, as it eliminates the
> slight scheduler overhead seen when cgroups are actively used for the same
> task (~5% for a per-vcpu cgroup layout similar to libvirt's, on a guest
> perf numa bench). Are you planning to add a wakeup-frequency throttler to
> the same interface as well?
>

No, I was not planning on adding anything. I recognise that the controls
this patch provides are not an ideal throttling mechanism. But it seems
better than what we have today for auto-converge. That said, if people
like the interface and the mechanism it certainly can be enhanced.


-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-01 15:32   ` Dr. David Alan Gilbert
@ 2015-06-01 17:16     ` Jason J. Herne
  2015-06-02 13:58       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Jason J. Herne @ 2015-06-01 17:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, borntraeger, quintela, qemu-devel, afaerber

On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> Remove traditional auto-converge static 30ms throttling code and replace it
>> with a dynamic throttling algorithm.
>>
>> Additionally, be more aggressive when deciding when to start throttling.
>> Previously we waited until four unproductive memory passes. Now we begin
>> throttling after only two unproductive memory passes. Four seemed quite
>> arbitrary and only waiting for two passes allows us to complete the migration
>> faster.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>> ---
>>   arch_init.c           | 95 +++++++++++++++++----------------------------------
>>   migration/migration.c |  9 +++++
>>   2 files changed, 41 insertions(+), 63 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 23d3feb..73ae494 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>>   #endif
>>
>>   const uint32_t arch_type = QEMU_ARCH;
>> -static bool mig_throttle_on;
>>   static int dirty_rate_high_cnt;
>> -static void check_guest_throttling(void);
>>
>>   static uint64_t bitmap_sync_count;
>>
>> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>>       return size;
>>   }
>>
>> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
>> + * If guest dirty memory rate is reduced below the rate at which we can
>> + * transfer pages to the destination then we should be able to complete
>> + * migration. Some workloads dirty memory way too fast and will not effectively
>> + * converge, even with auto-converge. For these workloads we will continue to
>> + * increase throttling until the guest is paused long enough to complete the
>> + * migration. This essentially becomes a non-live migration.
>> + */
>> +static void mig_throttle_guest_down(void)
>> +{
>> +    CPUState *cpu;
>> +
>> +    CPU_FOREACH(cpu) {
>> +        /* We have not started throttling yet. Let's start it. */
>> +        if (!cpu_throttle_active(cpu)) {
>> +            cpu_throttle_start(cpu, 0.2);
>> +        }
>> +
>> +        /* Throttling is already in place. Just increase the throttling rate */
>> +        else {
>> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
>> +        }
>
> Now that migration has migrate_parameters, it would be best to replace
> the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
> change the starting throttling and increase rate.  It would probably also be
> good to make the current throttling rate visible in info somewhere; maybe
> info migrate?
>

I did consider all of this. However, I don't think that the controls this
patch provides are an ideal throttling mechanism. I suspect someone with
vcpu/scheduling experience could whip up something more user friendly and
cleaner. I merely propose this because it seems better than what we have
today for auto-converge.

Also, I'm not sure how useful the information really is to the user. The
fact that it is a ratio instead of a percentage might be confusing. Also, I
suspect it is not truly very accurate. Again, I was going for "make it
better", not "make it perfect".

Lastly, if we define this external interface we are kind of stuck with it,
yes? In this regard we should be sure that this is how we want cpu
throttling to work. Alternatively, I propose to accept this patch set as-is
and then work on a real vcpu throttling mechanism that can be used for
auto-converge as well as a user-controllable guest throttling/limiting
mechanism. Once that is in place we can migrate (no pun intended) the
auto-converge code to the new way and remove this stuff.

With all of that said, I'm willing to provide the requested controls if we
really feel the pros outweigh the cons. Thanks for your review :).

...

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-01 17:16     ` Jason J. Herne
@ 2015-06-02 13:58       ` Dr. David Alan Gilbert
  2015-06-02 14:37         ` Jason J. Herne
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-02 13:58 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: amit.shah, borntraeger, quintela, qemu-devel, afaerber

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
> >* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >>Remove traditional auto-converge static 30ms throttling code and replace it
> >>with a dynamic throttling algorithm.
> >>
> >>Additionally, be more aggressive when deciding when to start throttling.
> >>Previously we waited until four unproductive memory passes. Now we begin
> >>throttling after only two unproductive memory passes. Four seemed quite
> >>arbitrary and only waiting for two passes allows us to complete the migration
> >>faster.
> >>
> >>Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >>Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> >>---
> >>  arch_init.c           | 95 +++++++++++++++++----------------------------------
> >>  migration/migration.c |  9 +++++
> >>  2 files changed, 41 insertions(+), 63 deletions(-)
> >>
> >>diff --git a/arch_init.c b/arch_init.c
> >>index 23d3feb..73ae494 100644
> >>--- a/arch_init.c
> >>+++ b/arch_init.c
> >>@@ -111,9 +111,7 @@ int graphic_depth = 32;
> >>  #endif
> >>
> >>  const uint32_t arch_type = QEMU_ARCH;
> >>-static bool mig_throttle_on;
> >>  static int dirty_rate_high_cnt;
> >>-static void check_guest_throttling(void);
> >>
> >>  static uint64_t bitmap_sync_count;
> >>
> >>@@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
> >>      return size;
> >>  }
> >>
> >>+/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
> >>+ * If guest dirty memory rate is reduced below the rate at which we can
> >>+ * transfer pages to the destination then we should be able to complete
> >>+ * migration. Some workloads dirty memory way too fast and will not effectively
> >>+ * converge, even with auto-converge. For these workloads we will continue to
> >>+ * increase throttling until the guest is paused long enough to complete the
> >>+ * migration. This essentially becomes a non-live migration.
> >>+ */
> >>+static void mig_throttle_guest_down(void)
> >>+{
> >>+    CPUState *cpu;
> >>+
> >>+    CPU_FOREACH(cpu) {
> >>+        /* We have not started throttling yet. Let's start it. */
> >>+        if (!cpu_throttle_active(cpu)) {
> >>+            cpu_throttle_start(cpu, 0.2);
> >>+        }
> >>+
> >>+        /* Throttling is already in place. Just increase the throttling rate */
> >>+        else {
> >>+            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
> >>+        }
> >
> >Now that migration has migrate_parameters, it would be best to replace
> >the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
> >change the starting throttling and increase rate.  It would probably also be
> >good to make the current throttling rate visible in info somewhere; maybe
> >info migrate?
> >
> 
> I did consider all of this. However, I don't think that the controls this
> patch provides are an ideal throttling mechanism. I suspect someone with
> vcpu/scheduling experience could whip up something more user friendly and
> cleaner. I merely propose this because it seems better than what we have
> today for auto-converge.
> 
> Also, I'm not sure how useful the information really is to the user. The
> fact that it is a ratio instead of a percentage might be confusing. Also,
> I suspect it is not truly very accurate. Again, I was going for "make it
> better", not "make it perfect".
> 
> Lastly, if we define this external interface we are kind of stuck with it,
> yes? 

Well, one thing you can do is add a parameter with a name starting with x-
which means it's not a fixed interface (so things like libvirt won't use it).
And the reason I was interested in seeing the information was otherwise
we don't really have any way of knowing how well the code is working;
is it already throttling down more and more?

> In
> this regard we should be sure that this is how we want cpu throttling to
> work.  Alternatively, I propose to accept this patch set as-is and then work on a
> real vcpu Throttling mechanism that can be used for auto-converge as well as a
> user controllable guest throttling/limiting mechanism. Once that is in place we
> can migrate (no pun intended) the auto-converge code to the new way and remove
> this stuff.

Yes, it's probably still better than what we already have.

Dave


> 
> With all of that said, I'm willing to provide the requested controls if we
> really
> feel the pros outweigh the cons. Thanks for your review :).
> 
> ...
> 
> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-02 13:58       ` Dr. David Alan Gilbert
@ 2015-06-02 14:37         ` Jason J. Herne
  2015-06-02 14:57           ` Dr. David Alan Gilbert
                             ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-02 14:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, borntraeger, quintela, qemu-devel, afaerber

On 06/02/2015 09:58 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
>>> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>>>> Remove traditional auto-converge static 30ms throttling code and replace it
>>>> with a dynamic throttling algorithm.
>>>>
>>>> Additionally, be more aggressive when deciding when to start throttling.
>>>> Previously we waited until four unproductive memory passes. Now we begin
>>>> throttling after only two unproductive memory passes. Four seemed quite
>>>> arbitrary and only waiting for two passes allows us to complete the migration
>>>> faster.
>>>>
>>>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>>>> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>>>> ---
>>>>   arch_init.c           | 95 +++++++++++++++++----------------------------------
>>>>   migration/migration.c |  9 +++++
>>>>   2 files changed, 41 insertions(+), 63 deletions(-)
>>>>
>>>> diff --git a/arch_init.c b/arch_init.c
>>>> index 23d3feb..73ae494 100644
>>>> --- a/arch_init.c
>>>> +++ b/arch_init.c
>>>> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>>>>   #endif
>>>>
>>>>   const uint32_t arch_type = QEMU_ARCH;
>>>> -static bool mig_throttle_on;
>>>>   static int dirty_rate_high_cnt;
>>>> -static void check_guest_throttling(void);
>>>>
>>>>   static uint64_t bitmap_sync_count;
>>>>
>>>> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>>>>       return size;
>>>>   }
>>>>
>>>> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
>>>> + * If guest dirty memory rate is reduced below the rate at which we can
>>>> + * transfer pages to the destination then we should be able to complete
>>>> + * migration. Some workloads dirty memory way too fast and will not effectively
>>>> + * converge, even with auto-converge. For these workloads we will continue to
>>>> + * increase throttling until the guest is paused long enough to complete the
>>>> + * migration. This essentially becomes a non-live migration.
>>>> + */
>>>> +static void mig_throttle_guest_down(void)
>>>> +{
>>>> +    CPUState *cpu;
>>>> +
>>>> +    CPU_FOREACH(cpu) {
>>>> +        /* We have not started throttling yet. Let's start it. */
>>>> +        if (!cpu_throttle_active(cpu)) {
>>>> +            cpu_throttle_start(cpu, 0.2);
>>>> +        }
>>>> +
>>>> +        /* Throttling is already in place. Just increase the throttling rate */
>>>> +        else {
>>>> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
>>>> +        }
>>>
>>> Now that migration has migrate_parameters, it would be best to replace
>>> the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
>>> change the starting throttling and increase rate.  It would probably also be
>>> good to make the current throttling rate visible in info somewhere; maybe
>>> info migrate?
>>>
>>
>> I did consider all of this. However, I don't think that the controls this
>> patch provides are an ideal throttling mechanism. I suspect someone with
>> vcpu/scheduling experience could whip up something more user friendly and
>> cleaner. I merely propose this because it seems better than what we have
>> today for auto-converge.
>>
>> Also, I'm not sure how useful the information really is to the user. The
>> fact that it is a ratio instead of a percentage might be confusing. Also,
>> I suspect it is not truly very accurate. Again, I was going for "make it
>> better", not "make it perfect".
>>
>> Lastly, if we define this external interface we are kind of stuck with it,
>> yes?
>
> Well, one thing you can do is add a parameter with a name starting with x-
> which means it's not a fixed interface (so things like libvirt won't use it).
> And the reason I was interested in seeing the information was otherwise
> we don't really have any way of knowing how well the code is working;
> is it already throttling down more and more?
>

Fair point. Can we add a qmp/hmp command like "info x-cpu-throttle"? Or a
new line in "info migrate" output, "x-throttle-ratio:  1.0" perhaps?
Would this mess up libvirt's parsing of the hmp/qmp data?
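
(Illustratively, with a hypothetical field name, the extra "info migrate"
line might read:

    x-cpu-throttle-ratio: 1.6

next to the existing status and RAM fields.)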


-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-02 14:37         ` Jason J. Herne
@ 2015-06-02 14:57           ` Dr. David Alan Gilbert
  2015-06-02 16:45           ` Eric Blake
  2015-06-03  7:24           ` Juan Quintela
  2 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-02 14:57 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: amit.shah, borntraeger, quintela, qemu-devel, afaerber

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/02/2015 09:58 AM, Dr. David Alan Gilbert wrote:
> >* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >>On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
> >>>* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >>>>Remove traditional auto-converge static 30ms throttling code and replace it
> >>>>with a dynamic throttling algorithm.
> >>>>
> >>>>Additionally, be more aggressive when deciding when to start throttling.
> >>>>Previously we waited until four unproductive memory passes. Now we begin
> >>>>throttling after only two unproductive memory passes. Four seemed quite
> >>>>arbitrary and only waiting for two passes allows us to complete the migration
> >>>>faster.
> >>>>
> >>>>Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >>>>Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> >>>>---
> >>>>  arch_init.c           | 95 +++++++++++++++++----------------------------------
> >>>>  migration/migration.c |  9 +++++
> >>>>  2 files changed, 41 insertions(+), 63 deletions(-)
> >>>>
> >>>>diff --git a/arch_init.c b/arch_init.c
> >>>>index 23d3feb..73ae494 100644
> >>>>--- a/arch_init.c
> >>>>+++ b/arch_init.c
> >>>>@@ -111,9 +111,7 @@ int graphic_depth = 32;
> >>>>  #endif
> >>>>
> >>>>  const uint32_t arch_type = QEMU_ARCH;
> >>>>-static bool mig_throttle_on;
> >>>>  static int dirty_rate_high_cnt;
> >>>>-static void check_guest_throttling(void);
> >>>>
> >>>>  static uint64_t bitmap_sync_count;
> >>>>
> >>>>@@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
> >>>>      return size;
> >>>>  }
> >>>>
> >>>>+/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
> >>>>+ * If guest dirty memory rate is reduced below the rate at which we can
> >>>>+ * transfer pages to the destination then we should be able to complete
> >>>>+ * migration. Some workloads dirty memory way too fast and will not effectively
> >>>>+ * converge, even with auto-converge. For these workloads we will continue to
> >>>>+ * increase throttling until the guest is paused long enough to complete the
> >>>>+ * migration. This essentially becomes a non-live migration.
> >>>>+ */
> >>>>+static void mig_throttle_guest_down(void)
> >>>>+{
> >>>>+    CPUState *cpu;
> >>>>+
> >>>>+    CPU_FOREACH(cpu) {
> >>>>+        /* We have not started throttling yet. Let's start it. */
> >>>>+        if (!cpu_throttle_active(cpu)) {
> >>>>+            cpu_throttle_start(cpu, 0.2);
> >>>>+        }
> >>>>+
> >>>>+        /* Throttling is already in place. Just increase the throttling rate */
> >>>>+        else {
> >>>>+            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
> >>>>+        }
> >>>
> >>>Now that migration has migrate_parameters, it would be best to replace
> >>>the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
> >>>change the starting throttling and increase rate.  It would probably also be
> >>>good to make the current throttling rate visible in info somewhere; maybe
> >>>info migrate?
> >>>
> >>
> >>I did consider all of this. However, I don't think that the controls this
> >>patch provides are an ideal throttling mechanism. I suspect someone with
> >>vcpu/scheduling experience could whip up something more user friendly and
> >>cleaner. I merely propose this because it seems better than what we have
> >>today for auto-converge.
> >>
> >>Also, I'm not sure how useful the information really is to the user. The
> >>fact that it is a ratio instead of a percentage might be confusing. Also,
> >>I suspect it is not truly very accurate. Again, I was going for "make it
> >>better", not "make it perfect".
> >>
> >>Lastly, if we define this external interface we are kind of stuck with it,
> >>yes?
> >
> >Well, one thing you can do is add a parameter with a name starting with x-
> >which means it's not a fixed interface (so things like libvirt won't use it).
> >And the reason I was interested in seeing the information was otherwise
> >we don't really have any way of knowing how well the code is working;
> >is it already throttling down more and more?
> >
> 
> Fair point. Can we add a qmp/hmp command like "info x-cpu-throttle"? Or a
> new line in "info migrate" output, "x-throttle-ratio:  1.0" perhaps?
> Would this mess up libvirt's parsing of the hmp/qmp data?

A friendly libvirt person said that it will ignore extra data
(and if it doesn't it's a bug).

Dave

> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-02 14:37         ` Jason J. Herne
  2015-06-02 14:57           ` Dr. David Alan Gilbert
@ 2015-06-02 16:45           ` Eric Blake
  2015-06-03  7:24           ` Juan Quintela
  2 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2015-06-02 16:45 UTC (permalink / raw)
  To: jjherne, Dr. David Alan Gilbert
  Cc: amit.shah, borntraeger, qemu-devel, afaerber, quintela

On 06/02/2015 08:37 AM, Jason J. Herne wrote:
>>
>> Well, one thing you can do is add a parameter with a name starting with
>> x- which means it's not a fixed interface (so things like libvirt won't
>> use it).
>> And the reason I was interested in seeing the information was otherwise
>> we don't really have any way of knowing how well the code is working;
>> is it already throttling down more and more?
>>
> 
> Fair point. Can we add a qmp/hmp command like "info x-cpu-throttle"? Or a
> new line in "info migrate" output, "x-throttle-ratio:  1.0" perhaps?
> Would this mess up libvirt's parsing of the hmp/qmp data?

Libvirt doesn't parse HMP data, and when parsing QMP data it gracefully
ignores any fields it doesn't recognize (especially fields beginning
with 'x-') (and if not, that's a bug to be fixed in libvirt).  The QMP
protocol even documents that clients MUST be prepared for new keys to
appear in output.  So this proposal should not cause any grief.
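
(For example, with a hypothetical x- field, a query-migrate reply that
grows an extra key must still parse cleanly:

    {"return": {"status": "active",
                "x-cpu-throttle-ratio": 1.6,
                ... }}

and a conforming client simply ignores the keys it does not recognize.)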

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface Jason J. Herne
  2015-06-01 15:23   ` Andrey Korolyov
@ 2015-06-03  7:12   ` Juan Quintela
  2015-06-03 14:35     ` Jason J. Herne
  1 sibling, 1 reply; 15+ messages in thread
From: Juan Quintela @ 2015-06-03  7:12 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: amit.shah, borntraeger, qemu-devel, afaerber, dgilbert

"Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
> Provide a method to throttle guest cpu execution. CPUState is augmented with
> timeout controls and throttle start/stop functions. To throttle the guest cpu
> the caller simply has to call the throttle start function and provide a ratio of
> sleep time to normal execution time.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> ---
>  cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
>
> diff --git a/cpus.c b/cpus.c
> index de6469f..7568357 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -64,6 +64,9 @@
>  
>  #endif /* CONFIG_LINUX */
>  
> +/* Number of ms between cpu throttle operations */
> +#define CPU_THROTTLE_TIMESLICE 10
> +
>  static CPUState *next_cpu;
>  int64_t max_delay;
>  int64_t max_advance;
> @@ -919,6 +922,65 @@ static void qemu_kvm_wait_io_event(CPUState *cpu)
>      qemu_wait_io_event_common(cpu);
>  }
>  
> +static void cpu_throttle_thread(void *opq)
> +{
> +    CPUState *cpu = (CPUState *)opq;

Cast not needed. Same in the rest of the functions.

> +    long sleeptime_ms = (long)(cpu->throttle_ratio * CPU_THROTTLE_TIMESLICE);
> +
> +    /* Stop the timer if needed */
> +    if (cpu->throttle_timer_stop) {
> +        timer_del(cpu->throttle_timer);
> +        timer_free(cpu->throttle_timer);
> +        cpu->throttle_timer = NULL;
> +        return;
> +    }
> +
> +    qemu_mutex_unlock_iothread();

We are creating a thread for each cpu, but just for waiting?
That is exactly what the iothread does, no?

> +    g_usleep(sleeptime_ms * 1000); /* Convert ms to us for usleep call */
> +    qemu_mutex_lock_iothread();
> +
> +    timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                                   CPU_THROTTLE_TIMESLICE);
> +}
> +
> +static void cpu_throttle_timer_pop(void *opq)
> +{
> +    CPUState *cpu = (CPUState *)opq;
> +
> +    async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
> +}
> +
> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio)
> +{
> +    assert(throttle_ratio > 0);
> +    cpu->throttle_ratio = throttle_ratio;
> +
> +    if (!cpu_throttle_active(cpu)) {
> +        cpu->throttle_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> +                                           cpu_throttle_timer_pop, cpu);
> +        timer_mod(cpu->throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                                       CPU_THROTTLE_TIMESLICE);
> +        cpu->throttle_timer_stop = false;
> +    }
> +}
> +
> +void cpu_throttle_stop(CPUState *cpu)
> +{
> +    assert(cpu_throttle_active(cpu));
> +    cpu->throttle_timer_stop = true;
> +}
> +
> +bool cpu_throttle_active(CPUState *cpu)
> +{
> +    return (cpu->throttle_timer != NULL);
> +}
> +
> +float cpu_throttle_get_ratio(CPUState *cpu)
> +{
> +    assert(cpu_throttle_active(cpu));
> +    return cpu->throttle_ratio;
> +}
> +
>  static void *qemu_kvm_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 39f0f19..9d16e6a 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -310,6 +310,11 @@ struct CPUState {
>      uint32_t can_do_io;
>      int32_t exception_index; /* used by m68k TCG */
>  
> +    /* vcpu throttling controls */
> +    QEMUTimer *throttle_timer;
> +    bool throttle_timer_stop;
> +    float throttle_ratio;
> +

I think that if we go this way, we can have a single throttling thread
that does a for loop over each vcpu.  As a bonus, we get a single timer,
a single bool and a single throttle_ratio (i.e. it doesn't need to be on
the cpu struct).  Being a global won't matter because we require the
iothread anyway.
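
Something like this, as a rough sketch (untested, and the names here are
invented for illustration -- they are not from the posted patch):

    /* One global timer drives the throttling of every vcpu. */
    static QEMUTimer *throttle_timer;
    static float throttle_ratio;

    static void cpu_throttle_timer_tick(void *opaque)
    {
        CPUState *cpu;

        /* Queue the per-vcpu sleep work item on every vcpu; the work
         * item itself can keep doing the g_usleep() dance from the
         * posted patch. */
        CPU_FOREACH(cpu) {
            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
        }
        /* Re-arm the single timer. */
        timer_mod(throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
                                  CPU_THROTTLE_TIMESLICE);
    }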



>      /* Note that this is accessed at the start of every TB via a negative
>         offset from AREG0.  Leave this field at the end so as to make the
>         (absolute value) offset as small as possible.  This reduces code
> @@ -553,6 +558,47 @@ CPUState *qemu_get_cpu(int index);
>   */
>  bool cpu_exists(int64_t id);
>  
> +/**
> + * cpu_throttle_start:
> + * @cpu: The vcpu to throttle
> + *
> + * Throttles a vcpu by forcing it to sleep. The duration of the sleep is a
> + * ratio of sleep time to running time. A ratio of 1.0 corresponds to a 50%
> + * duty cycle (example: 10ms sleep for every 10ms awake).
> + *
> + * cpu_throttle_start can be called as needed to adjust the throttle ratio.
> + * Once the throttling starts, it will remain in effect until cpu_throttle_stop
> + * is called.
> + */
> +void cpu_throttle_start(CPUState *cpu, float throttle_ratio);
> +
> +/**
> + * cpu_throttle_stop:
> + * @cpu: The vcpu to stop throttling
> + *
> + * Stops the vcpu throttling started by cpu_throttle_start.
> + */
> +void cpu_throttle_stop(CPUState *cpu);
> +
> +/**
> + * cpu_throttle_active:
> + * @cpu: The vcpu to check
> + *
> + * Returns %true if this vcpu is currently being throttled, %false otherwise.
> + */
> +bool cpu_throttle_active(CPUState *cpu);
> +
> +/**
> + * cpu_throttle_get_ratio:
> + * @cpu: The vcpu whose throttle ratio to return.
> + *
> + * Returns the ratio being used to throttle this vcpu. See cpu_throttle_start
> + * for details.
> + *
> + * Returns The ratio being used to throttle this vcpu.
> + */
> +float cpu_throttle_get_ratio(CPUState *cpu);
> +
>  #ifndef CONFIG_USER_ONLY
>  
>  typedef void (*CPUInterruptHandler)(CPUState *, int);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-01 15:17 ` [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
  2015-06-01 15:32   ` Dr. David Alan Gilbert
@ 2015-06-03  7:21   ` Juan Quintela
  1 sibling, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2015-06-03  7:21 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: amit.shah, borntraeger, qemu-devel, afaerber, dgilbert

"Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
> Remove traditional auto-converge static 30ms throttling code and replace it
> with a dynamic throttling algorithm.
>
> Additionally, be more aggressive when deciding when to start throttling.
> Previously we waited until four unproductive memory passes. Now we begin
> throttling after only two unproductive memory passes. Four seemed quite
> arbitrary and only waiting for two passes allows us to complete the migration
> faster.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> ---
>  arch_init.c           | 95 +++++++++++++++++----------------------------------
>  migration/migration.c |  9 +++++
>  2 files changed, 41 insertions(+), 63 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 23d3feb..73ae494 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>  #endif
>  
>  const uint32_t arch_type = QEMU_ARCH;
> -static bool mig_throttle_on;
>  static int dirty_rate_high_cnt;
> -static void check_guest_throttling(void);
>  
>  static uint64_t bitmap_sync_count;
>  
> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>      return size;
>  }
>  
> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
> + * If guest dirty memory rate is reduced below the rate at which we can
> + * transfer pages to the destination then we should be able to complete
> + * migration. Some workloads dirty memory way too fast and will not effectively
> + * converge, even with auto-converge. For these workloads we will continue to
> + * increase throttling until the guest is paused long enough to complete the
> + * migration. This essentially becomes a non-live migration.
> + */
> +static void mig_throttle_guest_down(void)
> +{
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        /* We have not started throttling yet. Lets start it.*/
> +        if (!cpu_throttle_active(cpu)) {
> +            cpu_throttle_start(cpu, 0.2);
> +        }
> +
> +        /* Throttling is already in place. Just increase the throttling rate */
> +        else {
> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
> +        }

If we go with this design (I think I prefer the old one), can we rename
cpu_throttle_start() to cpu_throttle_increase(), which just does this?  As
an added bonus, we don't have to export cpu_throttle_get_ratio() and
cpu_throttle_active().
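
One possible shape, as a sketch only (if it lives in cpus.c it can touch
cpu->throttle_ratio directly, so the two helpers stay private):

    void cpu_throttle_increase(CPUState *cpu)
    {
        if (!cpu->throttle_timer) {
            /* Not throttling yet: start at the initial ratio. */
            cpu_throttle_start(cpu, 0.2);
        } else {
            /* Already throttling: double the current ratio. */
            cpu->throttle_ratio *= 2;
        }
    }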


> +    }
> +}
> +
>  /* Update the xbzrle cache to reflect a page that's been sent as all 0.
>   * The important thing is that a stale (not-yet-0'd) page be replaced
>   * by the new data.
> @@ -714,21 +737,21 @@ static void migration_bitmap_sync(void)
>              /* The following detection logic can be refined later. For now:
>                 Check to see if the dirtied bytes is 50% more than the approx.
>                 amount of bytes that just got transferred since the last time we
> -               were in this routine. If that happens >N times (for now N==4)
> -               we turn on the throttle down logic */
> +               were in this routine. If that happens twice, start or increase
> +               throttling */
>              bytes_xfer_now = ram_bytes_transferred();
> +
>              if (s->dirty_pages_rate &&
>                 (num_dirty_pages_period * TARGET_PAGE_SIZE >
>                     (bytes_xfer_now - bytes_xfer_prev)/2) &&
> -               (dirty_rate_high_cnt++ > 4)) {
> +               (dirty_rate_high_cnt++ >= 2)) {

Once we are here, changing this to a parameter is a good idea.
The same goes for the 0.2 initial value and the doubling factor.
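
For illustration, a sketch of what that could look like -- the two
parameters are invented here and would have to be added to the migration
parameters first (presumably with x- names, per the other subthread):

    /* Sketch only: the initial ratio and the multiplication step come
     * from tunables instead of being hard-coded to 0.2 and 2. */
    static void mig_throttle_guest_down(double initial, double step)
    {
        CPUState *cpu;

        CPU_FOREACH(cpu) {
            if (!cpu_throttle_active(cpu)) {
                cpu_throttle_start(cpu, initial);
            } else {
                cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * step);
            }
        }
    }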


>                      trace_migration_throttle();
> -                    mig_throttle_on = true;
>                      dirty_rate_high_cnt = 0;
> +                    mig_throttle_guest_down();
>               }
>               bytes_xfer_prev = bytes_xfer_now;
> -        } else {
> -             mig_throttle_on = false;
>          }
> +
>          if (migrate_use_xbzrle()) {
>              if (iterations_prev != acct_info.iterations) {
>                  acct_info.xbzrle_cache_miss_rate =
> @@ -1197,7 +1220,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      RAMBlock *block;
>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
>  
> -    mig_throttle_on = false;
>      dirty_rate_high_cnt = 0;
>      bitmap_sync_count = 0;
>      migration_bitmap_sync_init();
> @@ -1301,12 +1323,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          }
>          pages_sent += pages;
>          acct_info.iterations++;
> -        check_guest_throttling();


Notice that in the old code we check for throttling each time we do an
iteration through RAM; the reason for doing it here is that it saves us
from having to add extra threads to control it.


> -        /* we want to check in the 1st loop, just in case it was the 1st time
> -           and we had to sync the dirty bitmap.
> -           qemu_get_clock_ns() is a bit expensive, so we only check each some
> -           iterations
> -        */
> +
>          if ((i & 63) == 0) {
>              uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) / 1000000;
>              if (t1 > MAX_WAIT) {
> @@ -1913,51 +1930,3 @@ TargetInfo *qmp_query_target(Error **errp)
>      return info;
>  }
>  
> -/* Stub function that's gets run on the vcpu when its brought out of the
> -   VM to run inside qemu via async_run_on_cpu()*/
> -static void mig_sleep_cpu(void *opq)
> -{
> -    qemu_mutex_unlock_iothread();
> -    g_usleep(30*1000);
> -    qemu_mutex_lock_iothread();
> -}
> -
> -/* To reduce the dirty rate explicitly disallow the VCPUs from spending
> -   much time in the VM. The migration thread will try to catchup.
> -   Workload will experience a performance drop.
> -*/
> -static void mig_throttle_guest_down(void)
> -{
> -    CPUState *cpu;
> -
> -    qemu_mutex_lock_iothread();
> -    CPU_FOREACH(cpu) {
> -        async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
> -    }
> -    qemu_mutex_unlock_iothread();
> -}
> -
> -static void check_guest_throttling(void)
> -{
> -    static int64_t t0;
> -    int64_t        t1;
> -
> -    if (!mig_throttle_on) {
> -        return;
> -    }
> -
> -    if (!t0)  {
> -        t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> -        return;
> -    }
> -
> -    t1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> -
> -    /* If it has been more than 40 ms since the last time the guest
> -     * was throttled then do it again.
> -     */
> -    if (40 < (t1-t0)/1000000) {
> -        mig_throttle_guest_down();
> -        t0 = t1;
> -    }
> -}
> diff --git a/migration/migration.c b/migration/migration.c
> index 732d229..c9545df 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "qom/cpu.h"
>  
>  #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
>  
> @@ -731,6 +732,7 @@ int64_t migrate_xbzrle_cache_size(void)
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    CPUState *cpu;
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
> @@ -814,6 +816,13 @@ static void *migration_thread(void *opaque)
>          }
>      }
>  
> +    /* If we enabled cpu throttling for auto-converge, turn it off. */
> +    CPU_FOREACH(cpu) {
> +        if (cpu_throttle_active(cpu)) {
> +            cpu_throttle_stop(cpu);
> +        }

It is a question of style, but I would do it the other way: put the test
for cpu_throttle_active() inside cpu_throttle_stop().

And if you move to having a single thread you get a simpler cleanup; if
you use the migration_thread for this, you get the cleanup for free.
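
Concretely, for the first point (sketch only, not part of the patch):

    void cpu_throttle_stop(CPUState *cpu)
    {
        /* Tolerate being called when throttling was never started,
         * so callers do not need to check cpu_throttle_active(). */
        if (!cpu_throttle_active(cpu)) {
            return;
        }
        cpu->throttle_timer_stop = true;
    }

Then the cleanup in migration_thread() collapses to a plain
CPU_FOREACH(cpu) { cpu_throttle_stop(cpu); }.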


> +    }
> +
>      qemu_mutex_lock_iothread();
>      if (s->state == MIGRATION_STATUS_COMPLETED) {
>          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

Thanks, Juan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
  2015-06-02 14:37         ` Jason J. Herne
  2015-06-02 14:57           ` Dr. David Alan Gilbert
  2015-06-02 16:45           ` Eric Blake
@ 2015-06-03  7:24           ` Juan Quintela
  2 siblings, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2015-06-03  7:24 UTC (permalink / raw)
  To: Jason J. Herne
  Cc: amit.shah, borntraeger, Dr. David Alan Gilbert, afaerber, qemu-devel

"Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
> On 06/02/2015 09:58 AM, Dr. David Alan Gilbert wrote:
>> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>>> On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
>>>> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>>>>> Remove traditional auto-converge static 30ms throttling code and replace it
>>>>> with a dynamic throttling algorithm.
>>>>>
>>>>> Additionally, be more aggressive when deciding when to start throttling.
>>>>> Previously we waited until four unproductive memory passes. Now we begin
>>>>> throttling after only two unproductive memory passes. Four seemed quite
>>>>> arbitrary and only waiting for two passes allows us to complete the migration
>>>>> faster.
>>>>>
>>>>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>>>>> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch_init.c           | 95 +++++++++++++++++----------------------------------
>>>>>   migration/migration.c |  9 +++++
>>>>>   2 files changed, 41 insertions(+), 63 deletions(-)
>>>>>
>>>>> diff --git a/arch_init.c b/arch_init.c
>>>>> index 23d3feb..73ae494 100644
>>>>> --- a/arch_init.c
>>>>> +++ b/arch_init.c
>>>>> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>>>>>   #endif
>>>>>
>>>>>   const uint32_t arch_type = QEMU_ARCH;
>>>>> -static bool mig_throttle_on;
>>>>>   static int dirty_rate_high_cnt;
>>>>> -static void check_guest_throttling(void);
>>>>>
>>>>>   static uint64_t bitmap_sync_count;
>>>>>
>>>>> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>>>>>       return size;
>>>>>   }
>>>>>
>>>>> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
>>>>> + * If guest dirty memory rate is reduced below the rate at which we can
>>>>> + * transfer pages to the destination then we should be able to complete
>>>>> + * migration. Some workloads dirty memory way too fast and will not effectively
>>>>> + * converge, even with auto-converge. For these workloads we will continue to
>>>>> + * increase throttling until the guest is paused long enough to complete the
>>>>> + * migration. This essentially becomes a non-live migration.
>>>>> + */
>>>>> +static void mig_throttle_guest_down(void)
>>>>> +{
>>>>> +    CPUState *cpu;
>>>>> +
>>>>> +    CPU_FOREACH(cpu) {
>>>>> +        /* We have not started throttling yet. Lets start it.*/
>>>>> +        if (!cpu_throttle_active(cpu)) {
>>>>> +            cpu_throttle_start(cpu, 0.2);
>>>>> +        }
>>>>> +
>>>>> +        /* Throttling is already in place. Just increase the throttling rate */
>>>>> +        else {
>>>>> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
>>>>> +        }
>>>>
>>>> Now that migration has migrate_parameters, it would be best to replace
>>>> the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
>>>> change the starting throttling and increase rate.  It would probably also be
>>>> good to make the current throttling rate visible in info somewhere; maybe
>>>> info migrate?
>>>>
>>>
> >>> I did consider all of this. However, I don't think that the controls
> >>> this patch provides are an ideal throttling mechanism. I suspect
> >>> someone with vcpu/scheduling experience could whip up something more
> >>> user-friendly and cleaner. I merely propose this because it seems
> >>> better than what we have today for auto-converge.
> >>>
> >>> Also, I'm not sure how useful the information really is to the user.
> >>> The fact that it is a ratio instead of a percentage might be
> >>> confusing. Also, I suspect it is not truly very accurate. Again, I was
> >>> going for "make it better", not "make it perfect".
>>>
>>> Lastly, if we define this external interface we are kind of stuck with it,
>>> yes?
>>
> >> Well, one thing you can do is add a parameter with a name starting
> >> with x-, which means it's not a fixed interface (so things like
> >> libvirt won't use it).
>> And the reason I was interested in seeing the information was otherwise
>> we don't really have any way of knowing how well the code is working;
>> is it already throttling down more and more?
>>
>
> Fair point. Can we add a qmp/hmp command like "info x-cpu-throttle"? Or a
> new line in "info migrate" output, "x-throttle-ratio: 1.0" perhaps?
> Would this mess up libvirt's parsing of the hmp/qmp data?

info migrate with an extra field, please.  Name it x-throttle-ratio or
whatever.

I would also prefer to have a percentage (or per-thousand) value that is
an integer, but that is just a preference and I don't really care too
much.
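
For what it's worth, converting the ratio into "percent of time asleep"
is a one-liner, going by the documented semantics where a ratio of 1.0
means a 50% duty cycle (sketch only):

    /* ratio 1.0 -> 50%, 4.0 -> 80%, 9.0 -> 90% */
    int throttle_pct = (int)(100.0f * ratio / (1.0f + ratio));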

Later, Juan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface
  2015-06-03  7:12   ` Juan Quintela
@ 2015-06-03 14:35     ` Jason J. Herne
  0 siblings, 0 replies; 15+ messages in thread
From: Jason J. Herne @ 2015-06-03 14:35 UTC (permalink / raw)
  To: quintela; +Cc: amit.shah, borntraeger, qemu-devel, dgilbert, afaerber

On 06/03/2015 03:12 AM, Juan Quintela wrote:
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> >index 39f0f19..9d16e6a 100644
>> >--- a/include/qom/cpu.h
>> >+++ b/include/qom/cpu.h
>> >@@ -310,6 +310,11 @@ struct CPUState {
>> >      uint32_t can_do_io;
>> >      int32_t exception_index; /* used by m68k TCG */
>> >
>> >+    /* vcpu throttling controls */
>> >+    QEMUTimer *throttle_timer;
>> >+    bool throttle_timer_stop;
>> >+    float throttle_ratio;
>> >+
> I think that if we go this way, we can have a single throttling thread
> that does a for loop over each vcpu.  As a bonus, we get a single timer,
> a single bool and a single throttle_ratio (i.e. it doesn't need to be on
> the cpu struct).  Being a global won't matter because we require the
> iothread anyway.
>

Juan,

I like this idea. I'll work up another patch set :).

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-06-03 14:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-01 15:17 [Qemu-devel] [PATCH 0/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
2015-06-01 15:17 ` [Qemu-devel] [PATCH 1/2] cpu: Provide vcpu throttling interface Jason J. Herne
2015-06-01 15:23   ` Andrey Korolyov
2015-06-01 17:04     ` Jason J. Herne
2015-06-03  7:12   ` Juan Quintela
2015-06-03 14:35     ` Jason J. Herne
2015-06-01 15:17 ` [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge Jason J. Herne
2015-06-01 15:32   ` Dr. David Alan Gilbert
2015-06-01 17:16     ` Jason J. Herne
2015-06-02 13:58       ` Dr. David Alan Gilbert
2015-06-02 14:37         ` Jason J. Herne
2015-06-02 14:57           ` Dr. David Alan Gilbert
2015-06-02 16:45           ` Eric Blake
2015-06-03  7:24           ` Juan Quintela
2015-06-03  7:21   ` Juan Quintela
