From: Cosmin Marin <cosmin@nutanix.com>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Cosmin Marin <cosmin@nutanix.com>
Subject: [Qemu-devel] [PATCH] migration: Improve accuracy of vCPU throttling with per-vCPU timers
Date: Fri, 14 Jun 2019 09:11:06 -0700
Message-ID: <20190614161106.218854-1-cosmin@nutanix.com>

During auto-converge live migration, the configured throttle rate is not
matched in practice. Throughput measurements for a memory-write
intensive workload show a gap between the target and the measured
throttle rate: with a 99% target, the measured throttle rate was only
about 95%. The workload spawns one thread per vCPU, and these threads
dirty most of the VM's memory in an infinite loop.

The root cause is the use of a single VM-wide timer to schedule
cpu_throttle_thread() asynchronously on the vCPUs. Firstly, this scales
poorly at scheduling time: a VM-wide (global) loop must iterate over all
vCPUs while performing atomic operations, which may introduce delays
between vCPUs; moreover, a vCPU that is still running
cpu_throttle_thread() (i.e. not yet DONE) is skipped, which may lead to
uneven aggregate sleep times across vCPUs. Secondly, there is a race
between the vCPU threads and the 'scheduling' (migration) thread,
because a vCPU thread has to release the iothread lock, sleep, reacquire
the lock and only then mark itself as completed (DONE). Configuring
correct per-vCPU sleep intervals with this model is non-trivial.

To address these issues, per-vCPU timers replace the per-VM timer. The
migration thread decides the throttling level globally, while each vCPU
thread computes the corresponding sleep time and sleeps accordingly. The
following table summarizes the results of running the workload on a
22-vCPU/45 GB VM with each timer model.

+----------------------------------------------------------------+
|          |      per-VM Timer        |   per-vCPU Timer         |
|  Target  |==========================|==========================|
| Throttle | Throughput |    Actual   | Throughput |    Actual   |
|    (%)   |   (GBps)   | Throttle(%) |   (GBps)   | Throttle(%) |
|----------|------------|-------------|------------|-------------|
|         0|     ~493.50|            0|     ~493.50|            0|
|        20|      395.65|        19.81|      390.35|        20.88|
|        30|      356.43|        27.76|      342.39|        30.60|
|        40|      317.26|        35.69|      293.99|        40.41|
|        50|      268.78|        45.52|      244.95|        50.35|
|        60|      214.61|        56.50|      195.23|        60.43|
|        70|      164.72|        66.61|      147.55|        70.09|
|        80|      112.62|        77.17|       98.52|        80.03|
|        90|       57.09|        88.43|       47.90|        90.29|
|        99|       26.87|        94.55|        3.11|        99.36|
+----------------------------------------------------------------+

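The results support the per-vCPU timer model: its measured throttle rate
stays within one percentage point of the target at every level, whereas
the per-VM timer falls short by up to about 4.5 percentage points.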

Signed-off-by: Cosmin Marin <cosmin@nutanix.com>
---
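For reference, the arithmetic behind the timer period and the per-vCPU
sleep time can be sketched as below. This is a standalone illustration
under stated assumptions, not code from the patch: the helper names
throttle_sleep_ns() and throttle_period_ns() are hypothetical, the 10 ms
timeslice is assumed from the definition of CPU_THROTTLE_TIMESLICE_NS in
cpus.c, and the sleep formula is assumed to match what
cpu_throttle_thread() computes. The period formula is the one passed to
timer_mod() in cpu_throttle_timer_tick().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors CPU_THROTTLE_TIMESLICE_NS in cpus.c (10 ms). */
#define TIMESLICE_NS 10000000LL

/*
 * Hypothetical helper: time a vCPU sleeps in each throttle period.
 * The vCPU runs for one timeslice and sleeps pct / (1 - pct) timeslices,
 * so the sleeping share of the period is pct.
 */
int64_t throttle_sleep_ns(int pct_int)
{
    /* QEMU clamps the percentage to [1, 99]; 100 would divide by zero. */
    assert(pct_int >= 1 && pct_int <= 99);
    double pct = (double)pct_int / 100;
    return (int64_t)(pct / (1 - pct) * TIMESLICE_NS);
}

/*
 * Hypothetical helper: interval between two timer ticks, i.e. one run
 * timeslice plus the sleep time, which equals TIMESLICE / (1 - pct).
 */
int64_t throttle_period_ns(int pct_int)
{
    assert(pct_int >= 1 && pct_int <= 99);
    double pct = (double)pct_int / 100;
    return (int64_t)(TIMESLICE_NS / (1 - pct));
}
```

At a 50% target this gives a 10 ms sleep in every 20 ms period. At 99%
it gives roughly 990 ms of sleep in a period of roughly 1 s, so each
vCPU runs for only about 10 ms per period.
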
 cpus.c            | 29 +++++++++++++++--------------
 include/qom/cpu.h |  4 ++--
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/cpus.c b/cpus.c
index dde3b7b981..c2bd3babf6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -80,7 +80,6 @@ int64_t max_delay;
 int64_t max_advance;
 
 /* vcpu throttling controls */
-static QEMUTimer *throttle_timer;
 static unsigned int throttle_percentage;
 
 #define CPU_THROTTLE_PCT_MIN 1
@@ -792,40 +791,42 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
     qemu_mutex_unlock_iothread();
     g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
     qemu_mutex_lock_iothread();
-    atomic_set(&cpu->throttle_thread_scheduled, 0);
 }
 
 static void cpu_throttle_timer_tick(void *opaque)
 {
-    CPUState *cpu;
+    CPUState *cpu = (CPUState *)opaque;
     double pct;
 
     /* Stop the timer if needed */
     if (!cpu_throttle_get_percentage()) {
         return;
     }
-    CPU_FOREACH(cpu) {
-        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread,
-                             RUN_ON_CPU_NULL);
-        }
-    }
+
+    async_run_on_cpu(cpu, cpu_throttle_thread, RUN_ON_CPU_NULL);
 
     pct = (double)cpu_throttle_get_percentage()/100;
-    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+    timer_mod(cpu->throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
                                    CPU_THROTTLE_TIMESLICE_NS / (1-pct));
 }
 
 void cpu_throttle_set(int new_throttle_pct)
 {
+    CPUState *cpu;
+    double pct;
+
     /* Ensure throttle percentage is within valid range */
     new_throttle_pct = MIN(new_throttle_pct, CPU_THROTTLE_PCT_MAX);
     new_throttle_pct = MAX(new_throttle_pct, CPU_THROTTLE_PCT_MIN);
 
     atomic_set(&throttle_percentage, new_throttle_pct);
 
-    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
-                                       CPU_THROTTLE_TIMESLICE_NS);
+    pct = (double)new_throttle_pct/100;
+    CPU_FOREACH(cpu) {
+        timer_mod_anticipate(cpu->throttle_timer,
+                qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+                CPU_THROTTLE_TIMESLICE_NS / (1-pct));
+    }
 }
 
 void cpu_throttle_stop(void)
@@ -848,8 +849,6 @@ void cpu_ticks_init(void)
     seqlock_init(&timers_state.vm_clock_seqlock);
     qemu_spin_init(&timers_state.vm_clock_lock);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
-    throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
-                                           cpu_throttle_timer_tick, NULL);
 }
 
 void configure_icount(QemuOpts *opts, Error **errp)
@@ -1267,6 +1266,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    cpu->throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+            cpu_throttle_timer_tick, cpu);
     current_cpu = cpu;
 
     r = kvm_init_vcpu(cpu);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 5ee0046b62..5a11baec69 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -439,10 +439,10 @@ struct CPUState {
     /* shared by kvm, hax and hvf */
     bool vcpu_dirty;
 
-    /* Used to keep track of an outstanding cpu throttle thread for migration
+    /* Used to cyclically trigger vCPU throttling during VM migration
      * autoconverge
      */
-    bool throttle_thread_scheduled;
+    QEMUTimer *throttle_timer;
 
     bool ignore_memory_transaction_failures;
 
-- 
2.16.5


