* [RFC 0/3] QEMU cpus.c refactoring
@ 2020-05-21 18:54 Claudio Fontana
  2020-05-21 18:54 ` [RFC 1/3] cpu-throttle: new module, extracted from cpus.c Claudio Fontana
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Claudio Fontana @ 2020-05-21 18:54 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov, Wenchao Wang,
	Colin Xu, Claudio Fontana, open list:X86 HAXM CPUs,
	Sunil Muthuswamy, Richard Henderson

Motivation and higher-level steps:

https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html

This is point 8) in that plan. The idea is to extract the unrelated parts
of cpus.c into separate modules, and to have each accelerator register its
own interface with the main cpus module (cpus.c).
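
A rough sketch of what "registering an interface" could look like, in plain
C and outside of QEMU: each accelerator fills in a table of function
pointers and hands it to the generic code, which then dispatches through it
without knowing which accelerator is active. All names here (CpusAccelOps,
cpus_register_accel, cpus_create_vcpu, cpus_kick_vcpu, the demo
accelerator) are hypothetical illustrations, and not necessarily what patch
3/3 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-accelerator hooks (patch 3/3 may define them differently). */
typedef struct CpusAccelOps {
    const char *name;
    void (*create_vcpu_thread)(int cpu_index);  /* mandatory */
    void (*kick_vcpu_thread)(int cpu_index);    /* optional, may be NULL */
} CpusAccelOps;

/* The one set of hooks the generic code dispatches through. */
static const CpusAccelOps *cpus_accel;

void cpus_register_accel(const CpusAccelOps *ops)
{
    assert(ops != NULL && ops->create_vcpu_thread != NULL);
    cpus_accel = ops;
}

/* Generic "cpus.c" side: no accelerator-specific code here. */
int cpus_create_vcpu(int cpu_index)
{
    if (cpus_accel == NULL) {
        return -1;              /* no accelerator registered yet */
    }
    cpus_accel->create_vcpu_thread(cpu_index);
    return 0;
}

void cpus_kick_vcpu(int cpu_index)
{
    if (cpus_accel != NULL && cpus_accel->kick_vcpu_thread != NULL) {
        cpus_accel->kick_vcpu_thread(cpu_index);
    }
}

/* Accelerator side, loosely modeled on the role of kvm-cpus-interface.c. */
static int demo_threads_created;

static void demo_create_vcpu_thread(int cpu_index)
{
    printf("demo: creating thread for vCPU %d\n", cpu_index);
    demo_threads_created++;
}

static const CpusAccelOps demo_ops = {
    .name = "demo",
    .create_vcpu_thread = demo_create_vcpu_thread,
    /* .kick_vcpu_thread left NULL: kicking becomes a no-op */
};

void demo_accel_init(void)
{
    cpus_register_accel(&demo_ops);
}
```

In this sketch an optional hook left NULL simply turns the corresponding
operation into a no-op; whether the real interface should allow that is a
design choice for review.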

While preparing this RFC, I noticed some code that assumes Windows hosts
use either TCG or HAX (not considering WHPX); this might need to be
revisited. I added a comment there.

The series builds successfully in Linux cross-compilations for
windows/hax and windows/whpx, and I also got a good build on Darwin/hvf.

Tests run successfully for the tcg and kvm configurations; I did not run
tests on Windows or Darwin.

Your feedback and help on this are welcome,

Claudio

Claudio Fontana (3):
  cpu-throttle: new module, extracted from cpus.c
  cpu-timers: new module extracted from cpus.c
  cpus: implement cpus interfaces for per-accelerator threads

 MAINTAINERS                          |    3 +
 Makefile.target                      |    9 +-
 accel/kvm/Makefile.objs              |    2 +
 accel/kvm/kvm-all.c                  |   15 +-
 accel/kvm/kvm-cpus-interface.c       |   94 ++
 accel/kvm/kvm-cpus-interface.h       |    8 +
 accel/qtest.c                        |   85 +-
 accel/stubs/kvm-stub.c               |    3 +-
 accel/tcg/Makefile.objs              |    1 +
 accel/tcg/cpu-exec.c                 |   43 +-
 accel/tcg/tcg-all.c                  |   19 +-
 accel/tcg/tcg-cpus-interface.c       |  523 +++++++++
 accel/tcg/tcg-cpus-interface.h       |    8 +
 accel/tcg/translate-all.c            |    3 +-
 cpu-throttle.c                       |  122 ++
 cpu-timers.c                         |  776 +++++++++++++
 cpus.c                               | 2015 ++++------------------------------
 docs/replay.txt                      |    6 +-
 exec.c                               |    4 -
 hw/core/cpu.c                        |    1 +
 hw/core/ptimer.c                     |    6 +-
 hw/i386/x86.c                        |    1 +
 include/exec/cpu-all.h               |    4 +
 include/exec/exec-all.h              |    4 +-
 include/hw/core/cpu.h                |   37 -
 include/qemu/main-loop.h             |    5 +
 include/qemu/timer.h                 |   20 -
 include/sysemu/cpu-throttle.h        |   50 +
 include/sysemu/cpu-timers.h          |   73 ++
 include/sysemu/cpus.h                |   56 +-
 include/sysemu/hw_accel.h            |   57 +-
 include/sysemu/kvm.h                 |    2 +-
 include/sysemu/replay.h              |    4 +-
 migration/migration.c                |    1 +
 migration/ram.c                      |    1 +
 qtest.c                              |    2 +-
 replay/replay.c                      |    6 +-
 softmmu/vl.c                         |    8 +-
 stubs/Makefile.objs                  |    1 +
 stubs/clock-warp.c                   |    4 +-
 stubs/cpu-get-clock.c                |    2 +-
 stubs/cpu-get-icount.c               |   14 +-
 stubs/cpu-synchronize-state.c        |   15 +
 target/alpha/translate.c             |    3 +-
 target/arm/helper.c                  |    7 +-
 target/i386/Makefile.objs            |    7 +-
 target/i386/hax-all.c                |    6 +-
 target/i386/hax-cpus-interface.c     |   85 ++
 target/i386/hax-cpus-interface.h     |    8 +
 target/i386/hax-i386.h               |    2 +
 target/i386/hax-posix.c              |   12 +
 target/i386/hax-windows.c            |   20 +
 target/i386/hvf/Makefile.objs        |    2 +-
 target/i386/hvf/hvf-cpus-interface.c |   83 ++
 target/i386/hvf/hvf-cpus-interface.h |    8 +
 target/i386/hvf/hvf.c                |    5 +-
 target/i386/kvm.c                    |    4 +-
 target/i386/whpx-all.c               |    3 +
 target/i386/whpx-cpus-interface.c    |   96 ++
 target/i386/whpx-cpus-interface.h    |    8 +
 target/riscv/csr.c                   |    8 +-
 tests/ptimer-test-stubs.c            |    6 +
 tests/test-timed-average.c           |    2 +-
 util/main-loop.c                     |    4 +-
 util/qemu-timer.c                    |    9 +-
 65 files changed, 2524 insertions(+), 1977 deletions(-)
 create mode 100644 accel/kvm/kvm-cpus-interface.c
 create mode 100644 accel/kvm/kvm-cpus-interface.h
 create mode 100644 accel/tcg/tcg-cpus-interface.c
 create mode 100644 accel/tcg/tcg-cpus-interface.h
 create mode 100644 cpu-throttle.c
 create mode 100644 cpu-timers.c
 create mode 100644 include/sysemu/cpu-throttle.h
 create mode 100644 include/sysemu/cpu-timers.h
 create mode 100644 stubs/cpu-synchronize-state.c
 create mode 100644 target/i386/hax-cpus-interface.c
 create mode 100644 target/i386/hax-cpus-interface.h
 create mode 100644 target/i386/hvf/hvf-cpus-interface.c
 create mode 100644 target/i386/hvf/hvf-cpus-interface.h
 create mode 100644 target/i386/whpx-cpus-interface.c
 create mode 100644 target/i386/whpx-cpus-interface.h

-- 
2.16.4




* [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-21 18:54 [RFC 0/3] QEMU cpus.c refactoring Claudio Fontana
@ 2020-05-21 18:54 ` Claudio Fontana
  2020-05-22  6:07   ` Thomas Huth
  2020-05-21 18:54 ` [RFC 2/3] cpu-timers: new module " Claudio Fontana
  2020-05-21 18:54 ` [RFC 3/3] cpus: implement cpus interfaces for per-accel threads Claudio Fontana
  2 siblings, 1 reply; 11+ messages in thread
From: Claudio Fontana @ 2020-05-21 18:54 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov, Wenchao Wang,
	Colin Xu, Claudio Fontana, open list:X86 HAXM CPUs,
	Sunil Muthuswamy, Richard Henderson

This is a first step in the refactoring of cpus.c: move the vCPU
throttling code into a new module, cpu-throttle.c.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 MAINTAINERS                   |   1 +
 Makefile.target               |   8 ++-
 cpu-throttle.c                | 122 ++++++++++++++++++++++++++++++++++++++++++
 cpus.c                        |  95 +++-----------------------------
 include/hw/core/cpu.h         |  37 -------------
 include/qemu/main-loop.h      |   5 ++
 include/sysemu/cpu-throttle.h |  50 +++++++++++++++++
 migration/migration.c         |   1 +
 migration/ram.c               |   1 +
 9 files changed, 195 insertions(+), 125 deletions(-)
 create mode 100644 cpu-throttle.c
 create mode 100644 include/sysemu/cpu-throttle.h
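
The throttling arithmetic moved by this patch can be checked outside QEMU.
The helper names below are hypothetical; the constants and formulas are
copied from cpu_throttle_set(), cpu_throttle_thread() and
cpu_throttle_timer_tick(). Each tick sleeps pct/(1-pct) timeslices and the
timer rearms after 1/(1-pct) timeslices, so the vCPU sleeps for roughly pct
percent of wall-clock time.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from cpu-throttle.c in this patch. */
#define CPU_THROTTLE_PCT_MIN 1
#define CPU_THROTTLE_PCT_MAX 99
#define CPU_THROTTLE_TIMESLICE_NS 10000000

/* Same clamping as cpu_throttle_set(). */
static int throttle_clamp(int pct)
{
    if (pct > CPU_THROTTLE_PCT_MAX) {
        pct = CPU_THROTTLE_PCT_MAX;
    }
    if (pct < CPU_THROTTLE_PCT_MIN) {
        pct = CPU_THROTTLE_PCT_MIN;
    }
    return pct;
}

/* Sleep time requested per tick, as computed in cpu_throttle_thread(). */
static int64_t throttle_sleep_ns(int pct)
{
    double p = (double)pct / 100;
    double throttle_ratio = p / (1 - p);

    assert(pct >= CPU_THROTTLE_PCT_MIN && pct <= CPU_THROTTLE_PCT_MAX);
    /* +1ns compensates for double rounding such as 0.9999999... */
    return (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
}

/* Rearm interval used by cpu_throttle_timer_tick(), truncated to ns. */
static int64_t throttle_period_ns(int pct)
{
    double p = (double)pct / 100;

    assert(pct >= CPU_THROTTLE_PCT_MIN && pct <= CPU_THROTTLE_PCT_MAX);
    return (int64_t)(CPU_THROTTLE_TIMESLICE_NS / (1 - p));
}
```

For 25%, this gives about 3.33 ms of sleep per 13.33 ms tick, i.e. one unit
of sleep for every three units of run time, which matches the "10ms sleep
for every 30ms awake" example in the cpu_throttle_set() comment. The sketch
truncates the rearm interval on its own, while the real code adds it to the
current clock value before converting, so results can differ by 1 ns.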

diff --git a/MAINTAINERS b/MAINTAINERS
index 87a412c229..35864a275a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2142,6 +2142,7 @@ Main loop
 M: Paolo Bonzini <pbonzini@redhat.com>
 S: Maintained
 F: cpus.c
+F: cpu-throttle.c
 F: include/qemu/main-loop.h
 F: include/sysemu/runstate.h
 F: util/main-loop.c
diff --git a/Makefile.target b/Makefile.target
index 8ed1eba95b..60cfa2a78b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -152,7 +152,13 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o
+obj-y += cpus.o
+obj-y += cpu-throttle.o
+obj-y += gdbstub.o
+obj-y += balloon.o
+obj-y += ioport.o
+
 obj-y += qtest.o
 obj-y += dump/
 obj-y += hw/
diff --git a/cpu-throttle.c b/cpu-throttle.c
new file mode 100644
index 0000000000..4e6b2818ca
--- /dev/null
+++ b/cpu-throttle.c
@@ -0,0 +1,122 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/thread.h"
+#include "hw/core/cpu.h"
+#include "qemu/main-loop.h"
+#include "sysemu/cpus.h"
+#include "sysemu/cpu-throttle.h"
+
+/* vcpu throttling controls */
+static QEMUTimer *throttle_timer;
+static unsigned int throttle_percentage;
+
+#define CPU_THROTTLE_PCT_MIN 1
+#define CPU_THROTTLE_PCT_MAX 99
+#define CPU_THROTTLE_TIMESLICE_NS 10000000
+
+static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
+{
+    double pct;
+    double throttle_ratio;
+    int64_t sleeptime_ns, endtime_ns;
+
+    if (!cpu_throttle_get_percentage()) {
+        return;
+    }
+
+    pct = (double)cpu_throttle_get_percentage() / 100;
+    throttle_ratio = pct / (1 - pct);
+    /* Add 1ns to fix double's rounding error (like 0.9999999...) */
+    sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
+    endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
+    while (sleeptime_ns > 0 && !cpu->stop) {
+        if (sleeptime_ns > SCALE_MS) {
+            qemu_cond_timedwait_iothread(cpu->halt_cond,
+                                         sleeptime_ns / SCALE_MS);
+        } else {
+            qemu_mutex_unlock_iothread();
+            g_usleep(sleeptime_ns / SCALE_US);
+            qemu_mutex_lock_iothread();
+        }
+        sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    }
+    atomic_set(&cpu->throttle_thread_scheduled, 0);
+}
+
+static void cpu_throttle_timer_tick(void *opaque)
+{
+    CPUState *cpu;
+    double pct;
+
+    /* Stop the timer if needed */
+    if (!cpu_throttle_get_percentage()) {
+        return;
+    }
+    CPU_FOREACH(cpu) {
+        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
+            async_run_on_cpu(cpu, cpu_throttle_thread,
+                             RUN_ON_CPU_NULL);
+        }
+    }
+
+    pct = (double)cpu_throttle_get_percentage() / 100;
+    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+                                   CPU_THROTTLE_TIMESLICE_NS / (1 - pct));
+}
+
+void cpu_throttle_set(int new_throttle_pct)
+{
+    /* Ensure throttle percentage is within valid range */
+    new_throttle_pct = MIN(new_throttle_pct, CPU_THROTTLE_PCT_MAX);
+    new_throttle_pct = MAX(new_throttle_pct, CPU_THROTTLE_PCT_MIN);
+
+    atomic_set(&throttle_percentage, new_throttle_pct);
+
+    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
+                                       CPU_THROTTLE_TIMESLICE_NS);
+}
+
+void cpu_throttle_stop(void)
+{
+    atomic_set(&throttle_percentage, 0);
+}
+
+bool cpu_throttle_active(void)
+{
+    return (cpu_throttle_get_percentage() != 0);
+}
+
+int cpu_throttle_get_percentage(void)
+{
+    return atomic_read(&throttle_percentage);
+}
+
+void cpu_throttle_init(void)
+{
+    throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+                                  cpu_throttle_timer_tick, NULL);
+}
diff --git a/cpus.c b/cpus.c
index 5670c96bcf..3a46a4fc2b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -61,6 +61,8 @@
 #include "hw/boards.h"
 #include "hw/hw.h"
 
+#include "sysemu/cpu-throttle.h"
+
 #ifdef CONFIG_LINUX
 
 #include <sys/prctl.h>
@@ -84,14 +86,6 @@ static QemuMutex qemu_global_mutex;
 int64_t max_delay;
 int64_t max_advance;
 
-/* vcpu throttling controls */
-static QEMUTimer *throttle_timer;
-static unsigned int throttle_percentage;
-
-#define CPU_THROTTLE_PCT_MIN 1
-#define CPU_THROTTLE_PCT_MAX 99
-#define CPU_THROTTLE_TIMESLICE_NS 10000000
-
 bool cpu_is_stopped(CPUState *cpu)
 {
     return cpu->stopped || !runstate_is_running();
@@ -710,90 +704,12 @@ static const VMStateDescription vmstate_timers = {
     }
 };
 
-static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
-{
-    double pct;
-    double throttle_ratio;
-    int64_t sleeptime_ns, endtime_ns;
-
-    if (!cpu_throttle_get_percentage()) {
-        return;
-    }
-
-    pct = (double)cpu_throttle_get_percentage()/100;
-    throttle_ratio = pct / (1 - pct);
-    /* Add 1ns to fix double's rounding error (like 0.9999999...) */
-    sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
-    endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
-    while (sleeptime_ns > 0 && !cpu->stop) {
-        if (sleeptime_ns > SCALE_MS) {
-            qemu_cond_timedwait(cpu->halt_cond, &qemu_global_mutex,
-                                sleeptime_ns / SCALE_MS);
-        } else {
-            qemu_mutex_unlock_iothread();
-            g_usleep(sleeptime_ns / SCALE_US);
-            qemu_mutex_lock_iothread();
-        }
-        sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-    }
-    atomic_set(&cpu->throttle_thread_scheduled, 0);
-}
-
-static void cpu_throttle_timer_tick(void *opaque)
-{
-    CPUState *cpu;
-    double pct;
-
-    /* Stop the timer if needed */
-    if (!cpu_throttle_get_percentage()) {
-        return;
-    }
-    CPU_FOREACH(cpu) {
-        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread,
-                             RUN_ON_CPU_NULL);
-        }
-    }
-
-    pct = (double)cpu_throttle_get_percentage()/100;
-    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
-                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
-}
-
-void cpu_throttle_set(int new_throttle_pct)
-{
-    /* Ensure throttle percentage is within valid range */
-    new_throttle_pct = MIN(new_throttle_pct, CPU_THROTTLE_PCT_MAX);
-    new_throttle_pct = MAX(new_throttle_pct, CPU_THROTTLE_PCT_MIN);
-
-    atomic_set(&throttle_percentage, new_throttle_pct);
-
-    timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
-                                       CPU_THROTTLE_TIMESLICE_NS);
-}
-
-void cpu_throttle_stop(void)
-{
-    atomic_set(&throttle_percentage, 0);
-}
-
-bool cpu_throttle_active(void)
-{
-    return (cpu_throttle_get_percentage() != 0);
-}
-
-int cpu_throttle_get_percentage(void)
-{
-    return atomic_read(&throttle_percentage);
-}
-
 void cpu_ticks_init(void)
 {
     seqlock_init(&timers_state.vm_clock_seqlock);
     qemu_spin_init(&timers_state.vm_clock_lock);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
-    throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
-                                           cpu_throttle_timer_tick, NULL);
+    cpu_throttle_init();
 }
 
 void configure_icount(QemuOpts *opts, Error **errp)
@@ -1852,6 +1768,11 @@ void qemu_cond_wait_iothread(QemuCond *cond)
     qemu_cond_wait(cond, &qemu_global_mutex);
 }
 
+void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
+{
+    qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 07f7698155..e6b75d456c 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -817,43 +817,6 @@ bool cpu_exists(int64_t id);
  */
 CPUState *cpu_by_arch_id(int64_t id);
 
-/**
- * cpu_throttle_set:
- * @new_throttle_pct: Percent of sleep time. Valid range is 1 to 99.
- *
- * Throttles all vcpus by forcing them to sleep for the given percentage of
- * time. A throttle_percentage of 25 corresponds to a 75% duty cycle roughly.
- * (example: 10ms sleep for every 30ms awake).
- *
- * cpu_throttle_set can be called as needed to adjust new_throttle_pct.
- * Once the throttling starts, it will remain in effect until cpu_throttle_stop
- * is called.
- */
-void cpu_throttle_set(int new_throttle_pct);
-
-/**
- * cpu_throttle_stop:
- *
- * Stops the vcpu throttling started by cpu_throttle_set.
- */
-void cpu_throttle_stop(void);
-
-/**
- * cpu_throttle_active:
- *
- * Returns: %true if the vcpus are currently being throttled, %false otherwise.
- */
-bool cpu_throttle_active(void);
-
-/**
- * cpu_throttle_get_percentage:
- *
- * Returns the vcpu throttle percentage. See cpu_throttle_set for details.
- *
- * Returns: The throttle percentage in range 1 to 99.
- */
-int cpu_throttle_get_percentage(void);
-
 #ifndef CONFIG_USER_ONLY
 
 typedef void (*CPUInterruptHandler)(CPUState *, int);
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index a6d20b0719..2fa3d90ad6 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -263,6 +263,11 @@ int qemu_add_child_watch(pid_t pid);
  */
 bool qemu_mutex_iothread_locked(void);
 
+/*
+ * qemu_cond_timedwait_iothread: qemu_cond_wait_iothread with a timeout (ms)
+ */
+void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
+
 /**
  * qemu_mutex_lock_iothread: Lock the main loop mutex.
  *
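
qemu_global_mutex is static in cpus.c, so cpu-throttle.c cannot pass it to
qemu_cond_timedwait() directly; hence the new qemu_cond_timedwait_iothread()
wrapper. The same pattern, sketched with plain POSIX threads rather than
QEMU's QemuCond/QemuMutex (all names below are hypothetical stand-ins),
including the conversion of a millisecond timeout into the absolute
deadline that pthread_cond_timedwait() expects:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for qemu_global_mutex: private to this file, as in cpus.c. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

void global_mutex_lock(void)
{
    pthread_mutex_lock(&global_mutex);
}

void global_mutex_unlock(void)
{
    pthread_mutex_unlock(&global_mutex);
}

/*
 * Hypothetical analogue of qemu_cond_timedwait_iothread(): wait on @cond for
 * at most @ms milliseconds using the private mutex, which the caller must
 * hold. Returns 0 when woken (possibly spuriously) or ETIMEDOUT.
 */
int cond_timedwait_global(pthread_cond_t *cond, int ms)
{
    struct timespec deadline;

    assert(ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {      /* normalize the timespec */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_cond_timedwait(cond, &global_mutex, &deadline);
}
```

As with qemu_cond_timedwait(), the caller must hold the mutex, and a return
before the deadline may be a spurious wakeup, which is why
cpu_throttle_thread() recomputes the remaining sleep time in a loop.
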
diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
new file mode 100644
index 0000000000..22356502a5
--- /dev/null
+++ b/include/sysemu/cpu-throttle.h
@@ -0,0 +1,50 @@
+#ifndef SYSEMU_CPU_THROTTLE_H
+#define SYSEMU_CPU_THROTTLE_H
+
+#include "qemu/timer.h"
+
+/**
+ * cpu_throttle_init:
+ *
+ * Initialize the CPU throttling API.
+ */
+void cpu_throttle_init(void);
+
+/**
+ * cpu_throttle_set:
+ * @new_throttle_pct: Percent of sleep time. Valid range is 1 to 99.
+ *
+ * Throttles all vcpus by forcing them to sleep for the given percentage of
+ * time. A throttle_percentage of 25 corresponds to a 75% duty cycle roughly.
+ * (example: 10ms sleep for every 30ms awake).
+ *
+ * cpu_throttle_set can be called as needed to adjust new_throttle_pct.
+ * Once the throttling starts, it will remain in effect until cpu_throttle_stop
+ * is called.
+ */
+void cpu_throttle_set(int new_throttle_pct);
+
+/**
+ * cpu_throttle_stop:
+ *
+ * Stops the vcpu throttling started by cpu_throttle_set.
+ */
+void cpu_throttle_stop(void);
+
+/**
+ * cpu_throttle_active:
+ *
+ * Returns: %true if the vcpus are currently being throttled, %false otherwise.
+ */
+bool cpu_throttle_active(void);
+
+/**
+ * cpu_throttle_get_percentage:
+ *
+ * Returns the vcpu throttle percentage. See cpu_throttle_set for details.
+ *
+ * Returns: The throttle percentage in range 1 to 99.
+ */
+int cpu_throttle_get_percentage(void);
+
+#endif /* SYSEMU_CPU_THROTTLE_H */
diff --git a/migration/migration.c b/migration/migration.c
index 0bb042a0f7..dd18323f32 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -23,6 +23,7 @@
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
 #include "migration/global_state.h"
diff --git a/migration/ram.c b/migration/ram.c
index 859f835f1a..527f0c7316 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -52,6 +52,7 @@
 #include "migration/colo.h"
 #include "block.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/cpu-throttle.h"
 #include "savevm.h"
 #include "qemu/iov.h"
 #include "multifd.h"
-- 
2.16.4




* [RFC 2/3] cpu-timers: new module extracted from cpus.c
  2020-05-21 18:54 [RFC 0/3] QEMU cpus.c refactoring Claudio Fontana
  2020-05-21 18:54 ` [RFC 1/3] cpu-throttle: new module, extracted from cpus.c Claudio Fontana
@ 2020-05-21 18:54 ` Claudio Fontana
  2020-05-22 13:49   ` Claudio Fontana
  2020-05-21 18:54 ` [RFC 3/3] cpus: implement cpus interfaces for per-accel threads Claudio Fontana
  2 siblings, 1 reply; 11+ messages in thread
From: Claudio Fontana @ 2020-05-21 18:54 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov, Wenchao Wang,
	Colin Xu, Claudio Fontana, open list:X86 HAXM CPUs,
	Sunil Muthuswamy, Richard Henderson

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 MAINTAINERS                 |   1 +
 Makefile.target             |   1 +
 accel/qtest.c               |   3 +-
 accel/tcg/cpu-exec.c        |  43 ++-
 accel/tcg/tcg-all.c         |   7 +-
 accel/tcg/translate-all.c   |   3 +-
 cpu-timers.c                | 776 ++++++++++++++++++++++++++++++++++++++++++++
 cpus.c                      | 731 +----------------------------------------
 docs/replay.txt             |   6 +-
 exec.c                      |   4 -
 hw/core/ptimer.c            |   6 +-
 hw/i386/x86.c               |   1 +
 include/exec/cpu-all.h      |   4 +
 include/exec/exec-all.h     |   4 +-
 include/qemu/timer.h        |  20 --
 include/sysemu/cpu-timers.h |  73 +++++
 include/sysemu/cpus.h       |  12 +-
 include/sysemu/replay.h     |   4 +-
 qtest.c                     |   2 +-
 replay/replay.c             |   6 +-
 softmmu/vl.c                |   8 +-
 stubs/clock-warp.c          |   4 +-
 stubs/cpu-get-clock.c       |   2 +-
 stubs/cpu-get-icount.c      |  14 +-
 target/alpha/translate.c    |   3 +-
 target/arm/helper.c         |   7 +-
 target/riscv/csr.c          |   8 +-
 tests/ptimer-test-stubs.c   |   6 +
 tests/test-timed-average.c  |   2 +-
 util/main-loop.c            |   4 +-
 util/qemu-timer.c           |   9 +-
 31 files changed, 965 insertions(+), 809 deletions(-)
 create mode 100644 cpu-timers.c
 create mode 100644 include/sysemu/cpu-timers.h
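
The icount accounting that this patch moves into cpu-timers.c
(icount_get_executed() and icount_update_locked()) boils down to simple
bookkeeping. A simplified sketch, with a hypothetical VCpuIcount struct
standing in for the CPUState fields, and without the vm_clock seqlock that
the real icount_update() takes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the CPUState icount fields. */
typedef struct VCpuIcount {
    int64_t budget;     /* instructions granted for this execution slice */
    uint16_t decr_low;  /* like icount_decr.u16.low: left in current batch */
    int64_t extra;      /* like icount_extra: left after the current batch */
} VCpuIcount;

static int64_t global_icount;   /* like timers_state.qemu_icount */

/* Mirrors icount_get_executed(): budget minus what is still left to run. */
static int64_t icount_executed(const VCpuIcount *c)
{
    return c->budget - (c->decr_low + c->extra);
}

/* Mirrors icount_update_locked(); the real code runs under the seqlock. */
static void icount_account(VCpuIcount *c)
{
    int64_t executed = icount_executed(c);

    assert(executed >= 0);
    c->budget -= executed;
    global_icount += executed;
}
```

Because the executed count is subtracted from the budget, calling the update
twice without running more instructions adds nothing the second time.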

diff --git a/MAINTAINERS b/MAINTAINERS
index 35864a275a..1b3b17fda8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2143,6 +2143,7 @@ M: Paolo Bonzini <pbonzini@redhat.com>
 S: Maintained
 F: cpus.c
 F: cpu-throttle.c
+F: cpu-timers.c
 F: include/qemu/main-loop.h
 F: include/sysemu/runstate.h
 F: util/main-loop.c
diff --git a/Makefile.target b/Makefile.target
index 60cfa2a78b..1d40237375 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -155,6 +155,7 @@ ifdef CONFIG_SOFTMMU
 obj-y += arch_init.o
 obj-y += cpus.o
 obj-y += cpu-throttle.o
+obj-y += cpu-timers.o
 obj-y += gdbstub.o
 obj-y += balloon.o
 obj-y += ioport.o
diff --git a/accel/qtest.c b/accel/qtest.c
index 5b88f55921..ef9ee0941a 100644
--- a/accel/qtest.c
+++ b/accel/qtest.c
@@ -19,13 +19,14 @@
 #include "sysemu/accel.h"
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 
 static int qtest_init_accel(MachineState *ms)
 {
     QemuOpts *opts = qemu_opts_create(qemu_find_opts("icount"), NULL, 0,
                                       &error_abort);
     qemu_opt_set(opts, "shift", "0", &error_abort);
-    configure_icount(opts, &error_abort);
+    icount_configure(opts, &error_abort);
     qemu_opts_del(opts);
     return 0;
 }
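
qtest now calls icount_configure() with shift=0. Assuming, as in the
existing cpus.c code, that icount_to_ns() converts instructions to virtual
nanoseconds by shifting left by the configured shift, shift=0 means one
instruction per nanosecond (1000 MIPS), while MAX_ICOUNT_SHIFT 10 gives
1024 ns per instruction, the "1MIPS" minimum mentioned in cpu-timers.c.
A standalone sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* From cpu-timers.c: the largest shift allowed (about 1 MIPS). */
#define MAX_ICOUNT_SHIFT 10

/* Assumed conversion: virtual ns = instructions << shift. */
static int64_t insns_to_ns(int64_t insns, int shift)
{
    assert(insns >= 0);
    assert(shift >= 0 && shift <= MAX_ICOUNT_SHIFT);
    return insns << shift;
}

/* Emulated speed implied by a shift: 1e9 ns/s / 2^shift ns per insn, in MIPS. */
static double shift_to_mips(int shift)
{
    assert(shift >= 0 && shift <= MAX_ICOUNT_SHIFT);
    return 1000.0 / (double)(INT64_C(1) << shift);
}
```
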
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d95c4848a4..82155c1db3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "trace.h"
 #include "disas/disas.h"
@@ -36,6 +37,8 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/cpus.h"
+#include "exec/cpu-all.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 
 /* -icount align implementation. */
@@ -56,6 +59,9 @@ typedef struct SyncClocks {
 #define MAX_DELAY_PRINT_RATE 2000000000LL
 #define MAX_NB_PRINTS 100
 
+static int64_t max_delay;
+static int64_t max_advance;
+
 static void align_clocks(SyncClocks *sc, CPUState *cpu)
 {
     int64_t cpu_icount;
@@ -65,7 +71,7 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
     }
 
     cpu_icount = cpu->icount_extra + cpu_neg(cpu)->icount_decr.u16.low;
-    sc->diff_clk += cpu_icount_to_ns(sc->last_cpu_icount - cpu_icount);
+    sc->diff_clk += icount_to_ns(sc->last_cpu_icount - cpu_icount);
     sc->last_cpu_icount = cpu_icount;
 
     if (sc->diff_clk > VM_CLOCK_ADVANCE) {
@@ -98,9 +104,9 @@ static void print_delay(const SyncClocks *sc)
             (-sc->diff_clk / (float)1000000000LL <
              (threshold_delay - THRESHOLD_REDUCE))) {
             threshold_delay = (-sc->diff_clk / 1000000000LL) + 1;
-            printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
-                   threshold_delay - 1,
-                   threshold_delay);
+            qemu_printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
+                        threshold_delay - 1,
+                        threshold_delay);
             nb_prints++;
             last_realtime_clock = sc->realtime_clock;
         }
@@ -597,7 +603,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 
     /* Finally, check if we need to exit to the main loop.  */
     if (unlikely(atomic_read(&cpu->exit_request))
-        || (use_icount
+        || (icount_enabled()
             && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
         atomic_set(&cpu->exit_request, 0);
         if (cpu->exception_index == -1) {
@@ -638,10 +644,10 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
     }
 
     /* Instruction counter expired.  */
-    assert(use_icount);
+    assert(icount_enabled());
 #ifndef CONFIG_USER_ONLY
     /* Ensure global icount has gone forward */
-    cpu_update_icount(cpu);
+    icount_update(cpu);
     /* Refill decrementer and continue execution.  */
     insns_left = MIN(0xffff, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
@@ -741,3 +747,26 @@ int cpu_exec(CPUState *cpu)
 
     return ret;
 }
+
+#ifndef CONFIG_USER_ONLY
+
+void dump_drift_info(void)
+{
+    if (!icount_enabled()) {
+        return;
+    }
+
+    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
+                (cpu_get_clock() - icount_get()) / SCALE_MS);
+    if (icount_align_option) {
+        qemu_printf("Max guest delay     %"PRIi64" ms\n",
+                    -max_delay / SCALE_MS);
+        qemu_printf("Max guest advance   %"PRIi64" ms\n",
+                    max_advance / SCALE_MS);
+    } else {
+        qemu_printf("Max guest delay     NA\n");
+        qemu_printf("Max guest advance   NA\n");
+    }
+}
+
+#endif /* !CONFIG_USER_ONLY */
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index 3b4fda5640..e27385d051 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -29,6 +29,7 @@
 #include "qom/object.h"
 #include "cpu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/main-loop.h"
 #include "tcg/tcg.h"
 #include "qapi/error.h"
@@ -65,7 +66,7 @@ static void tcg_handle_interrupt(CPUState *cpu, int mask)
         qemu_cpu_kick(cpu);
     } else {
         atomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
-        if (use_icount &&
+        if (icount_enabled() &&
             !cpu->can_do_io
             && (mask & ~old_mask) != 0) {
             cpu_abort(cpu, "Raised interrupt while not in I/O function");
@@ -104,7 +105,7 @@ static bool check_tcg_memory_orders_compatible(void)
 
 static bool default_mttcg_enabled(void)
 {
-    if (use_icount || TCG_OVERSIZED_GUEST) {
+    if (icount_enabled() || TCG_OVERSIZED_GUEST) {
         return false;
     } else {
 #ifdef TARGET_SUPPORTS_MTTCG
@@ -146,7 +147,7 @@ static void tcg_set_thread(Object *obj, const char *value, Error **errp)
     if (strcmp(value, "multi") == 0) {
         if (TCG_OVERSIZED_GUEST) {
             error_setg(errp, "No MTTCG when guest word size > hosts");
-        } else if (use_icount) {
+        } else if (icount_enabled()) {
             error_setg(errp, "No MTTCG when icount is enabled");
         } else {
 #ifndef TARGET_SUPPORTS_MTTCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 42ce1dfcff..479edeb2ea 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -57,6 +57,7 @@
 #include "qemu/main-loop.h"
 #include "exec/log.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/tcg.h"
 
 /* #define DEBUG_TB_INVALIDATE */
@@ -369,7 +370,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 
  found:
     if (reset_icount && (tb_cflags(tb) & CF_USE_ICOUNT)) {
-        assert(use_icount);
+        assert(icount_enabled());
         /* Reset the cycle counter to the start of the block
            and shift if to the number of actually executed instructions */
         cpu_neg(cpu)->icount_decr.u16.low += num_insns - i;
diff --git a/cpu-timers.c b/cpu-timers.c
new file mode 100644
index 0000000000..20fea07625
--- /dev/null
+++ b/cpu-timers.c
@@ -0,0 +1,776 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/cutils.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
+#include "sysemu/qtest.h"
+#include "qemu/main-loop.h"
+#include "qemu/option.h"
+#include "qemu/seqlock.h"
+#include "sysemu/replay.h"
+#include "sysemu/runstate.h"
+#include "hw/core/cpu.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/cpu-throttle.h"
+
+typedef struct TimersState {
+    /* Protected by BQL.  */
+    int64_t cpu_ticks_prev;
+    int64_t cpu_ticks_offset;
+
+    /*
+     * Protect fields that can be respectively read outside the
+     * BQL, and written from multiple threads.
+     */
+    QemuSeqLock vm_clock_seqlock;
+    QemuSpin vm_clock_lock;
+
+    int16_t cpu_ticks_enabled;
+
+    /* Conversion factor from emulated instructions to virtual clock ticks.  */
+    int16_t icount_time_shift;
+
+    /* Compensate for varying guest execution speed.  */
+    int64_t qemu_icount_bias;
+
+    int64_t vm_clock_warp_start;
+    int64_t cpu_clock_offset;
+
+    /* Only written by TCG thread */
+    int64_t qemu_icount;
+
+    /* for adjusting icount */
+    QEMUTimer *icount_rt_timer;
+    QEMUTimer *icount_vm_timer;
+    QEMUTimer *icount_warp_timer;
+} TimersState;
+
+static TimersState timers_state;
+
+/*
+ * ICOUNT: Instruction Counter
+ */
+static bool icount_sleep = true;
+/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
+#define MAX_ICOUNT_SHIFT 10
+
+/*
+ * 0 = Do not count executed instructions.
+ * 1 = Fixed conversion of insn to ns via "shift" option
+ * 2 = Runtime adaptive algorithm to compute shift
+ */
+static int use_icount;
+
+int icount_enabled(void)
+{
+    return use_icount;
+}
+
+static void icount_enable_precise(void)
+{
+    use_icount = 1;
+}
+
+static void icount_enable_adaptive(void)
+{
+    use_icount = 2;
+}
+
+/*
+ * The current number of executed instructions is based on what we
+ * originally budgeted minus the current state of the decrementing
+ * icount counters in extra/u16.low.
+ */
+static int64_t icount_get_executed(CPUState *cpu)
+{
+    return (cpu->icount_budget -
+            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
+}
+
+/*
+ * Update the global shared timers_state.qemu_icount to take into
+ * account executed instructions. This is done by the TCG vCPU
+ * thread so the main-loop can see time has moved forward.
+ */
+static void icount_update_locked(CPUState *cpu)
+{
+    int64_t executed = icount_get_executed(cpu);
+    cpu->icount_budget -= executed;
+
+    atomic_set_i64(&timers_state.qemu_icount,
+                   timers_state.qemu_icount + executed);
+}
+
+/*
+ * Update the global shared timers_state.qemu_icount to take into
+ * account executed instructions. This is done by the TCG vCPU
+ * thread so the main-loop can see time has moved forward.
+ */
+void icount_update(CPUState *cpu)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    icount_update_locked(cpu);
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+static int64_t icount_get_raw_locked(void)
+{
+    CPUState *cpu = current_cpu;
+
+    if (cpu && cpu->running) {
+        if (!cpu->can_do_io) {
+            error_report("Bad icount read");
+            exit(1);
+        }
+        /* Take into account what has run */
+        icount_update_locked(cpu);
+    }
+    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
+    return atomic_read_i64(&timers_state.qemu_icount);
+}
+
+static int64_t icount_get_locked(void)
+{
+    int64_t icount = icount_get_raw_locked();
+    return atomic_read_i64(&timers_state.qemu_icount_bias) +
+        icount_to_ns(icount);
+}
+
+int64_t icount_get_raw(void)
+{
+    int64_t icount;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        icount = icount_get_raw_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return icount;
+}
+
+/* Return the virtual CPU time, based on the instruction counter.  */
+int64_t icount_get(void)
+{
+    int64_t icount;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        icount = icount_get_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return icount;
+}
+
+int64_t icount_to_ns(int64_t icount)
+{
+    return icount << atomic_read(&timers_state.icount_time_shift);
+}
+
+/*
+ * Correlation between real and virtual time is always going to be
+ * fairly approximate, so ignore small variation.
+ * When the guest is idle real and virtual time will be aligned in
+ * the IO wait loop.
+ */
+#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
+
+static int64_t cpu_get_clock_locked(void);
+
+static void icount_adjust(void)
+{
+    int64_t cur_time;
+    int64_t cur_icount;
+    int64_t delta;
+
+    /* Protected by TimersState mutex.  */
+    static int64_t last_delta;
+
+    /* If the VM is not running, then do nothing.  */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    cur_time = cpu_get_clock_locked();
+    cur_icount = icount_get_locked();
+
+    delta = cur_icount - cur_time;
+    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
+    if (delta > 0
+        && last_delta + ICOUNT_WOBBLE < delta * 2
+        && timers_state.icount_time_shift > 0) {
+        /* The guest is getting too far ahead.  Slow time down.  */
+        atomic_set(&timers_state.icount_time_shift,
+                   timers_state.icount_time_shift - 1);
+    }
+    if (delta < 0
+        && last_delta - ICOUNT_WOBBLE > delta * 2
+        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
+        /* The guest is getting too far behind.  Speed time up.  */
+        atomic_set(&timers_state.icount_time_shift,
+                   timers_state.icount_time_shift + 1);
+    }
+    last_delta = delta;
+    atomic_set_i64(&timers_state.qemu_icount_bias,
+                   cur_icount - (timers_state.qemu_icount
+                                 << timers_state.icount_time_shift));
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+static void icount_adjust_rt(void *opaque)
+{
+    timer_mod(timers_state.icount_rt_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
+    icount_adjust();
+}
+
+static void icount_adjust_vm(void *opaque)
+{
+    timer_mod(timers_state.icount_vm_timer,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+              NANOSECONDS_PER_SECOND / 10);
+    icount_adjust();
+}
+
+int64_t icount_round(int64_t count)
+{
+    int shift = atomic_read(&timers_state.icount_time_shift);
+    return (count + (1 << shift) - 1) >> shift;
+}
+
+static void icount_warp_rt(void)
+{
+    unsigned seq;
+    int64_t warp_start;
+
+    /*
+     * The icount_warp_timer is rescheduled soon after vm_clock_warp_start
+     * changes from -1 to another value, so the race here is okay.
+     */
+    do {
+        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        warp_start = timers_state.vm_clock_warp_start;
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
+
+    if (warp_start == -1) {
+        return;
+    }
+
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (runstate_is_running()) {
+        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
+                                            cpu_get_clock_locked());
+        int64_t warp_delta;
+
+        warp_delta = clock - timers_state.vm_clock_warp_start;
+        if (icount_enabled() == 2) {
+            /*
+             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
+             * far ahead of real time.
+             */
+            int64_t cur_icount = icount_get_locked();
+            int64_t delta = clock - cur_icount;
+            warp_delta = MIN(warp_delta, delta);
+        }
+        atomic_set_i64(&timers_state.qemu_icount_bias,
+                       timers_state.qemu_icount_bias + warp_delta);
+    }
+    timers_state.vm_clock_warp_start = -1;
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+
+    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
+        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+    }
+}
+
+static void icount_timer_cb(void *opaque)
+{
+    /*
+     * No need for a checkpoint because the timer already synchronizes
+     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
+     */
+    icount_warp_rt();
+}
+
+void icount_start_warp_timer(void)
+{
+    int64_t clock;
+    int64_t deadline;
+
+    if (!icount_enabled()) {
+        return;
+    }
+
+    /*
+     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
+     * do not fire, so computing the deadline does not make sense.
+     */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        if (!all_cpu_threads_idle()) {
+            return;
+        }
+
+        if (qtest_enabled()) {
+            /* When testing, qtest commands advance icount.  */
+            return;
+        }
+
+        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
+    } else {
+        /* warp clock deterministically in record/replay mode */
+        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
+            /*
+             * vCPU is sleeping and warp can't be started.
+             * This is probably a race condition: the notification sent
+             * to the vCPU was processed in advance and the vCPU went to sleep.
+             * Therefore we have to wake it up so that it can do something.
+             */
+            if (replay_has_checkpoint()) {
+                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+            }
+            return;
+        }
+    }
+
+    /* We want to use the earliest deadline from ALL vm_clocks */
+    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
+    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                          ~QEMU_TIMER_ATTR_EXTERNAL);
+    if (deadline < 0) {
+        static bool notified;
+        if (!icount_sleep && !notified) {
+            warn_report("icount sleep disabled and no active timers");
+            notified = true;
+        }
+        return;
+    }
+
+    if (deadline > 0) {
+        /*
+         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
+         * sleep.  Otherwise, the CPU might be waiting for a future timer
+         * interrupt to wake it up, but the interrupt never comes because
+         * the vCPU isn't running any insns and thus doesn't advance the
+         * QEMU_CLOCK_VIRTUAL.
+         */
+        if (!icount_sleep) {
+            /*
+             * We never let VCPUs sleep in no sleep icount mode.
+             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
+             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
+             * It is useful when we want a deterministic execution time,
+             * isolated from host latencies.
+             */
+            seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                               &timers_state.vm_clock_lock);
+            atomic_set_i64(&timers_state.qemu_icount_bias,
+                           timers_state.qemu_icount_bias + deadline);
+            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                                 &timers_state.vm_clock_lock);
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        } else {
+            /*
+             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
+             * "real" time (related to the time left until the next event) has
+             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
+             * This keeps the warps from being visible externally; for example,
+             * a guest sending network packets every 100ms will not end up
+             * sending them back-to-back.
+             */
+            seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                               &timers_state.vm_clock_lock);
+            if (timers_state.vm_clock_warp_start == -1
+                || timers_state.vm_clock_warp_start > clock) {
+                timers_state.vm_clock_warp_start = clock;
+            }
+            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                                 &timers_state.vm_clock_lock);
+            timer_mod_anticipate(timers_state.icount_warp_timer,
+                                 clock + deadline);
+        }
+    } else if (deadline == 0) {
+        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+    }
+}
+
+void icount_account_warp_timer(void)
+{
+    if (!use_icount || !icount_sleep) {
+        return;
+    }
+
+    /*
+     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
+     * do not fire, so computing the deadline does not make sense.
+     */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    /* warp clock deterministically in record/replay mode */
+    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
+        return;
+    }
+
+    timer_del(timers_state.icount_warp_timer);
+    icount_warp_rt();
+}
+
+void icount_configure(QemuOpts *opts, Error **errp)
+{
+    const char *option = qemu_opt_get(opts, "shift");
+    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
+    bool align = qemu_opt_get_bool(opts, "align", false);
+    long time_shift = -1;
+
+    if (!option && qemu_opt_get(opts, "align")) {
+        error_setg(errp, "Please specify shift option when using align");
+        return;
+    }
+
+    if (align && !sleep) {
+        error_setg(errp, "align=on and sleep=off are incompatible");
+        return;
+    }
+
+    if (strcmp(option, "auto") != 0) {
+        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
+            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
+            error_setg(errp, "icount: Invalid shift value");
+            return;
+        }
+    } else if (align) {
+        error_setg(errp, "shift=auto and align=on are incompatible");
+        return;
+    } else if (!sleep) {
+        error_setg(errp, "shift=auto and sleep=off are incompatible");
+        return;
+    }
+
+    icount_sleep = sleep;
+    if (icount_sleep) {
+        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+                                         icount_timer_cb, NULL);
+    }
+
+    icount_align_option = align;
+
+    if (time_shift >= 0) {
+        timers_state.icount_time_shift = time_shift;
+        icount_enable_precise();
+        return;
+    }
+
+    icount_enable_adaptive();
+
+    /*
+     * 125MIPS seems a reasonable initial guess at the guest speed.
+     * It will be corrected fairly quickly anyway.
+     */
+    timers_state.icount_time_shift = 3;
+
+    /*
+     * Have both realtime and virtual time triggers for speed adjustment.
+     * The realtime trigger catches emulated time passing too slowly,
+     * the virtual time trigger catches emulated time passing too fast.
+     * Realtime triggers occur even when idle, so use them less frequently
+     * than VM triggers.
+     */
+    timers_state.vm_clock_warp_start = -1;
+    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
+                                   icount_adjust_rt, NULL);
+    timer_mod(timers_state.icount_rt_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
+    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                        icount_adjust_vm, NULL);
+    timer_mod(timers_state.icount_vm_timer,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+              NANOSECONDS_PER_SECOND / 10);
+}
+
+/* clock and ticks */
+
+static int64_t cpu_get_ticks_locked(void)
+{
+    int64_t ticks = timers_state.cpu_ticks_offset;
+    if (timers_state.cpu_ticks_enabled) {
+        ticks += cpu_get_host_ticks();
+    }
+
+    if (timers_state.cpu_ticks_prev > ticks) {
+        /* Non increasing ticks may happen if the host uses software suspend. */
+        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
+        ticks = timers_state.cpu_ticks_prev;
+    }
+
+    timers_state.cpu_ticks_prev = ticks;
+    return ticks;
+}
+
+/*
+ * return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * counter.
+ */
+int64_t cpu_get_ticks(void)
+{
+    int64_t ticks;
+
+    if (icount_enabled()) {
+        return icount_get();
+    }
+
+    qemu_spin_lock(&timers_state.vm_clock_lock);
+    ticks = cpu_get_ticks_locked();
+    qemu_spin_unlock(&timers_state.vm_clock_lock);
+    return ticks;
+}
+
+static int64_t cpu_get_clock_locked(void)
+{
+    int64_t time;
+
+    time = timers_state.cpu_clock_offset;
+    if (timers_state.cpu_ticks_enabled) {
+        time += get_clock();
+    }
+
+    return time;
+}
+
+/*
+ * Return the monotonic time elapsed in VM, i.e.,
+ * the time between vm_start and vm_stop
+ */
+int64_t cpu_get_clock(void)
+{
+    int64_t ti;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        ti = cpu_get_clock_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return ti;
+}
+
+/*
+ * enable cpu_get_ticks()
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
+ */
+void cpu_enable_ticks(void)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (!timers_state.cpu_ticks_enabled) {
+        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
+        timers_state.cpu_clock_offset -= get_clock();
+        timers_state.cpu_ticks_enabled = 1;
+    }
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+/*
+ * disable cpu_get_ticks(): the clock is stopped. You must not call
+ * cpu_get_ticks() after that.
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
+ */
+void cpu_disable_ticks(void)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (timers_state.cpu_ticks_enabled) {
+        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
+        timers_state.cpu_clock_offset = cpu_get_clock_locked();
+        timers_state.cpu_ticks_enabled = 0;
+    }
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+void qtest_clock_warp(int64_t dest)
+{
+    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+    AioContext *aio_context;
+    assert(qtest_enabled());
+    aio_context = qemu_get_aio_context();
+    while (clock < dest) {
+        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                                      QEMU_TIMER_ATTR_ALL);
+        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
+
+        seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                           &timers_state.vm_clock_lock);
+        atomic_set_i64(&timers_state.qemu_icount_bias,
+                       timers_state.qemu_icount_bias + warp);
+        seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                             &timers_state.vm_clock_lock);
+
+        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
+        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+    }
+    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+}
+
+static bool icount_state_needed(void *opaque)
+{
+    return icount_enabled();
+}
+
+static bool warp_timer_state_needed(void *opaque)
+{
+    TimersState *s = opaque;
+    return s->icount_warp_timer != NULL;
+}
+
+static bool adjust_timers_state_needed(void *opaque)
+{
+    TimersState *s = opaque;
+    return s->icount_rt_timer != NULL;
+}
+
+/*
+ * The warp timer subsection is optional: the timer may not have been created
+ */
+static const VMStateDescription icount_vmstate_warp_timer = {
+    .name = "timer/icount/warp_timer",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = warp_timer_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(vm_clock_warp_start, TimersState),
+        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription icount_vmstate_adjust_timers = {
+    .name = "timer/icount/timers",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = adjust_timers_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
+        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+/*
+ * This is a subsection for icount migration.
+ */
+static const VMStateDescription icount_vmstate_timers = {
+    .name = "timer/icount",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = icount_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(qemu_icount_bias, TimersState),
+        VMSTATE_INT64(qemu_icount, TimersState),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &icount_vmstate_warp_timer,
+        &icount_vmstate_adjust_timers,
+        NULL
+    }
+};
+
+static const VMStateDescription vmstate_timers = {
+    .name = "timer",
+    .version_id = 2,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(cpu_ticks_offset, TimersState),
+        VMSTATE_UNUSED(8),
+        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &icount_vmstate_timers,
+        NULL
+    }
+};
+
+static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
+{
+}
+
+void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
+{
+    if (!icount_enabled() || type != QEMU_CLOCK_VIRTUAL) {
+        qemu_notify_event();
+        return;
+    }
+
+    if (qemu_in_vcpu_thread()) {
+        /*
+         * A CPU is currently running; kick it back out to the
+         * tcg_cpu_exec() loop so it will recalculate its
+         * icount deadline immediately.
+         */
+        qemu_cpu_kick(current_cpu);
+    } else if (first_cpu) {
+        /*
+         * qemu_cpu_kick is not enough to kick a halted CPU out of
+         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
+         * causes cpu_thread_is_idle to return false.  This way,
+         * handle_icount_deadline can run.
+         * If we have no CPUs at all for some reason, we don't
+         * need to do anything.
+         */
+        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
+    }
+}
+
+/* initialize this module and the cpu throttle for convenience as well */
+void cpu_timers_init(void)
+{
+    seqlock_init(&timers_state.vm_clock_seqlock);
+    qemu_spin_init(&timers_state.vm_clock_lock);
+    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+
+    cpu_throttle_init();
+}
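For reviewers less familiar with icount, here is a minimal standalone sketch (not part of the patch) of the shift arithmetic behind icount_to_ns() and icount_round() in the new cpu-timers.c. The sketch_* names and SKETCH_NS_PER_SEC are hypothetical, chosen only for illustration; they are not QEMU APIs, and the real functions read the shift atomically from timers_state rather than taking it as a parameter.

```c
#include <stdint.h>

/* Hypothetical stand-in for NANOSECONDS_PER_SECOND. */
#define SKETCH_NS_PER_SEC 1000000000LL

/* Like icount_to_ns(): each instruction accounts for 2^shift ns. */
static int64_t sketch_icount_to_ns(int64_t icount, int shift)
{
    return icount << shift;
}

/*
 * Like icount_round(): convert a ns interval back into a number of
 * instructions, rounding up (ceiling division by 2^shift).
 */
static int64_t sketch_icount_round(int64_t ns, int shift)
{
    return (ns + ((int64_t)1 << shift) - 1) >> shift;
}

/* Guest instructions per second implied by a given shift. */
static int64_t sketch_insns_per_sec(int shift)
{
    return SKETCH_NS_PER_SEC >> shift;
}
```

With shift=3 (the adaptive starting value set in icount_configure()), sketch_insns_per_sec() gives 125,000,000, matching the "125MIPS" comment; with MAX_ICOUNT_SHIFT (10) it gives 976,562, i.e. roughly the 1 MIPS floor. A larger shift means more nanoseconds per instruction, which is why the maximum shift corresponds to the minimum speed.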
diff --git a/cpus.c b/cpus.c
index 3a46a4fc2b..7e9f545be8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -58,11 +58,10 @@
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
 #include "sysemu/runstate.h"
+#include "sysemu/cpu-timers.h"
 #include "hw/boards.h"
 #include "hw/hw.h"
 
-#include "sysemu/cpu-throttle.h"
-
 #ifdef CONFIG_LINUX
 
 #include <sys/prctl.h>
@@ -83,9 +82,6 @@
 
 static QemuMutex qemu_global_mutex;
 
-int64_t max_delay;
-int64_t max_advance;
-
 bool cpu_is_stopped(CPUState *cpu)
 {
     return cpu->stopped || !runstate_is_running();
@@ -106,7 +102,7 @@ static bool cpu_thread_is_idle(CPUState *cpu)
     return true;
 }
 
-static bool all_cpu_threads_idle(void)
+bool all_cpu_threads_idle(void)
 {
     CPUState *cpu;
 
@@ -118,668 +114,8 @@ static bool all_cpu_threads_idle(void)
     return true;
 }
 
-/***********************************************************/
-/* guest cycle counter */
-
-/* Protected by TimersState seqlock */
-
-static bool icount_sleep = true;
-/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
-#define MAX_ICOUNT_SHIFT 10
-
-typedef struct TimersState {
-    /* Protected by BQL.  */
-    int64_t cpu_ticks_prev;
-    int64_t cpu_ticks_offset;
-
-    /* Protect fields that can be respectively read outside the
-     * BQL, and written from multiple threads.
-     */
-    QemuSeqLock vm_clock_seqlock;
-    QemuSpin vm_clock_lock;
-
-    int16_t cpu_ticks_enabled;
-
-    /* Conversion factor from emulated instructions to virtual clock ticks.  */
-    int16_t icount_time_shift;
-
-    /* Compensate for varying guest execution speed.  */
-    int64_t qemu_icount_bias;
-
-    int64_t vm_clock_warp_start;
-    int64_t cpu_clock_offset;
-
-    /* Only written by TCG thread */
-    int64_t qemu_icount;
-
-    /* for adjusting icount */
-    QEMUTimer *icount_rt_timer;
-    QEMUTimer *icount_vm_timer;
-    QEMUTimer *icount_warp_timer;
-} TimersState;
-
-static TimersState timers_state;
 bool mttcg_enabled;
 
-
-/* The current number of executed instructions is based on what we
- * originally budgeted minus the current state of the decrementing
- * icount counters in extra/u16.low.
- */
-static int64_t cpu_get_icount_executed(CPUState *cpu)
-{
-    return (cpu->icount_budget -
-            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
-}
-
-/*
- * Update the global shared timer_state.qemu_icount to take into
- * account executed instructions. This is done by the TCG vCPU
- * thread so the main-loop can see time has moved forward.
- */
-static void cpu_update_icount_locked(CPUState *cpu)
-{
-    int64_t executed = cpu_get_icount_executed(cpu);
-    cpu->icount_budget -= executed;
-
-    atomic_set_i64(&timers_state.qemu_icount,
-                   timers_state.qemu_icount + executed);
-}
-
-/*
- * Update the global shared timer_state.qemu_icount to take into
- * account executed instructions. This is done by the TCG vCPU
- * thread so the main-loop can see time has moved forward.
- */
-void cpu_update_icount(CPUState *cpu)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    cpu_update_icount_locked(cpu);
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-static int64_t cpu_get_icount_raw_locked(void)
-{
-    CPUState *cpu = current_cpu;
-
-    if (cpu && cpu->running) {
-        if (!cpu->can_do_io) {
-            error_report("Bad icount read");
-            exit(1);
-        }
-        /* Take into account what has run */
-        cpu_update_icount_locked(cpu);
-    }
-    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
-    return atomic_read_i64(&timers_state.qemu_icount);
-}
-
-static int64_t cpu_get_icount_locked(void)
-{
-    int64_t icount = cpu_get_icount_raw_locked();
-    return atomic_read_i64(&timers_state.qemu_icount_bias) +
-        cpu_icount_to_ns(icount);
-}
-
-int64_t cpu_get_icount_raw(void)
-{
-    int64_t icount;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        icount = cpu_get_icount_raw_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return icount;
-}
-
-/* Return the virtual CPU time, based on the instruction counter.  */
-int64_t cpu_get_icount(void)
-{
-    int64_t icount;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        icount = cpu_get_icount_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return icount;
-}
-
-int64_t cpu_icount_to_ns(int64_t icount)
-{
-    return icount << atomic_read(&timers_state.icount_time_shift);
-}
-
-static int64_t cpu_get_ticks_locked(void)
-{
-    int64_t ticks = timers_state.cpu_ticks_offset;
-    if (timers_state.cpu_ticks_enabled) {
-        ticks += cpu_get_host_ticks();
-    }
-
-    if (timers_state.cpu_ticks_prev > ticks) {
-        /* Non increasing ticks may happen if the host uses software suspend.  */
-        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
-        ticks = timers_state.cpu_ticks_prev;
-    }
-
-    timers_state.cpu_ticks_prev = ticks;
-    return ticks;
-}
-
-/* return the time elapsed in VM between vm_start and vm_stop.  Unless
- * icount is active, cpu_get_ticks() uses units of the host CPU cycle
- * counter.
- */
-int64_t cpu_get_ticks(void)
-{
-    int64_t ticks;
-
-    if (use_icount) {
-        return cpu_get_icount();
-    }
-
-    qemu_spin_lock(&timers_state.vm_clock_lock);
-    ticks = cpu_get_ticks_locked();
-    qemu_spin_unlock(&timers_state.vm_clock_lock);
-    return ticks;
-}
-
-static int64_t cpu_get_clock_locked(void)
-{
-    int64_t time;
-
-    time = timers_state.cpu_clock_offset;
-    if (timers_state.cpu_ticks_enabled) {
-        time += get_clock();
-    }
-
-    return time;
-}
-
-/* Return the monotonic time elapsed in VM, i.e.,
- * the time between vm_start and vm_stop
- */
-int64_t cpu_get_clock(void)
-{
-    int64_t ti;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        ti = cpu_get_clock_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return ti;
-}
-
-/* enable cpu_get_ticks()
- * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
- */
-void cpu_enable_ticks(void)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (!timers_state.cpu_ticks_enabled) {
-        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
-        timers_state.cpu_clock_offset -= get_clock();
-        timers_state.cpu_ticks_enabled = 1;
-    }
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-}
-
-/* disable cpu_get_ticks() : the clock is stopped. You must not call
- * cpu_get_ticks() after that.
- * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
- */
-void cpu_disable_ticks(void)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (timers_state.cpu_ticks_enabled) {
-        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
-        timers_state.cpu_clock_offset = cpu_get_clock_locked();
-        timers_state.cpu_ticks_enabled = 0;
-    }
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-/* Correlation between real and virtual time is always going to be
-   fairly approximate, so ignore small variation.
-   When the guest is idle real and virtual time will be aligned in
-   the IO wait loop.  */
-#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
-
-static void icount_adjust(void)
-{
-    int64_t cur_time;
-    int64_t cur_icount;
-    int64_t delta;
-
-    /* Protected by TimersState mutex.  */
-    static int64_t last_delta;
-
-    /* If the VM is not running, then do nothing.  */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    cur_time = cpu_get_clock_locked();
-    cur_icount = cpu_get_icount_locked();
-
-    delta = cur_icount - cur_time;
-    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
-    if (delta > 0
-        && last_delta + ICOUNT_WOBBLE < delta * 2
-        && timers_state.icount_time_shift > 0) {
-        /* The guest is getting too far ahead.  Slow time down.  */
-        atomic_set(&timers_state.icount_time_shift,
-                   timers_state.icount_time_shift - 1);
-    }
-    if (delta < 0
-        && last_delta - ICOUNT_WOBBLE > delta * 2
-        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
-        /* The guest is getting too far behind.  Speed time up.  */
-        atomic_set(&timers_state.icount_time_shift,
-                   timers_state.icount_time_shift + 1);
-    }
-    last_delta = delta;
-    atomic_set_i64(&timers_state.qemu_icount_bias,
-                   cur_icount - (timers_state.qemu_icount
-                                 << timers_state.icount_time_shift));
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-static void icount_adjust_rt(void *opaque)
-{
-    timer_mod(timers_state.icount_rt_timer,
-              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
-    icount_adjust();
-}
-
-static void icount_adjust_vm(void *opaque)
-{
-    timer_mod(timers_state.icount_vm_timer,
-                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-                   NANOSECONDS_PER_SECOND / 10);
-    icount_adjust();
-}
-
-static int64_t qemu_icount_round(int64_t count)
-{
-    int shift = atomic_read(&timers_state.icount_time_shift);
-    return (count + (1 << shift) - 1) >> shift;
-}
-
-static void icount_warp_rt(void)
-{
-    unsigned seq;
-    int64_t warp_start;
-
-    /* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
-     * changes from -1 to another value, so the race here is okay.
-     */
-    do {
-        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        warp_start = timers_state.vm_clock_warp_start;
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
-
-    if (warp_start == -1) {
-        return;
-    }
-
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (runstate_is_running()) {
-        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
-                                            cpu_get_clock_locked());
-        int64_t warp_delta;
-
-        warp_delta = clock - timers_state.vm_clock_warp_start;
-        if (use_icount == 2) {
-            /*
-             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
-             * far ahead of real time.
-             */
-            int64_t cur_icount = cpu_get_icount_locked();
-            int64_t delta = clock - cur_icount;
-            warp_delta = MIN(warp_delta, delta);
-        }
-        atomic_set_i64(&timers_state.qemu_icount_bias,
-                       timers_state.qemu_icount_bias + warp_delta);
-    }
-    timers_state.vm_clock_warp_start = -1;
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-
-    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
-        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-    }
-}
-
-static void icount_timer_cb(void *opaque)
-{
-    /* No need for a checkpoint because the timer already synchronizes
-     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
-     */
-    icount_warp_rt();
-}
-
-void qtest_clock_warp(int64_t dest)
-{
-    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-    AioContext *aio_context;
-    assert(qtest_enabled());
-    aio_context = qemu_get_aio_context();
-    while (clock < dest) {
-        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                                      QEMU_TIMER_ATTR_ALL);
-        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
-
-        seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                           &timers_state.vm_clock_lock);
-        atomic_set_i64(&timers_state.qemu_icount_bias,
-                       timers_state.qemu_icount_bias + warp);
-        seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                             &timers_state.vm_clock_lock);
-
-        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
-        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
-        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-    }
-    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-}
-
-void qemu_start_warp_timer(void)
-{
-    int64_t clock;
-    int64_t deadline;
-
-    if (!use_icount) {
-        return;
-    }
-
-    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
-     * do not fire, so computing the deadline does not make sense.
-     */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        if (!all_cpu_threads_idle()) {
-            return;
-        }
-
-        if (qtest_enabled()) {
-            /* When testing, qtest commands advance icount.  */
-            return;
-        }
-
-        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
-    } else {
-        /* warp clock deterministically in record/replay mode */
-        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
-            /* vCPU is sleeping and warp can't be started.
-               It is probably a race condition: notification sent
-               to vCPU was processed in advance and vCPU went to sleep.
-               Therefore we have to wake it up for doing someting. */
-            if (replay_has_checkpoint()) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-            }
-            return;
-        }
-    }
-
-    /* We want to use the earliest deadline from ALL vm_clocks */
-    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
-    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                          ~QEMU_TIMER_ATTR_EXTERNAL);
-    if (deadline < 0) {
-        static bool notified;
-        if (!icount_sleep && !notified) {
-            warn_report("icount sleep disabled and no active timers");
-            notified = true;
-        }
-        return;
-    }
-
-    if (deadline > 0) {
-        /*
-         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
-         * sleep.  Otherwise, the CPU might be waiting for a future timer
-         * interrupt to wake it up, but the interrupt never comes because
-         * the vCPU isn't running any insns and thus doesn't advance the
-         * QEMU_CLOCK_VIRTUAL.
-         */
-        if (!icount_sleep) {
-            /*
-             * We never let VCPUs sleep in no sleep icount mode.
-             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
-             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
-             * It is useful when we want a deterministic execution time,
-             * isolated from host latencies.
-             */
-            seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                               &timers_state.vm_clock_lock);
-            atomic_set_i64(&timers_state.qemu_icount_bias,
-                           timers_state.qemu_icount_bias + deadline);
-            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                                 &timers_state.vm_clock_lock);
-            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-        } else {
-            /*
-             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
-             * "real" time, (related to the time left until the next event) has
-             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
-             * This avoids that the warps are visible externally; for example,
-             * you will not be sending network packets continuously instead of
-             * every 100ms.
-             */
-            seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                               &timers_state.vm_clock_lock);
-            if (timers_state.vm_clock_warp_start == -1
-                || timers_state.vm_clock_warp_start > clock) {
-                timers_state.vm_clock_warp_start = clock;
-            }
-            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                                 &timers_state.vm_clock_lock);
-            timer_mod_anticipate(timers_state.icount_warp_timer,
-                                 clock + deadline);
-        }
-    } else if (deadline == 0) {
-        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-    }
-}
-
-static void qemu_account_warp_timer(void)
-{
-    if (!use_icount || !icount_sleep) {
-        return;
-    }
-
-    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
-     * do not fire, so computing the deadline does not make sense.
-     */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    /* warp clock deterministically in record/replay mode */
-    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
-        return;
-    }
-
-    timer_del(timers_state.icount_warp_timer);
-    icount_warp_rt();
-}
-
-static bool icount_state_needed(void *opaque)
-{
-    return use_icount;
-}
-
-static bool warp_timer_state_needed(void *opaque)
-{
-    TimersState *s = opaque;
-    return s->icount_warp_timer != NULL;
-}
-
-static bool adjust_timers_state_needed(void *opaque)
-{
-    TimersState *s = opaque;
-    return s->icount_rt_timer != NULL;
-}
-
-/*
- * Subsection for warp timer migration is optional, because may not be created
- */
-static const VMStateDescription icount_vmstate_warp_timer = {
-    .name = "timer/icount/warp_timer",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = warp_timer_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(vm_clock_warp_start, TimersState),
-        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-static const VMStateDescription icount_vmstate_adjust_timers = {
-    .name = "timer/icount/timers",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = adjust_timers_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
-        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-/*
- * This is a subsection for icount migration.
- */
-static const VMStateDescription icount_vmstate_timers = {
-    .name = "timer/icount",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = icount_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(qemu_icount_bias, TimersState),
-        VMSTATE_INT64(qemu_icount, TimersState),
-        VMSTATE_END_OF_LIST()
-    },
-    .subsections = (const VMStateDescription*[]) {
-        &icount_vmstate_warp_timer,
-        &icount_vmstate_adjust_timers,
-        NULL
-    }
-};
-
-static const VMStateDescription vmstate_timers = {
-    .name = "timer",
-    .version_id = 2,
-    .minimum_version_id = 1,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(cpu_ticks_offset, TimersState),
-        VMSTATE_UNUSED(8),
-        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
-        VMSTATE_END_OF_LIST()
-    },
-    .subsections = (const VMStateDescription*[]) {
-        &icount_vmstate_timers,
-        NULL
-    }
-};
-
-void cpu_ticks_init(void)
-{
-    seqlock_init(&timers_state.vm_clock_seqlock);
-    qemu_spin_init(&timers_state.vm_clock_lock);
-    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
-    cpu_throttle_init();
-}
-
-void configure_icount(QemuOpts *opts, Error **errp)
-{
-    const char *option = qemu_opt_get(opts, "shift");
-    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
-    bool align = qemu_opt_get_bool(opts, "align", false);
-    long time_shift = -1;
-
-    if (!option && qemu_opt_get(opts, "align")) {
-        error_setg(errp, "Please specify shift option when using align");
-        return;
-    }
-
-    if (align && !sleep) {
-        error_setg(errp, "align=on and sleep=off are incompatible");
-        return;
-    }
-
-    if (strcmp(option, "auto") != 0) {
-        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
-            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
-            error_setg(errp, "icount: Invalid shift value");
-            return;
-        }
-    } else if (icount_align_option) {
-        error_setg(errp, "shift=auto and align=on are incompatible");
-        return;
-    } else if (!icount_sleep) {
-        error_setg(errp, "shift=auto and sleep=off are incompatible");
-        return;
-    }
-
-    icount_sleep = sleep;
-    if (icount_sleep) {
-        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
-                                         icount_timer_cb, NULL);
-    }
-
-    icount_align_option = align;
-
-    if (time_shift >= 0) {
-        timers_state.icount_time_shift = time_shift;
-        use_icount = 1;
-        return;
-    }
-
-    use_icount = 2;
-
-    /* 125MIPS seems a reasonable initial guess at the guest speed.
-       It will be corrected fairly quickly anyway.  */
-    timers_state.icount_time_shift = 3;
-
-    /* Have both realtime and virtual time triggers for speed adjustment.
-       The realtime trigger catches emulated time passing too slowly,
-       the virtual time trigger catches emulated time passing too fast.
-       Realtime triggers occur even when idle, so use them less frequently
-       than VM triggers.  */
-    timers_state.vm_clock_warp_start = -1;
-    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
-                                   icount_adjust_rt, NULL);
-    timer_mod(timers_state.icount_rt_timer,
-                   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
-    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                        icount_adjust_vm, NULL);
-    timer_mod(timers_state.icount_vm_timer,
-                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-                   NANOSECONDS_PER_SECOND / 10);
-}
-
 /***********************************************************/
 /* TCG vCPU kick timer
  *
@@ -824,35 +160,6 @@ static void qemu_cpu_kick_rr_cpus(void)
     };
 }
 
-static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
-{
-}
-
-void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
-{
-    if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
-        qemu_notify_event();
-        return;
-    }
-
-    if (qemu_in_vcpu_thread()) {
-        /* A CPU is currently running; kick it back out to the
-         * tcg_cpu_exec() loop so it will recalculate its
-         * icount deadline immediately.
-         */
-        qemu_cpu_kick(current_cpu);
-    } else if (first_cpu) {
-        /* qemu_cpu_kick is not enough to kick a halted CPU out of
-         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
-         * causes cpu_thread_is_idle to return false.  This way,
-         * handle_icount_deadline can run.
-         * If we have no CPUs at all for some reason, we don't
-         * need to do anything.
-         */
-        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
-    }
-}
-
 static void kick_tcg_thread(void *opaque)
 {
     timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1254,7 +561,7 @@ static int64_t tcg_get_icount_limit(void)
             deadline = INT32_MAX;
         }
 
-        return qemu_icount_round(deadline);
+        return icount_round(deadline);
     } else {
         return replay_get_instructions();
     }
@@ -1263,7 +570,7 @@ static int64_t tcg_get_icount_limit(void)
 static void handle_icount_deadline(void)
 {
     assert(qemu_in_vcpu_thread());
-    if (use_icount) {
+    if (icount_enabled()) {
         int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
                                                       QEMU_TIMER_ATTR_ALL);
 
@@ -1277,7 +584,7 @@ static void handle_icount_deadline(void)
 
 static void prepare_icount_for_run(CPUState *cpu)
 {
-    if (use_icount) {
+    if (icount_enabled()) {
         int insns_left;
 
         /* These should always be cleared by process_icount_data after
@@ -1298,9 +605,9 @@ static void prepare_icount_for_run(CPUState *cpu)
 
 static void process_icount_data(CPUState *cpu)
 {
-    if (use_icount) {
+    if (icount_enabled()) {
         /* Account for executed instructions */
-        cpu_update_icount(cpu);
+        icount_update(cpu);
 
         /* Reset the counters */
         cpu_neg(cpu)->icount_decr.u16.low = 0;
@@ -1401,7 +708,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
         replay_mutex_lock();
         qemu_mutex_lock_iothread();
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-        qemu_account_warp_timer();
+        icount_account_warp_timer();
 
         /* Run the timers here.  This is much more efficient than
          * waking up the I/O thread and waiting for completion.
@@ -1459,7 +766,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
             atomic_mb_set(&cpu->exit_request, 0);
         }
 
-        if (use_icount && all_cpu_threads_idle()) {
+        if (icount_enabled() && all_cpu_threads_idle()) {
             /*
              * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
              * in the main_loop, wake it up in order to start the warp timer.
@@ -1612,7 +919,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     CPUState *cpu = arg;
 
     assert(tcg_enabled());
-    g_assert(!use_icount);
+    g_assert(!icount_enabled());
 
     rcu_register_thread();
     tcg_register_thread();
@@ -2191,21 +1498,3 @@ void qmp_inject_nmi(Error **errp)
     nmi_monitor_handle(monitor_get_cpu_index(), errp);
 }
 
-void dump_drift_info(void)
-{
-    if (!use_icount) {
-        return;
-    }
-
-    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
-                (cpu_get_clock() - cpu_get_icount())/SCALE_MS);
-    if (icount_align_option) {
-        qemu_printf("Max guest delay     %"PRIi64" ms\n",
-                    -max_delay / SCALE_MS);
-        qemu_printf("Max guest advance   %"PRIi64" ms\n",
-                    max_advance / SCALE_MS);
-    } else {
-        qemu_printf("Max guest delay     NA\n");
-        qemu_printf("Max guest advance   NA\n");
-    }
-}
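For reviewers following the code moved out of cpus.c, here is a minimal single-threaded sketch of the adaptive (use_icount == 2) shift adjustment done by icount_adjust(). The struct, the function names and the SKETCH_* constants are hypothetical. The constants are assumed to match ICOUNT_WOBBLE and MAX_ICOUNT_SHIFT. The real code also holds the vm_clock seqlock and updates the shift and bias atomically.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror ICOUNT_WOBBLE and MAX_ICOUNT_SHIFT in cpus.c. */
#define SKETCH_WOBBLE    (1000000000LL / 10)  /* 0.1 s in ns */
#define SKETCH_MAX_SHIFT 10

typedef struct {
    int shift;          /* virtual ns per instruction is 1 << shift */
    int64_t bias;       /* ns added to the instruction-derived time */
    int64_t insns;      /* raw executed-instruction count */
    int64_t last_delta; /* previous (virtual - real) difference */
} AdaptState;

/* Virtual time in ns, like cpu_get_icount_locked(): bias + insns << shift. */
static int64_t virt_ns(const AdaptState *s)
{
    return s->bias + (s->insns << s->shift);
}

/* One step of the adaptive algorithm, shaped like icount_adjust(). */
static void adapt_step(AdaptState *s, int64_t real_ns)
{
    int64_t cur = virt_ns(s);
    int64_t delta = cur - real_ns;

    if (delta > 0 && s->last_delta + SKETCH_WOBBLE < delta * 2
        && s->shift > 0) {
        s->shift--;     /* guest ahead of real time: slow virtual time down */
    }
    if (delta < 0 && s->last_delta - SKETCH_WOBBLE > delta * 2
        && s->shift < SKETCH_MAX_SHIFT) {
        s->shift++;     /* guest behind real time: speed virtual time up */
    }
    s->last_delta = delta;
    /* Re-base the bias so virtual time is continuous across a shift change. */
    s->bias = cur - (s->insns << s->shift);
    assert(virt_ns(s) == cur);
}
```

Starting from shift=3 with the guest 1 s ahead of real time, one step lowers the shift to 2, while the re-based bias keeps the virtual clock at exactly 1 s.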
diff --git a/docs/replay.txt b/docs/replay.txt
index 70c27edb36..8952e6d852 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -184,11 +184,11 @@ is then incremented (which is called "warping" the virtual clock) as
 soon as the timer fires or the CPUs need to go out of the idle state.
 Two functions are used for this purpose; because these actions change
 virtual machine state and must be deterministic, each of them creates a
-checkpoint.  qemu_start_warp_timer checks if the CPUs are idle and if so
-starts accounting real time to virtual clock.  qemu_account_warp_timer
+checkpoint.  icount_start_warp_timer checks if the CPUs are idle and if so
+starts accounting real time to the virtual clock.  icount_account_warp_timer
 is called when the CPUs get an interrupt or when the warp timer fires,
 and it warps the virtual clock by the amount of real time that has passed
-since qemu_start_warp_timer.
+since icount_start_warp_timer.
 
 Bottom halves
 -------------
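The warping described in docs/replay.txt can be modeled as two operations on a start timestamp and a bias. This is a simplified sketch under stated assumptions. WarpModel, warp_start() and warp_account() are hypothetical names, and time is passed in explicitly. The sketch omits the seqlock, the replay checkpoints, the deadline handling and the adaptive-mode MIN() clamp in icount_warp_rt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t warp_start; /* real time (ns) when warping began, -1 if none */
    int64_t bias;       /* ns added to the virtual clock by warps */
} WarpModel;

/* Modeled on icount_start_warp_timer(): begin accounting only when idle. */
static void warp_start(WarpModel *w, bool all_cpus_idle, int64_t real_now)
{
    if (!all_cpus_idle) {
        return;
    }
    if (w->warp_start == -1 || w->warp_start > real_now) {
        w->warp_start = real_now;
    }
}

/* Modeled on icount_account_warp_timer(): credit the elapsed real time. */
static void warp_account(WarpModel *w, int64_t real_now)
{
    if (w->warp_start == -1) {
        return;             /* no warp in progress */
    }
    assert(real_now >= w->warp_start);
    w->bias += real_now - w->warp_start;
    w->warp_start = -1;
}
```

A warp started at real time 100 and accounted at 350 adds 250 ns to the virtual clock. A second accounting call without a new start adds nothing.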
diff --git a/exec.c b/exec.c
index 5162f0d12f..db9a90469b 100644
--- a/exec.c
+++ b/exec.c
@@ -104,10 +104,6 @@ uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
 
 #if !defined(CONFIG_USER_ONLY)
-/* 0 = Do not count executed instructions.
-   1 = Precise instruction counting.
-   2 = Adaptive rate instruction counting.  */
-int use_icount;
 
 typedef struct PhysPageEntry PhysPageEntry;
 
diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index b5a54e2536..6c9f33208a 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -7,11 +7,11 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
 #include "hw/ptimer.h"
 #include "migration/vmstate.h"
 #include "qemu/host-utils.h"
 #include "sysemu/replay.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/qtest.h"
 #include "block/aio.h"
 #include "sysemu/cpus.h"
@@ -134,7 +134,7 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
      * on the current generation of host machines.
      */
 
-    if (s->enabled == 1 && (delta * period < 10000) && !use_icount) {
+    if (s->enabled == 1 && (delta * period < 10000) && !icount_enabled()) {
         period = 10000 / delta;
         period_frac = 0;
     }
@@ -217,7 +217,7 @@ uint64_t ptimer_get_count(ptimer_state *s)
             uint32_t period_frac = s->period_frac;
             uint64_t period = s->period;
 
-            if (!oneshot && (s->delta * period < 10000) && !use_icount) {
+            if (!oneshot && (s->delta * period < 10000) && !icount_enabled()) {
                 period = 10000 / s->delta;
                 period_frac = 0;
             }
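The ptimer hunks keep a rate limit that only applies when icount is disabled. The sketch assumes that period is in nanoseconds, as in ptimer_state. Under that assumption, a periodic timer (s->enabled == 1) whose delta ticks would span less than 10000 ns has its per-tick period stretched to 10000 / delta. clamp_period() is a hypothetical helper that isolates this arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: per-tick period (ns) that ptimer_reload() would use
 * for a periodic timer reloaded with `delta` ticks. */
static uint64_t clamp_period(uint64_t delta, uint64_t period, bool icount_on)
{
    assert(delta > 0);
    if (delta * period < 10000 && !icount_on) {
        period = 10000 / delta;   /* stretch so delta ticks span ~10 us */
    }
    return period;
}
```

With icount enabled the period is left alone. This is consistent with icount mode wanting guest-visible timing driven by the instruction count rather than by host capabilities.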
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 7a3bc7ab66..002b3cabc2 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -34,6 +34,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/replay.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/cpu-timers.h"
 #include "trace.h"
 
 #include "hw/i386/x86.h"
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d14374bdd4..49eedd714d 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -409,8 +409,12 @@ static inline bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
     return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK);
 }
 
+#ifdef CONFIG_TCG
+void dump_drift_info(void);
 void dump_exec_info(void);
 void dump_opcount_info(void);
+#endif /* CONFIG_TCG */
+
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8792bea07a..c1f51e37af 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -25,7 +25,7 @@
 #ifdef CONFIG_TCG
 #include "exec/cpu_ldst.h"
 #endif
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -489,7 +489,7 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
 static inline uint32_t curr_cflags(void)
 {
     return (parallel_cpus ? CF_PARALLEL : 0)
-         | (use_icount ? CF_USE_ICOUNT : 0);
+         | (icount_enabled() ? CF_USE_ICOUNT : 0);
 }
 
 /* TranslationBlock invalidate API */
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 6a8b48b5a9..c54b1b2813 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -224,13 +224,6 @@ void qemu_clock_notify(QEMUClockType type);
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
-/**
- * qemu_start_warp_timer:
- *
- * Starts a timer for virtual clock update
- */
-void qemu_start_warp_timer(void);
-
 /**
  * qemu_clock_run_timers:
  * @type: clock on which to operate
@@ -791,12 +784,6 @@ static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2)
  */
 void init_clocks(QEMUTimerListNotifyCB *notify_cb);
 
-int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
-void cpu_enable_ticks(void);
-/* Caller must hold BQL */
-void cpu_disable_ticks(void);
-
 static inline int64_t get_max_clock_jump(void)
 {
     /* This should be small enough to prevent excessive interrupts from being
@@ -850,13 +837,6 @@ static inline int64_t get_clock(void)
 }
 #endif
 
-/* icount */
-int64_t cpu_get_icount_raw(void);
-int64_t cpu_get_icount(void);
-int64_t cpu_get_clock(void);
-int64_t cpu_icount_to_ns(int64_t icount);
-void    cpu_update_icount(CPUState *cpu);
-
 /*******************************************/
 /* host CPU ticks (if available) */
 
diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
new file mode 100644
index 0000000000..3db579fde7
--- /dev/null
+++ b/include/sysemu/cpu-timers.h
@@ -0,0 +1,73 @@
+#ifndef SYSEMU_CPU_TIMERS_H
+#define SYSEMU_CPU_TIMERS_H
+
+#include "qemu/timer.h"
+
+/* init the whole cpu timers API, including icount, ticks, and cpu_throttle */
+void cpu_timers_init(void);
+
+/* icount - Instruction Counter API */
+
+/*
+ * Return the icount enablement state:
+ *
+ * 0 = Disabled - Do not count executed instructions.
+ * 1 = Enabled - Fixed conversion of insn to ns via "shift" option
+ * 2 = Enabled - Runtime adaptive algorithm to compute shift
+ */
+int icount_enabled(void);
+/*
+ * Update the icount with the executed instructions. Called by
+ * cpus-tcg vCPU thread so the main-loop can see time has moved forward.
+ */
+void icount_update(CPUState *cpu);
+
+/* get raw icount value */
+int64_t icount_get_raw(void);
+
+/* return the virtual CPU time in ns, based on the instruction counter. */
+int64_t icount_get(void);
+/*
+ * convert an instruction counter value to ns, based on the icount shift.
+ * This shift is set as a fixed value with the icount "shift" option
+ * (precise mode), or it is constantly approximated and corrected at
+ * runtime in adaptive mode.
+ */
+int64_t icount_to_ns(int64_t icount);
+
+/* configure the icount options, including "shift" */
+void icount_configure(QemuOpts *opts, Error **errp);
+
+/* used by tcg vcpu thread to calc icount budget */
+int64_t icount_round(int64_t count);
+
+/* if the CPUs are idle, start accounting real time to the virtual clock. */
+void icount_start_warp_timer(void);
+void icount_account_warp_timer(void);
+
+/*
+ * CPU Ticks and Clock
+ */
+
+/* Caller must hold BQL */
+void cpu_enable_ticks(void);
+/* Caller must hold BQL */
+void cpu_disable_ticks(void);
+
+/*
+ * return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * counter.
+ */
+int64_t cpu_get_ticks(void);
+
+/*
+ * Returns the monotonic time elapsed in VM, i.e.,
+ * the time between vm_start and vm_stop
+ */
+int64_t cpu_get_clock(void);
+
+void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
+void qtest_clock_warp(int64_t dest);
+
+#endif /* SYSEMU_CPU_TIMERS_H */
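The following sketch makes the header comments for icount_to_ns() and icount_round() concrete. As in the code removed from cpus.c, it assumes that one instruction accounts for 2^shift ns of virtual time. The sketch_* names are hypothetical, and they take the shift as a parameter instead of reading timers_state.icount_time_shift.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual ns for a number of instructions: icount * 2^shift. */
static int64_t sketch_icount_to_ns(int64_t icount, int shift)
{
    assert(shift >= 0 && shift < 32);
    return icount << shift;
}

/* Ceiling division of a ns deadline by 2^shift: the smallest instruction
 * budget that reaches the deadline, as in qemu_icount_round(). */
static int64_t sketch_icount_round(int64_t ns, int shift)
{
    assert(shift >= 0 && shift < 32);
    return (ns + ((int64_t)1 << shift) - 1) >> shift;
}
```

With the initial adaptive guess of shift=3, each instruction counts as 8 ns. That gives 10^9 / 8 = 125,000,000 instructions per second, which is the "125MIPS" in the configure_icount() comment. A 17 ns deadline rounds up to a budget of 3 instructions.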
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3c1da6a018..149de000a0 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -4,33 +4,23 @@
 #include "qemu/timer.h"
 
 /* cpus.c */
+bool all_cpu_threads_idle(void);
 bool qemu_in_vcpu_thread(void);
 void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
-void cpu_ticks_init(void);
 
-void configure_icount(QemuOpts *opts, Error **errp);
-extern int use_icount;
 extern int icount_align_option;
 
-/* drift information for info jit command */
-extern int64_t max_delay;
-extern int64_t max_advance;
-void dump_drift_info(void);
-
 /* Unblock cpu */
 void qemu_cpu_kick_self(void);
-void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
 void cpu_synchronize_all_pre_loadvm(void);
 
-void qtest_clock_warp(int64_t dest);
-
 #ifndef CONFIG_USER_ONLY
 /* vl.c */
 /* *-user doesn't have configurable SMP topology */
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 5471bb514d..a140d69a73 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -109,12 +109,12 @@ int64_t replay_read_clock(ReplayClockKind kind);
 #define REPLAY_CLOCK(clock, value)                                      \
     (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
         : replay_mode == REPLAY_MODE_RECORD                             \
-            ? replay_save_clock((clock), (value), cpu_get_icount_raw()) \
+            ? replay_save_clock((clock), (value), icount_get_raw())     \
         : (value))
 #define REPLAY_CLOCK_LOCKED(clock, value)                               \
     (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
         : replay_mode == REPLAY_MODE_RECORD                             \
-            ? replay_save_clock((clock), (value), cpu_get_icount_raw_locked()) \
+            ? replay_save_clock((clock), (value), icount_get_raw_locked()) \
         : (value))
 
 /* Processing data from random generators */
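REPLAY_CLOCK_LOCKED now calls icount_get_raw_locked(). Like the other *_locked helpers, it appears to be intended for callers already inside the vm_clock_seqlock write section. As a reminder of the pattern, here is a generic C11 seqlock sketch. It is not QEMU's include/qemu/seqlock.h. SketchSeq and its functions are hypothetical, and writers are assumed to be serialized externally, which is the role vm_clock_lock plays in seqlock_write_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uint sequence;   /* odd while a write is in progress */
    _Atomic int64_t value;  /* protected datum, e.g. an icount bias */
} SketchSeq;

static void seq_init(SketchSeq *s)
{
    atomic_init(&s->sequence, 0u);
    atomic_init(&s->value, 0);
}

/* Writers must already be serialized by a separate lock. */
static void seq_write(SketchSeq *s, int64_t v)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->value, v, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Lock-free reader: retry if a write was in progress or happened meanwhile. */
static int64_t seq_read(SketchSeq *s)
{
    unsigned seq;
    int64_t v;

    do {
        seq = atomic_load_explicit(&s->sequence, memory_order_acquire);
        v = atomic_load_explicit(&s->value, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((seq & 1) ||
             seq != atomic_load_explicit(&s->sequence, memory_order_relaxed));
    return v;
}
```

The read loop has the same shape as the seqlock_read_begin()/seqlock_read_retry() loop in icount_warp_rt(). Code that already holds the write side does not need to retry, which is the case the _locked variants appear to cover.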
diff --git a/qtest.c b/qtest.c
index 5672b75c35..a1b92853c9 100644
--- a/qtest.c
+++ b/qtest.c
@@ -21,7 +21,7 @@
 #include "exec/memory.h"
 #include "hw/irq.h"
 #include "sysemu/accel.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/config-file.h"
 #include "qemu/option.h"
 #include "qemu/error-report.h"
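qtest_clock_warp(), now declared in sysemu/cpu-timers.h, advances QEMU_CLOCK_VIRTUAL to a destination in steps no larger than the next timer deadline, so that no timer is skipped. The model below assumes a sorted array stands in for the timer lists. WarpSim, next_deadline() and warp_to() are hypothetical. soonest_timeout() reproduces the unsigned-comparison trick of qemu_soonest_timeout(), where -1 means "no deadline".

```c
#include <assert.h>
#include <stdint.h>

/* -1 ("infinite") becomes the largest value when compared as unsigned. */
static int64_t soonest_timeout(int64_t a, int64_t b)
{
    return ((uint64_t)a < (uint64_t)b) ? a : b;
}

typedef struct {
    int64_t clock;          /* current virtual time in ns */
    const int64_t *timers;  /* absolute expiry times, ascending */
    int ntimers;
    int next;               /* index of the first timer not yet run */
    int fired;              /* number of timers run so far */
} WarpSim;

/* ns until the next pending timer, 0 if already due, -1 if none left. */
static int64_t next_deadline(const WarpSim *s)
{
    if (s->next >= s->ntimers) {
        return -1;
    }
    int64_t d = s->timers[s->next] - s->clock;
    return d > 0 ? d : 0;
}

/* Advance to dest without jumping over any timer, like qtest_clock_warp(). */
static void warp_to(WarpSim *s, int64_t dest)
{
    while (s->clock < dest) {
        int64_t warp = soonest_timeout(dest - s->clock, next_deadline(s));
        assert(warp >= 0);
        s->clock += warp;
        /* "Run" every timer that is now due. */
        while (s->next < s->ntimers && s->timers[s->next] <= s->clock) {
            s->next++;
            s->fired++;
        }
    }
}
```

With timers at 50, 120 and 500 ns, warping to 200 stops at 50 and 120 before reaching 200, so two timers fire. A later warp to 600 fires the third timer and then finishes in one step, because no deadline remains.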
diff --git a/replay/replay.c b/replay/replay.c
index 706c7b4f4b..9896a3b6f5 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -11,10 +11,10 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "sysemu/runstate.h"
 #include "replay-internal.h"
-#include "qemu/timer.h"
 #include "qemu/main-loop.h"
 #include "qemu/option.h"
 #include "sysemu/cpus.h"
@@ -64,7 +64,7 @@ bool replay_next_event_is(int event)
 
 uint64_t replay_get_current_icount(void)
 {
-    return cpu_get_icount_raw();
+    return icount_get_raw();
 }
 
 int replay_get_instructions(void)
@@ -345,7 +345,7 @@ void replay_start(void)
         error_reportf_err(replay_blockers->data, "Record/replay: ");
         exit(1);
     }
-    if (!use_icount) {
+    if (!icount_enabled()) {
         error_report("Please enable icount to use record/replay");
         exit(1);
     }
diff --git a/softmmu/vl.c b/softmmu/vl.c
index ae5451bc23..ed53cd1b62 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -73,6 +73,7 @@
 #include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "migration/colo.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/kvm.h"
@@ -2675,7 +2676,7 @@ static void user_register_global_props(void)
 
 static int do_configure_icount(void *opaque, QemuOpts *opts, Error **errp)
 {
-    configure_icount(opts, errp);
+    icount_configure(opts, errp);
     return 0;
 }
 
@@ -2785,7 +2786,7 @@ static void configure_accelerators(const char *progname)
         error_report("falling back to %s", ac->name);
     }
 
-    if (use_icount && !(tcg_enabled() || qtest_enabled())) {
+    if (icount_enabled() && !(tcg_enabled() || qtest_enabled())) {
         error_report("-icount is not allowed with hardware virtualization");
         exit(1);
     }
@@ -4233,7 +4234,8 @@ void qemu_init(int argc, char **argv, char **envp)
     /* spice needs the timers to be initialized by this point */
     qemu_spice_init();
 
-    cpu_ticks_init();
+    /* initialize cpu timers and VCPU throttle modules */
+    cpu_timers_init();
 
     if (default_net) {
         QemuOptsList *net = qemu_find_opts("net");
diff --git a/stubs/clock-warp.c b/stubs/clock-warp.c
index b53e5dd94c..304da5091c 100644
--- a/stubs/clock-warp.c
+++ b/stubs/clock-warp.c
@@ -1,7 +1,7 @@
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
+#include "sysemu/cpu-timers.h"
 
-void qemu_start_warp_timer(void)
+void icount_start_warp_timer(void)
 {
 }
 
diff --git a/stubs/cpu-get-clock.c b/stubs/cpu-get-clock.c
index 5a92810e87..6102338743 100644
--- a/stubs/cpu-get-clock.c
+++ b/stubs/cpu-get-clock.c
@@ -1,5 +1,5 @@
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
+#include "sysemu/cpu-timers.h"
 
 int64_t cpu_get_clock(void)
 {
diff --git a/stubs/cpu-get-icount.c b/stubs/cpu-get-icount.c
index b35f844638..23f9154ef3 100644
--- a/stubs/cpu-get-icount.c
+++ b/stubs/cpu-get-icount.c
@@ -1,20 +1,22 @@
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/main-loop.h"
 
-int use_icount;
-
-int64_t cpu_get_icount(void)
+int64_t icount_get(void)
 {
     abort();
 }
 
-int64_t cpu_get_icount_raw(void)
+int64_t icount_get_raw(void)
 {
     abort();
 }
 
+int icount_enabled(void)
+{
+    return 0;
+}
+
 void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
 {
     qemu_notify_event();
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 8870284f57..36be602179 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "disas/disas.h"
 #include "qemu/host-utils.h"
 #include "exec/exec-all.h"
@@ -1329,7 +1330,7 @@ static DisasJumpType gen_mfpr(DisasContext *ctx, TCGv va, int regno)
     case 249: /* VMTIME */
         helper = gen_helper_get_vmtime;
     do_helper:
-        if (use_icount) {
+        if (icount_enabled()) {
             gen_io_start();
             helper(va);
             return DISAS_PC_STALE;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a92ae55672..c9f99f7952 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -24,6 +24,7 @@
 #include "hw/irq.h"
 #include "hw/semihosting/semihost.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/kvm.h"
 #include "sysemu/tcg.h"
 #include "qemu/range.h"
@@ -1205,17 +1206,17 @@ static int64_t cycles_ns_per(uint64_t cycles)
 
 static bool instructions_supported(CPUARMState *env)
 {
-    return use_icount == 1 /* Precise instruction counting */;
+    return icount_enabled() == 1; /* Precise instruction counting */
 }
 
 static uint64_t instructions_get_count(CPUARMState *env)
 {
-    return (uint64_t)cpu_get_icount_raw();
+    return (uint64_t)icount_get_raw();
 }
 
 static int64_t instructions_ns_per(uint64_t icount)
 {
-    return cpu_icount_to_ns((int64_t)icount);
+    return icount_to_ns((int64_t)icount);
 }
 #endif
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 11d184cd16..6093f73e3a 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -194,8 +194,8 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-    if (use_icount) {
-        *val = cpu_get_icount();
+    if (icount_enabled()) {
+        *val = icount_get();
     } else {
         *val = cpu_get_host_ticks();
     }
@@ -209,8 +209,8 @@ static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 static int read_instreth(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-    if (use_icount) {
-        *val = cpu_get_icount() >> 32;
+    if (icount_enabled()) {
+        *val = icount_get() >> 32;
     } else {
         *val = cpu_get_host_ticks() >> 32;
     }
diff --git a/tests/ptimer-test-stubs.c b/tests/ptimer-test-stubs.c
index ed393d9082..320dcf99b7 100644
--- a/tests/ptimer-test-stubs.c
+++ b/tests/ptimer-test-stubs.c
@@ -12,6 +12,7 @@
 #include "qemu/main-loop.h"
 #include "sysemu/replay.h"
 #include "migration/vmstate.h"
+#include "sysemu/cpu-timers.h"
 
 #include "ptimer-test.h"
 
@@ -126,3 +127,8 @@ void replay_bh_schedule_event(QEMUBH *bh)
 {
     bh->cb(bh->opaque);
 }
+
+int icount_enabled(void)
+{
+    return 0;
+}
diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c
index e2bcf5fe13..82c92500df 100644
--- a/tests/test-timed-average.c
+++ b/tests/test-timed-average.c
@@ -11,7 +11,7 @@
  */
 
 #include "qemu/osdep.h"
-
+#include "sysemu/cpu-timers.h"
 #include "qemu/timed-average.h"
 
 /* This is the clock for QEMU_CLOCK_VIRTUAL */
diff --git a/util/main-loop.c b/util/main-loop.c
index eda63fe4e0..f1af697572 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -27,7 +27,7 @@
 #include "qemu/cutils.h"
 #include "qemu/timer.h"
 #include "sysemu/qtest.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "qemu/main-loop.h"
 #include "block/aio.h"
@@ -521,7 +521,7 @@ void main_loop_wait(int nonblocking)
 
     /* CPU thread can infinitely wait for event after
        missing the warp */
-    qemu_start_warp_timer();
+    icount_start_warp_timer();
     qemu_clock_run_all_timers();
 }
 
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index b6575a2cd5..da2883f914 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -26,6 +26,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 #include "qemu/lockable.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "sysemu/cpus.h"
 
@@ -134,7 +135,7 @@ static void qemu_clock_init(QEMUClockType type, QEMUTimerListNotifyCB *notify_cb
 
 bool qemu_clock_use_for_deadline(QEMUClockType type)
 {
-    return !(use_icount && (type == QEMU_CLOCK_VIRTUAL));
+    return !(icount_enabled() && (type == QEMU_CLOCK_VIRTUAL));
 }
 
 void qemu_clock_notify(QEMUClockType type)
@@ -417,7 +418,7 @@ static void timerlist_rearm(QEMUTimerList *timer_list)
 {
     /* Interrupt execution to force deadline recalculation.  */
     if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL) {
-        qemu_start_warp_timer();
+        icount_start_warp_timer();
     }
     timerlist_notify(timer_list);
 }
@@ -647,8 +648,8 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
         return get_clock();
     default:
     case QEMU_CLOCK_VIRTUAL:
-        if (use_icount) {
-            return cpu_get_icount();
+        if (icount_enabled()) {
+            return icount_get();
         } else {
             return cpu_get_clock();
         }
-- 
2.16.4




* [RFC 3/3] cpus: implement cpus interfaces for per-accel threads
  2020-05-21 18:54 [RFC 0/3] QEMU cpus.c refactoring Claudio Fontana
  2020-05-21 18:54 ` [RFC 1/3] cpu-throttle: new module, extracted from cpus.c Claudio Fontana
  2020-05-21 18:54 ` [RFC 2/3] cpu-timers: new module " Claudio Fontana
@ 2020-05-21 18:54 ` Claudio Fontana
  2 siblings, 0 replies; 11+ messages in thread
From: Claudio Fontana @ 2020-05-21 18:54 UTC
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov, Wenchao Wang,
	Colin Xu, Claudio Fontana, open list:X86 HAXM CPUs,
	Sunil Muthuswamy, Richard Henderson

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
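Each accelerator in this patch fills a CpusAccelInterface table (create_vcpu_thread, kick_vcpu_thread and four cpu_synchronize_* hooks) and passes it to cpus_register_accel_interface() from its init function. The cpus.c side of the dispatch and the real declaration in include/sysemu/cpus.h are not part of this excerpt, so what follows is only a minimal standalone model of the pattern. CPUState is a toy stand-in, only three of the six hooks are kept, and generic_start_vcpu(), generic_kick_vcpu(), generic_synchronize_state() and the toy_* accelerator are hypothetical names. The const-qualified registration signature is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for QEMU's CPUState. */
typedef struct CPUState {
    int cpu_index;
    int thread_started; /* set by the accelerator's create hook */
    int kick_count;     /* incremented by the accelerator's kick hook */
} CPUState;

/*
 * Modeled on the CpusAccelInterface initializers in this patch;
 * only three of the six hooks are kept in this sketch.
 */
typedef struct CpusAccelInterface {
    void (*create_vcpu_thread)(CPUState *cpu);
    void (*kick_vcpu_thread)(CPUState *cpu);
    void (*cpu_synchronize_state)(CPUState *cpu);
} CpusAccelInterface;

/* The single table registered by the active accelerator at init time. */
static const CpusAccelInterface *cpus_accel;

/* Model only: the real function's signature may differ (e.g. non-const). */
static void cpus_register_accel_interface(const CpusAccelInterface *iface)
{
    assert(iface && iface->create_vcpu_thread && iface->kick_vcpu_thread);
    cpus_accel = iface;
}

/* Hypothetical generic entry points: dispatch through the table. */
static void generic_start_vcpu(CPUState *cpu)
{
    assert(cpus_accel);
    cpus_accel->create_vcpu_thread(cpu);
}

static void generic_kick_vcpu(CPUState *cpu)
{
    assert(cpus_accel);
    cpus_accel->kick_vcpu_thread(cpu);
}

static void generic_synchronize_state(CPUState *cpu)
{
    assert(cpus_accel);
    cpus_accel->cpu_synchronize_state(cpu);
}

/* Hypothetical toy accelerator, in the style of the TCG/qtest hooks. */
static void toy_create_vcpu_thread(CPUState *cpu)
{
    cpu->thread_started = 1;
}

static void toy_kick_vcpu_thread(CPUState *cpu)
{
    cpu->kick_count++;
}

static void toy_synchronize_noop(CPUState *cpu)
{
    (void)cpu; /* nothing to synchronize for this accelerator */
}

static const CpusAccelInterface toy_cpus_interface = {
    .create_vcpu_thread = toy_create_vcpu_thread,
    .kick_vcpu_thread = toy_kick_vcpu_thread,
    .cpu_synchronize_state = toy_synchronize_noop,
};

/* Mirrors kvm_init()/tcg_init(): register the hooks, then succeed. */
static int toy_accel_init(void)
{
    cpus_register_accel_interface(&toy_cpus_interface);
    return 0;
}
```

TCG and qtest point every cpu_synchronize_* slot at an empty function rather than leaving it NULL, presumably so that generic code can call through the table without NULL checks; the model follows that convention.
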
 MAINTAINERS                          |   1 +
 accel/kvm/Makefile.objs              |   2 +
 accel/kvm/kvm-all.c                  |  15 +-
 accel/kvm/kvm-cpus-interface.c       |  94 ++++
 accel/kvm/kvm-cpus-interface.h       |   8 +
 accel/qtest.c                        |  82 ++++
 accel/stubs/kvm-stub.c               |   3 +-
 accel/tcg/Makefile.objs              |   1 +
 accel/tcg/tcg-all.c                  |  12 +-
 accel/tcg/tcg-cpus-interface.c       | 523 ++++++++++++++++++++
 accel/tcg/tcg-cpus-interface.h       |   8 +
 cpus.c                               | 911 +++--------------------------------
 hw/core/cpu.c                        |   1 +
 include/sysemu/cpus.h                |  44 ++
 include/sysemu/hvf.h                 |   1 -
 include/sysemu/hw_accel.h            |  57 +--
 include/sysemu/kvm.h                 |   2 +-
 stubs/Makefile.objs                  |   1 +
 stubs/cpu-synchronize-state.c        |  15 +
 target/i386/Makefile.objs            |   7 +-
 target/i386/hax-all.c                |   6 +-
 target/i386/hax-cpus-interface.c     |  85 ++++
 target/i386/hax-cpus-interface.h     |   8 +
 target/i386/hax-i386.h               |   2 +
 target/i386/hax-posix.c              |  12 +
 target/i386/hax-windows.c            |  20 +
 target/i386/hvf/Makefile.objs        |   2 +-
 target/i386/hvf/hvf-cpus-interface.c |  92 ++++
 target/i386/hvf/hvf-cpus-interface.h |   8 +
 target/i386/hvf/hvf.c                |   5 +-
 target/i386/whpx-all.c               |   3 +
 target/i386/whpx-cpus-interface.c    |  96 ++++
 target/i386/whpx-cpus-interface.h    |   8 +
 33 files changed, 1232 insertions(+), 903 deletions(-)
 create mode 100644 accel/kvm/kvm-cpus-interface.c
 create mode 100644 accel/kvm/kvm-cpus-interface.h
 create mode 100644 accel/tcg/tcg-cpus-interface.c
 create mode 100644 accel/tcg/tcg-cpus-interface.h
 create mode 100644 stubs/cpu-synchronize-state.c
 create mode 100644 target/i386/hax-cpus-interface.c
 create mode 100644 target/i386/hax-cpus-interface.h
 create mode 100644 target/i386/hvf/hvf-cpus-interface.c
 create mode 100644 target/i386/hvf/hvf-cpus-interface.h
 create mode 100644 target/i386/whpx-cpus-interface.c
 create mode 100644 target/i386/whpx-cpus-interface.h
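
In single-threaded TCG, the kick timer in tcg-cpus-interface.c fires every TCG_KICK_PERIOD (a tenth of a second of QEMU_CLOCK_VIRTUAL time) and calls qemu_cpu_kick_rr_next_cpu(). That function calls cpu_exit() on tcg_current_rr_cpu and then re-reads the pointer, looping if the round-robin thread has meanwhile moved on to another vCPU. Below is a minimal C11 model of that loop. It assumes sequentially consistent <stdatomic.h> operations in place of QEMU's atomic_mb_read()/atomic_mb_set(); the VCPU type, vcpu_exit() and rr_set_current() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vCPU: exit_request stands in for what cpu_exit() sets. */
typedef struct VCPU {
    atomic_bool exit_request;
} VCPU;

/* vCPU the single round-robin thread is executing right now, or NULL. */
static _Atomic(VCPU *) current_rr_cpu;

/* Hypothetical stand-in for cpu_exit(): ask the vCPU to leave its loop. */
static void vcpu_exit(VCPU *cpu)
{
    atomic_store(&cpu->exit_request, true);
}

/*
 * Kick whichever vCPU is currently scheduled. If the round-robin thread
 * switched to another vCPU between the load and the kick, loop and kick
 * the new one as well, so the request is not lost.
 */
static void kick_rr_next_cpu(void)
{
    VCPU *cpu;

    do {
        cpu = atomic_load(&current_rr_cpu);
        if (cpu) {
            vcpu_exit(cpu);
        }
    } while (cpu != atomic_load(&current_rr_cpu));
}

/* Scheduler side: publish a vCPU before running it, NULL when done. */
static void rr_set_current(VCPU *cpu)
{
    atomic_store(&current_rr_cpu, cpu);
}
```
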

diff --git a/MAINTAINERS b/MAINTAINERS
index 1b3b17fda8..a256d5574c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -426,6 +426,7 @@ WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
 S: Supported
 F: target/i386/whpx-all.c
+F: target/i386/whpx-cpus-interface.c
 F: target/i386/whp-dispatch.h
 F: accel/stubs/whpx-stub.c
 F: include/sysemu/whpx.h
diff --git a/accel/kvm/Makefile.objs b/accel/kvm/Makefile.objs
index fdfa481578..4babbf7796 100644
--- a/accel/kvm/Makefile.objs
+++ b/accel/kvm/Makefile.objs
@@ -1,2 +1,4 @@
 obj-y += kvm-all.o
+obj-y += kvm-cpus-interface.o
+
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d06cc04079..c9cbbb1184 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,10 @@
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
 #include "sysemu/reset.h"
+#include "qemu/guest-random.h"
+
+#include "sysemu/hw_accel.h"
+#include "kvm-cpus-interface.h"
 
 #include "hw/boards.h"
 
@@ -329,7 +333,7 @@ err:
     return ret;
 }
 
-int kvm_destroy_vcpu(CPUState *cpu)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
     long mmap_size;
@@ -363,6 +367,14 @@ err:
     return ret;
 }
 
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+    if (do_kvm_destroy_vcpu(cpu) < 0) {
+        error_report("kvm_destroy_vcpu failed");
+        exit(EXIT_FAILURE);
+    }
+}
+
 static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 {
     struct KVMParkedVcpu *cpu;
@@ -2146,6 +2158,7 @@ static int kvm_init(MachineState *ms)
         qemu_balloon_inhibit(true);
     }
 
+    cpus_register_accel_interface(&kvm_cpus_interface);
     return 0;
 
 err:
diff --git a/accel/kvm/kvm-cpus-interface.c b/accel/kvm/kvm-cpus-interface.c
new file mode 100644
index 0000000000..fd3d117364
--- /dev/null
+++ b/accel/kvm/kvm-cpus-interface.c
@@ -0,0 +1,94 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright IBM, Corp. 2008
+ *           Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Glauber Costa     <gcosta@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/kvm_int.h"
+#include "sysemu/runstate.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "kvm-cpus-interface.h"
+
+static void kvm_kick_vcpu_thread(CPUState *cpu)
+{
+    cpus_kick_thread(cpu);
+}
+
+static void *kvm_vcpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    r = kvm_init_vcpu(cpu);
+    if (r < 0) {
+        error_report("kvm_init_vcpu failed: %s", strerror(-r));
+        exit(1);
+    }
+
+    kvm_init_cpu_signals(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = kvm_cpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    kvm_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void kvm_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, kvm_vcpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
+CpusAccelInterface kvm_cpus_interface = {
+    .create_vcpu_thread = kvm_start_vcpu_thread,
+    .kick_vcpu_thread = kvm_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = kvm_cpu_synchronize_post_reset,
+    .cpu_synchronize_post_init = kvm_cpu_synchronize_post_init,
+    .cpu_synchronize_state = kvm_cpu_synchronize_state,
+    .cpu_synchronize_pre_loadvm = kvm_cpu_synchronize_pre_loadvm,
+};
diff --git a/accel/kvm/kvm-cpus-interface.h b/accel/kvm/kvm-cpus-interface.h
new file mode 100644
index 0000000000..5531a4a4ad
--- /dev/null
+++ b/accel/kvm/kvm-cpus-interface.h
@@ -0,0 +1,8 @@
+#ifndef KVM_CPUS_INTERFACE_H
+#define KVM_CPUS_INTERFACE_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccelInterface kvm_cpus_interface;
+
+#endif /* KVM_CPUS_INTERFACE_H */
diff --git a/accel/qtest.c b/accel/qtest.c
index ef9ee0941a..1677c8724d 100644
--- a/accel/qtest.c
+++ b/accel/qtest.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/rcu.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -20,6 +21,86 @@
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
+#include "qemu/guest-random.h"
+#include "qemu/main-loop.h"
+#include "hw/core/cpu.h"
+
+static void qtest_cpu_synchronize_noop(CPUState *cpu)
+{
+}
+
+static void qtest_kick_vcpu_thread(CPUState *cpu)
+{
+    cpus_kick_thread(cpu);
+}
+
+static void *qtest_cpu_thread_fn(void *arg)
+{
+#ifdef _WIN32
+    error_report("qtest is not supported under Windows");
+    exit(1);
+#else
+    CPUState *cpu = arg;
+    sigset_t waitset;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    sigemptyset(&waitset);
+    sigaddset(&waitset, SIG_IPI);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        qemu_mutex_unlock_iothread();
+        do {
+            int sig;
+            r = sigwait(&waitset, &sig);
+        } while (r == EAGAIN || r == EINTR);
+        if (r != 0) {
+            fprintf(stderr, "sigwait: %s\n", strerror(r));
+            exit(1);
+        }
+        qemu_mutex_lock_iothread();
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug);
+
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+#endif
+}
+
+static void qtest_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qtest_cpu_thread_fn, cpu,
+                       QEMU_THREAD_JOINABLE);
+}
+
+CpusAccelInterface qtest_cpus_interface = {
+    .create_vcpu_thread = qtest_start_vcpu_thread,
+    .kick_vcpu_thread = qtest_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = qtest_cpu_synchronize_noop,
+    .cpu_synchronize_post_init = qtest_cpu_synchronize_noop,
+    .cpu_synchronize_state = qtest_cpu_synchronize_noop,
+    .cpu_synchronize_pre_loadvm = qtest_cpu_synchronize_noop,
+};
 
 static int qtest_init_accel(MachineState *ms)
 {
@@ -28,6 +109,7 @@ static int qtest_init_accel(MachineState *ms)
     qemu_opt_set(opts, "shift", "0", &error_abort);
     icount_configure(opts, &error_abort);
     qemu_opts_del(opts);
+    cpus_register_accel_interface(&qtest_cpus_interface);
     return 0;
 }
 
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 82f118d2df..69f8a842da 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -32,9 +32,8 @@ bool kvm_readonly_mem_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 
-int kvm_destroy_vcpu(CPUState *cpu)
+void kvm_destroy_vcpu(CPUState *cpu)
 {
-    return -ENOSYS;
 }
 
 int kvm_init_vcpu(CPUState *cpu)
diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index a92f2c454b..ddc57acae2 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
+obj-$(CONFIG_SOFTMMU) += tcg-cpus-interface.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
 obj-y += translator.o
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index e27385d051..9e332585d3 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -24,19 +24,17 @@
  */
 
 #include "qemu/osdep.h"
-#include "sysemu/accel.h"
+#include "qemu-common.h"
 #include "sysemu/tcg.h"
-#include "qom/object.h"
-#include "cpu.h"
-#include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
-#include "qemu/main-loop.h"
 #include "tcg/tcg.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "hw/boards.h"
 #include "qapi/qapi-builtin-visit.h"
 
+#include "tcg-cpus-interface.h"
+
 typedef struct TCGState {
     AccelState parent_obj;
 
@@ -123,6 +121,8 @@ static void tcg_accel_instance_init(Object *obj)
     s->mttcg_enabled = default_mttcg_enabled();
 }
 
+bool mttcg_enabled;
+
 static int tcg_init(MachineState *ms)
 {
     TCGState *s = TCG_STATE(current_accel());
@@ -130,6 +130,8 @@ static int tcg_init(MachineState *ms)
     tcg_exec_init(s->tb_size * 1024 * 1024);
     cpu_interrupt_handler = tcg_handle_interrupt;
     mttcg_enabled = s->mttcg_enabled;
+    cpus_register_accel_interface(&tcg_cpus_interface);
+
     return 0;
 }
 
diff --git a/accel/tcg/tcg-cpus-interface.c b/accel/tcg/tcg-cpus-interface.c
new file mode 100644
index 0000000000..28a88beb84
--- /dev/null
+++ b/accel/tcg/tcg-cpus-interface.c
@@ -0,0 +1,523 @@
+/*
+ * QEMU System Emulator, accelerator interfaces
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/tcg.h"
+#include "sysemu/replay.h"
+#include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
+#include "exec/exec-all.h"
+
+#include "tcg-cpus-interface.h"
+
+/* Kick all RR vCPUs */
+static void qemu_cpu_kick_rr_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_exit(cpu);
+    }
+}
+
+static void tcg_kick_vcpu_thread(CPUState *cpu)
+{
+    if (qemu_tcg_mttcg_enabled()) {
+        cpu_exit(cpu);
+    } else {
+        qemu_cpu_kick_rr_cpus();
+    }
+}
+
+/*
+ * TCG vCPU kick timer
+ *
+ * The kick timer is responsible for moving single-threaded vCPU
+ * emulation on to the next vCPU. If more than one vCPU is running, a
+ * timer event will force a cpu_exit() so the next vCPU can get
+ * scheduled.
+ *
+ * The timer is removed while all vCPUs are idle and restarted once
+ * any vCPU leaves the idle state.
+ */
+
+static QEMUTimer *tcg_kick_vcpu_timer;
+static CPUState *tcg_current_rr_cpu;
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+/* Kick the currently round-robin scheduled vCPU to next */
+static void qemu_cpu_kick_rr_next_cpu(void)
+{
+    CPUState *cpu;
+    do {
+        cpu = atomic_mb_read(&tcg_current_rr_cpu);
+        if (cpu) {
+            cpu_exit(cpu);
+        }
+    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+    timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    qemu_cpu_kick_rr_next_cpu();
+}
+
+static void start_tcg_kick_timer(void)
+{
+    assert(!mttcg_enabled);
+    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+        tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                           kick_tcg_thread, NULL);
+    }
+    if (tcg_kick_vcpu_timer && !timer_pending(tcg_kick_vcpu_timer)) {
+        timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    }
+}
+
+static void stop_tcg_kick_timer(void)
+{
+    assert(!mttcg_enabled);
+    if (tcg_kick_vcpu_timer && timer_pending(tcg_kick_vcpu_timer)) {
+        timer_del(tcg_kick_vcpu_timer);
+    }
+}
+
+static void qemu_tcg_destroy_vcpu(CPUState *cpu)
+{
+}
+
+static void qemu_tcg_rr_wait_io_event(void)
+{
+    CPUState *cpu;
+
+    while (all_cpu_threads_idle()) {
+        stop_tcg_kick_timer();
+        qemu_cond_wait_iothread(first_cpu->halt_cond);
+    }
+
+    start_tcg_kick_timer();
+
+    CPU_FOREACH(cpu) {
+        qemu_wait_io_event_common(cpu);
+    }
+}
+
+static int64_t tcg_get_icount_limit(void)
+{
+    int64_t deadline;
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        /*
+         * Include all the timers, because they may need an attention.
+         * Too long CPU execution may create unnecessary delay in UI.
+         */
+        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                              QEMU_TIMER_ATTR_ALL);
+        /* Check realtime timers, because they help with input processing */
+        deadline = qemu_soonest_timeout(deadline,
+                qemu_clock_deadline_ns_all(QEMU_CLOCK_REALTIME,
+                                           QEMU_TIMER_ATTR_ALL));
+
+        /*
+         * Maintain prior (possibly buggy) behaviour where if no deadline
+         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+         * nanoseconds.
+         */
+        if ((deadline < 0) || (deadline > INT32_MAX)) {
+            deadline = INT32_MAX;
+        }
+
+        return icount_round(deadline);
+    } else {
+        return replay_get_instructions();
+    }
+}
+
+static void handle_icount_deadline(void)
+{
+    assert(qemu_in_vcpu_thread());
+    if (icount_enabled()) {
+        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                                      QEMU_TIMER_ATTR_ALL);
+
+        if (deadline == 0) {
+            /* Wake up other AioContexts.  */
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+            qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+        }
+    }
+}
+
+static void prepare_icount_for_run(CPUState *cpu)
+{
+    if (icount_enabled()) {
+        int insns_left;
+
+        /*
+         * These should always be cleared by process_icount_data after
+         * each vCPU execution. However u16.high can be raised
+         * asynchronously by cpu_exit/cpu_interrupt/tcg_handle_interrupt
+         */
+        g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
+        g_assert(cpu->icount_extra == 0);
+
+        cpu->icount_budget = tcg_get_icount_limit();
+        insns_left = MIN(0xffff, cpu->icount_budget);
+        cpu_neg(cpu)->icount_decr.u16.low = insns_left;
+        cpu->icount_extra = cpu->icount_budget - insns_left;
+
+        replay_mutex_lock();
+    }
+}
+
+static void process_icount_data(CPUState *cpu)
+{
+    if (icount_enabled()) {
+        /* Account for executed instructions */
+        icount_update(cpu);
+
+        /* Reset the counters */
+        cpu_neg(cpu)->icount_decr.u16.low = 0;
+        cpu->icount_extra = 0;
+        cpu->icount_budget = 0;
+
+        replay_account_executed_instructions();
+
+        replay_mutex_unlock();
+    }
+}
+
+static int tcg_cpu_exec(CPUState *cpu)
+{
+    int ret;
+#ifdef CONFIG_PROFILER
+    int64_t ti;
+#endif
+
+    assert(tcg_enabled());
+#ifdef CONFIG_PROFILER
+    ti = profile_getclock();
+#endif
+    cpu_exec_start(cpu);
+    ret = cpu_exec(cpu);
+    cpu_exec_end(cpu);
+#ifdef CONFIG_PROFILER
+    atomic_set(&tcg_ctx->prof.cpu_exec_time,
+               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
+#endif
+    return ret;
+}
+
+/*
+ * Destroy any remaining vCPUs which have been unplugged and have
+ * finished running
+ */
+static void deal_with_unplugged_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        if (cpu->unplug && !cpu_can_run(cpu)) {
+            qemu_tcg_destroy_vcpu(cpu);
+            cpu_thread_signal_destroyed(cpu);
+            break;
+        }
+    }
+}
+
+/*
+ * Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+
+static void *tcg_rr_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    assert(tcg_enabled());
+    rcu_register_thread();
+    tcg_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    /* wait for initial kick-off after machine start */
+    while (first_cpu->stopped) {
+        qemu_cond_wait_iothread(first_cpu->halt_cond);
+
+        /* process any pending work */
+        CPU_FOREACH(cpu) {
+            current_cpu = cpu;
+            qemu_wait_io_event_common(cpu);
+        }
+    }
+
+    start_tcg_kick_timer();
+
+    cpu = first_cpu;
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        qemu_mutex_unlock_iothread();
+        replay_mutex_lock();
+        qemu_mutex_lock_iothread();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        icount_account_warp_timer();
+
+        /*
+         * Run the timers here.  This is much more efficient than
+         * waking up the I/O thread and waiting for completion.
+         */
+        handle_icount_deadline();
+
+        replay_mutex_unlock();
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
+
+            atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r;
+
+                qemu_mutex_unlock_iothread();
+                prepare_icount_for_run(cpu);
+
+                r = tcg_cpu_exec(cpu);
+
+                process_icount_data(cpu);
+                qemu_mutex_lock_iothread();
+
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                } else if (r == EXCP_ATOMIC) {
+                    qemu_mutex_unlock_iothread();
+                    cpu_exec_step_atomic(cpu);
+                    qemu_mutex_lock_iothread();
+                    break;
+                }
+            } else if (cpu->stop) {
+                if (cpu->unplug) {
+                    cpu = CPU_NEXT(cpu);
+                }
+                break;
+            }
+
+            cpu = CPU_NEXT(cpu);
+        } /* while (cpu && !cpu->exit_request).. */
+
+        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+        atomic_set(&tcg_current_rr_cpu, NULL);
+
+        if (cpu && cpu->exit_request) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
+
+        if (icount_enabled() && all_cpu_threads_idle()) {
+            /*
+             * When all vCPUs sleep (e.g. in WFI), wake up the main loop so
+             * that it starts the warp timer; otherwise it could deadlock.
+             */
+            qemu_notify_event();
+        }
+
+        qemu_tcg_rr_wait_io_event();
+        deal_with_unplugged_cpus();
+    }
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+/*
+ * Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    assert(tcg_enabled());
+    g_assert(!icount_enabled());
+
+    rcu_register_thread();
+    tcg_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    do {
+        if (cpu_can_run(cpu)) {
+            int r;
+            qemu_mutex_unlock_iothread();
+            r = tcg_cpu_exec(cpu);
+            qemu_mutex_lock_iothread();
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /*
+                 * during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            case EXCP_ATOMIC:
+                qemu_mutex_unlock_iothread();
+                cpu_exec_step_atomic(cpu);
+                qemu_mutex_lock_iothread();
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    qemu_tcg_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void tcg_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
+    static int tcg_region_inited;
+
+    assert(tcg_enabled());
+    /*
+     * Initialize TCG regions--once. Now is a good time, because:
+     * (1) TCG's init context, prologue and target globals have been set up.
+     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
+     *     -accel flag is processed, so the check doesn't work then).
+     */
+    if (!tcg_region_inited) {
+        tcg_region_inited = 1;
+        tcg_region_init();
+    }
+
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
+        cpu->thread = g_malloc0(sizeof(QemuThread));
+        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+        qemu_cond_init(cpu->halt_cond);
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            parallel_cpus = true;
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+                 cpu->cpu_index);
+
+            qemu_thread_create(cpu->thread, thread_name, tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
+#ifdef _WIN32
+        cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+    } else {
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
+        cpu->thread_id = first_cpu->thread_id;
+        cpu->can_do_io = 1;
+        cpu->created = true;
+    }
+}
+
+static void tcg_cpu_synchronize_noop(CPUState *cpu)
+{
+}
+
+CpusAccelInterface tcg_cpus_interface = {
+    .create_vcpu_thread = tcg_start_vcpu_thread,
+    .kick_vcpu_thread = tcg_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = tcg_cpu_synchronize_noop,
+    .cpu_synchronize_post_init = tcg_cpu_synchronize_noop,
+    .cpu_synchronize_state = tcg_cpu_synchronize_noop,
+    .cpu_synchronize_pre_loadvm = tcg_cpu_synchronize_noop,
+};
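The branch structure of tcg_start_vcpu_thread() above can be modelled without real threads. The sketch below is a hypothetical, simplified model. SketchThread, SketchCPU and SketchTcgState are made-up names, and the function-local statics of the real code are moved into an explicit state struct so that both modes can be exercised. In MTTCG mode every vCPU gets its own thread. In round-robin mode only the first vCPU creates a thread, and later vCPUs reuse it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QemuThread and CPUState */
typedef struct SketchThread {
    int id;
} SketchThread;

typedef struct SketchCPU {
    int cpu_index;
    SketchThread *thread;
} SketchCPU;

/* Replaces the function-local statics used by the real code */
typedef struct SketchTcgState {
    SketchThread *single_thread; /* shared round-robin thread, if any */
    int threads_created;
} SketchTcgState;

/* Same branch structure as tcg_start_vcpu_thread() */
static void sketch_start_vcpu_thread(SketchTcgState *s, SketchCPU *cpu,
                                     bool mttcg)
{
    if (mttcg || !s->single_thread) {
        /* MTTCG, or the first vCPU in round-robin mode: new thread */
        cpu->thread = calloc(1, sizeof(*cpu->thread));
        if (cpu->thread == NULL) {
            abort();
        }
        cpu->thread->id = s->threads_created++;
        if (!mttcg) {
            s->single_thread = cpu->thread;
        }
    } else {
        /* Later vCPUs in round-robin mode reuse the shared thread */
        cpu->thread = s->single_thread;
    }
}
```

With three vCPUs in round-robin mode only one thread is created and all three point at it; in MTTCG mode each call creates a distinct thread.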
diff --git a/accel/tcg/tcg-cpus-interface.h b/accel/tcg/tcg-cpus-interface.h
new file mode 100644
index 0000000000..c6e96b2af4
--- /dev/null
+++ b/accel/tcg/tcg-cpus-interface.h
@@ -0,0 +1,8 @@
+#ifndef TCG_CPUS_INTERFACE_H
+#define TCG_CPUS_INTERFACE_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccelInterface tcg_cpus_interface;
+
+#endif /* TCG_CPUS_INTERFACE_H */
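The core of this series is a table of function pointers that each accelerator fills in and registers, so that cpus.c dispatches through the table instead of testing kvm_enabled(), hax_enabled() and so on. A minimal sketch of that pattern follows, assuming a cut-down three-member table; SketchAccelInterface, sketch_register_accel_interface() and the demo_* callbacks are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchCPU {
    int kicks;
} SketchCPU;

/* Cut-down, hypothetical version of CpusAccelInterface */
typedef struct SketchAccelInterface {
    void (*create_vcpu_thread)(SketchCPU *cpu);
    void (*kick_vcpu_thread)(SketchCPU *cpu);
    void (*cpu_synchronize_state)(SketchCPU *cpu);
} SketchAccelInterface;

static SketchAccelInterface accel_int; /* copy owned by the core */

static void sketch_register_accel_interface(const SketchAccelInterface *i)
{
    /* Reject incomplete tables at registration time */
    assert(i != NULL);
    assert(i->create_vcpu_thread && i->kick_vcpu_thread);
    assert(i->cpu_synchronize_state);
    accel_int = *i;
}

/* Core entry points: no per-accelerator if/else chain */
static void sketch_cpu_kick(SketchCPU *cpu)
{
    accel_int.kick_vcpu_thread(cpu);
}

static void sketch_cpu_synchronize_state(SketchCPU *cpu)
{
    accel_int.cpu_synchronize_state(cpu);
}

/* One hypothetical accelerator's callbacks */
static void demo_create(SketchCPU *cpu) { (void)cpu; }
static void demo_kick(SketchCPU *cpu) { cpu->kicks++; }
static void demo_sync_noop(SketchCPU *cpu) { (void)cpu; }

static const SketchAccelInterface demo_interface = {
    .create_vcpu_thread = demo_create,
    .kick_vcpu_thread = demo_kick,
    .cpu_synchronize_state = demo_sync_noop,
};
```

Copying the table into a core-owned struct (as cpus_register_accel_interface() does with `accel_int = *i`) means the accelerator's own definition need not stay reachable after registration. Taking a const pointer here is a small deviation from the patch, which takes a non-const one.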
diff --git a/cpus.c b/cpus.c
index 7e9f545be8..3f5d34981e 100644
--- a/cpus.c
+++ b/cpus.c
@@ -24,27 +24,19 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "qemu/config-file.h"
-#include "qemu/cutils.h"
-#include "migration/vmstate.h"
 #include "monitor/monitor.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/qmp/qerror.h"
-#include "qemu/error-report.h"
-#include "qemu/qemu-print.h"
 #include "sysemu/tcg.h"
-#include "sysemu/block-backend.h"
 #include "exec/gdbstub.h"
-#include "sysemu/dma.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
 #include "exec/exec-all.h"
-
 #include "qemu/thread.h"
 #include "qemu/plugin.h"
 #include "sysemu/cpus.h"
@@ -87,7 +79,7 @@ bool cpu_is_stopped(CPUState *cpu)
     return cpu->stopped || !runstate_is_running();
 }
 
-static bool cpu_thread_is_idle(CPUState *cpu)
+bool cpu_thread_is_idle(CPUState *cpu)
 {
     if (cpu->stop || cpu->queued_work_first) {
         return false;
@@ -114,78 +106,6 @@ bool all_cpu_threads_idle(void)
     return true;
 }
 
-bool mttcg_enabled;
-
-/***********************************************************/
-/* TCG vCPU kick timer
- *
- * The kick timer is responsible for moving single threaded vCPU
- * emulation on to the next vCPU. If more than one vCPU is running a
- * timer event with force a cpu->exit so the next vCPU can get
- * scheduled.
- *
- * The timer is removed if all vCPUs are idle and restarted again once
- * idleness is complete.
- */
-
-static QEMUTimer *tcg_kick_vcpu_timer;
-static CPUState *tcg_current_rr_cpu;
-
-#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
-
-static inline int64_t qemu_tcg_next_kick(void)
-{
-    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
-}
-
-/* Kick the currently round-robin scheduled vCPU to next */
-static void qemu_cpu_kick_rr_next_cpu(void)
-{
-    CPUState *cpu;
-    do {
-        cpu = atomic_mb_read(&tcg_current_rr_cpu);
-        if (cpu) {
-            cpu_exit(cpu);
-        }
-    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
-}
-
-/* Kick all RR vCPUs */
-static void qemu_cpu_kick_rr_cpus(void)
-{
-    CPUState *cpu;
-
-    CPU_FOREACH(cpu) {
-        cpu_exit(cpu);
-    };
-}
-
-static void kick_tcg_thread(void *opaque)
-{
-    timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    qemu_cpu_kick_rr_next_cpu();
-}
-
-static void start_tcg_kick_timer(void)
-{
-    assert(!mttcg_enabled);
-    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
-        tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                           kick_tcg_thread, NULL);
-    }
-    if (tcg_kick_vcpu_timer && !timer_pending(tcg_kick_vcpu_timer)) {
-        timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    }
-}
-
-static void stop_tcg_kick_timer(void)
-{
-    assert(!mttcg_enabled);
-    if (tcg_kick_vcpu_timer && timer_pending(tcg_kick_vcpu_timer)) {
-        timer_del(tcg_kick_vcpu_timer);
-    }
-}
-
 /***********************************************************/
 void hw_error(const char *fmt, ...)
 {
@@ -204,16 +124,21 @@ void hw_error(const char *fmt, ...)
     abort();
 }
 
+/*
+ * Every accelerator is expected to register this interface.
+ * This cannot yet be done cleanly as a machine state or accel class
+ * method, since TCG is not a normal accelerator: USER mode is
+ * special-cased, among other complications.
+ */
+static CpusAccelInterface accel_int;
+
 void cpu_synchronize_all_states(void)
 {
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_state(cpu);
-        /* TODO: move to cpu_synchronize_state() */
-        if (hvf_enabled()) {
-            hvf_cpu_synchronize_state(cpu);
-        }
+        assert(accel_int.cpu_synchronize_state != NULL);
+        accel_int.cpu_synchronize_state(cpu);
     }
 }
 
@@ -222,11 +147,8 @@ void cpu_synchronize_all_post_reset(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_post_reset(cpu);
-        /* TODO: move to cpu_synchronize_post_reset() */
-        if (hvf_enabled()) {
-            hvf_cpu_synchronize_post_reset(cpu);
-        }
+        assert(accel_int.cpu_synchronize_post_reset != NULL);
+        accel_int.cpu_synchronize_post_reset(cpu);
     }
 }
 
@@ -235,11 +157,8 @@ void cpu_synchronize_all_post_init(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_post_init(cpu);
-        /* TODO: move to cpu_synchronize_post_init() */
-        if (hvf_enabled()) {
-            hvf_cpu_synchronize_post_init(cpu);
-        }
+        assert(accel_int.cpu_synchronize_post_init != NULL);
+        accel_int.cpu_synchronize_post_init(cpu);
     }
 }
 
@@ -248,7 +167,8 @@ void cpu_synchronize_all_pre_loadvm(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_pre_loadvm(cpu);
+        assert(accel_int.cpu_synchronize_pre_loadvm != NULL);
+        accel_int.cpu_synchronize_pre_loadvm(cpu);
     }
 }
 
@@ -280,7 +200,7 @@ int vm_shutdown(void)
     return do_vm_stop(RUN_STATE_SHUTDOWN, false);
 }
 
-static bool cpu_can_run(CPUState *cpu)
+bool cpu_can_run(CPUState *cpu)
 {
     if (cpu->stop) {
         return false;
@@ -291,7 +211,7 @@ static bool cpu_can_run(CPUState *cpu)
     return true;
 }
 
-static void cpu_handle_guest_debug(CPUState *cpu)
+void cpu_handle_guest_debug(CPUState *cpu)
 {
     gdb_set_stop_cpu(cpu);
     qemu_system_debug_request();
@@ -374,18 +294,6 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
     do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
 }
 
-static void qemu_kvm_destroy_vcpu(CPUState *cpu)
-{
-    if (kvm_destroy_vcpu(cpu) < 0) {
-        error_report("kvm_destroy_vcpu failed");
-        exit(EXIT_FAILURE);
-    }
-}
-
-static void qemu_tcg_destroy_vcpu(CPUState *cpu)
-{
-}
-
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
 {
     g_assert(qemu_cpu_is_self(cpu));
@@ -397,7 +305,7 @@ static void qemu_cpu_stop(CPUState *cpu, bool exit)
     qemu_cond_broadcast(&qemu_pause_cond);
 }
 
-static void qemu_wait_io_event_common(CPUState *cpu)
+void qemu_wait_io_event_common(CPUState *cpu)
 {
     atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
@@ -406,23 +314,7 @@ static void qemu_wait_io_event_common(CPUState *cpu)
     process_queued_cpu_work(cpu);
 }
 
-static void qemu_tcg_rr_wait_io_event(void)
-{
-    CPUState *cpu;
-
-    while (all_cpu_threads_idle()) {
-        stop_tcg_kick_timer();
-        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
-    }
-
-    start_tcg_kick_timer();
-
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
-}
-
-static void qemu_wait_io_event(CPUState *cpu)
+void qemu_wait_io_event(CPUState *cpu)
 {
     bool slept = false;
 
@@ -438,7 +330,8 @@ static void qemu_wait_io_event(CPUState *cpu)
     }
 
 #ifdef _WIN32
-    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
+    /* Eat dummy APC queued by hax_kick_vcpu_thread */
+    /* NB: should this not be if (hax_enabled())? Is this wrong for WHPX? */
     if (!tcg_enabled()) {
         SleepEx(0, TRUE);
     }
@@ -446,540 +339,7 @@ static void qemu_wait_io_event(CPUState *cpu)
     qemu_wait_io_event_common(cpu);
 }
 
-static void *qemu_kvm_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    r = kvm_init_vcpu(cpu);
-    if (r < 0) {
-        error_report("kvm_init_vcpu failed: %s", strerror(-r));
-        exit(1);
-    }
-
-    kvm_init_cpu_signals(cpu);
-
-    /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = kvm_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    qemu_kvm_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
-static void *qemu_dummy_cpu_thread_fn(void *arg)
-{
-#ifdef _WIN32
-    error_report("qtest is not supported under Windows");
-    exit(1);
-#else
-    CPUState *cpu = arg;
-    sigset_t waitset;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    sigemptyset(&waitset);
-    sigaddset(&waitset, SIG_IPI);
-
-    /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        qemu_mutex_unlock_iothread();
-        do {
-            int sig;
-            r = sigwait(&waitset, &sig);
-        } while (r == -1 && (errno == EAGAIN || errno == EINTR));
-        if (r == -1) {
-            perror("sigwait");
-            exit(1);
-        }
-        qemu_mutex_lock_iothread();
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug);
-
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-#endif
-}
-
-static int64_t tcg_get_icount_limit(void)
-{
-    int64_t deadline;
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        /*
-         * Include all the timers, because they may need an attention.
-         * Too long CPU execution may create unnecessary delay in UI.
-         */
-        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                              QEMU_TIMER_ATTR_ALL);
-        /* Check realtime timers, because they help with input processing */
-        deadline = qemu_soonest_timeout(deadline,
-                qemu_clock_deadline_ns_all(QEMU_CLOCK_REALTIME,
-                                           QEMU_TIMER_ATTR_ALL));
-
-        /* Maintain prior (possibly buggy) behaviour where if no deadline
-         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
-         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
-         * nanoseconds.
-         */
-        if ((deadline < 0) || (deadline > INT32_MAX)) {
-            deadline = INT32_MAX;
-        }
-
-        return icount_round(deadline);
-    } else {
-        return replay_get_instructions();
-    }
-}
-
-static void handle_icount_deadline(void)
-{
-    assert(qemu_in_vcpu_thread());
-    if (icount_enabled()) {
-        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                                      QEMU_TIMER_ATTR_ALL);
-
-        if (deadline == 0) {
-            /* Wake up other AioContexts.  */
-            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-            qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
-        }
-    }
-}
-
-static void prepare_icount_for_run(CPUState *cpu)
-{
-    if (icount_enabled()) {
-        int insns_left;
-
-        /* These should always be cleared by process_icount_data after
-         * each vCPU execution. However u16.high can be raised
-         * asynchronously by cpu_exit/cpu_interrupt/tcg_handle_interrupt
-         */
-        g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
-        g_assert(cpu->icount_extra == 0);
-
-        cpu->icount_budget = tcg_get_icount_limit();
-        insns_left = MIN(0xffff, cpu->icount_budget);
-        cpu_neg(cpu)->icount_decr.u16.low = insns_left;
-        cpu->icount_extra = cpu->icount_budget - insns_left;
-
-        replay_mutex_lock();
-    }
-}
-
-static void process_icount_data(CPUState *cpu)
-{
-    if (icount_enabled()) {
-        /* Account for executed instructions */
-        icount_update(cpu);
-
-        /* Reset the counters */
-        cpu_neg(cpu)->icount_decr.u16.low = 0;
-        cpu->icount_extra = 0;
-        cpu->icount_budget = 0;
-
-        replay_account_executed_instructions();
-
-        replay_mutex_unlock();
-    }
-}
-
-
-static int tcg_cpu_exec(CPUState *cpu)
-{
-    int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
-
-    assert(tcg_enabled());
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
-    cpu_exec_start(cpu);
-    ret = cpu_exec(cpu);
-    cpu_exec_end(cpu);
-#ifdef CONFIG_PROFILER
-    atomic_set(&tcg_ctx->prof.cpu_exec_time,
-               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
-#endif
-    return ret;
-}
-
-/* Destroy any remaining vCPUs which have been unplugged and have
- * finished running
- */
-static void deal_with_unplugged_cpus(void)
-{
-    CPUState *cpu;
-
-    CPU_FOREACH(cpu) {
-        if (cpu->unplug && !cpu_can_run(cpu)) {
-            qemu_tcg_destroy_vcpu(cpu);
-            cpu->created = false;
-            qemu_cond_signal(&qemu_cpu_cond);
-            break;
-        }
-    }
-}
-
-/* Single-threaded TCG
- *
- * In the single-threaded case each vCPU is simulated in turn. If
- * there is more than a single vCPU we create a simple timer to kick
- * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
- * This is done explicitly rather than relying on side-effects
- * elsewhere.
- */
-
-static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    assert(tcg_enabled());
-    rcu_register_thread();
-    tcg_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
-    cpu->can_do_io = 1;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    /* wait for initial kick-off after machine start */
-    while (first_cpu->stopped) {
-        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
-
-        /* process any pending work */
-        CPU_FOREACH(cpu) {
-            current_cpu = cpu;
-            qemu_wait_io_event_common(cpu);
-        }
-    }
-
-    start_tcg_kick_timer();
-
-    cpu = first_cpu;
-
-    /* process any pending work */
-    cpu->exit_request = 1;
-
-    while (1) {
-        qemu_mutex_unlock_iothread();
-        replay_mutex_lock();
-        qemu_mutex_lock_iothread();
-        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-        icount_account_warp_timer();
-
-        /* Run the timers here.  This is much more efficient than
-         * waking up the I/O thread and waiting for completion.
-         */
-        handle_icount_deadline();
-
-        replay_mutex_unlock();
-
-        if (!cpu) {
-            cpu = first_cpu;
-        }
-
-        while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
-
-            atomic_mb_set(&tcg_current_rr_cpu, cpu);
-            current_cpu = cpu;
-
-            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-            if (cpu_can_run(cpu)) {
-                int r;
-
-                qemu_mutex_unlock_iothread();
-                prepare_icount_for_run(cpu);
-
-                r = tcg_cpu_exec(cpu);
-
-                process_icount_data(cpu);
-                qemu_mutex_lock_iothread();
-
-                if (r == EXCP_DEBUG) {
-                    cpu_handle_guest_debug(cpu);
-                    break;
-                } else if (r == EXCP_ATOMIC) {
-                    qemu_mutex_unlock_iothread();
-                    cpu_exec_step_atomic(cpu);
-                    qemu_mutex_lock_iothread();
-                    break;
-                }
-            } else if (cpu->stop) {
-                if (cpu->unplug) {
-                    cpu = CPU_NEXT(cpu);
-                }
-                break;
-            }
-
-            cpu = CPU_NEXT(cpu);
-        } /* while (cpu && !cpu->exit_request).. */
-
-        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-        atomic_set(&tcg_current_rr_cpu, NULL);
-
-        if (cpu && cpu->exit_request) {
-            atomic_mb_set(&cpu->exit_request, 0);
-        }
-
-        if (icount_enabled() && all_cpu_threads_idle()) {
-            /*
-             * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
-             * in the main_loop, wake it up in order to start the warp timer.
-             */
-            qemu_notify_event();
-        }
-
-        qemu_tcg_rr_wait_io_event();
-        deal_with_unplugged_cpus();
-    }
-
-    rcu_unregister_thread();
-    return NULL;
-}
-
-static void *qemu_hax_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
-    current_cpu = cpu;
-
-    hax_init_vcpu(cpu);
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = hax_smp_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-    rcu_unregister_thread();
-    return NULL;
-}
-
-/* The HVF-specific vCPU thread function. This one should only run when the host
- * CPU supports the VMX "unrestricted guest" feature. */
-static void *qemu_hvf_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    int r;
-
-    assert(hvf_enabled());
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    hvf_init_vcpu(cpu);
-
-    /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = hvf_vcpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    hvf_vcpu_destroy(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
-static void *qemu_whpx_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    current_cpu = cpu;
-
-    r = whpx_init_vcpu(cpu);
-    if (r < 0) {
-        fprintf(stderr, "whpx_init_vcpu failed: %s\n", strerror(-r));
-        exit(1);
-    }
-
-    /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = whpx_vcpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        while (cpu_thread_is_idle(cpu)) {
-            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
-        }
-        qemu_wait_io_event_common(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    whpx_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
-#ifdef _WIN32
-static void CALLBACK dummy_apc_func(ULONG_PTR unused)
-{
-}
-#endif
-
-/* Multi-threaded TCG
- *
- * In the multi-threaded case each vCPU has its own thread. The TLS
- * variable current_cpu can be used deep in the code to find the
- * current CPUState for a given thread.
- */
-
-static void *qemu_tcg_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    assert(tcg_enabled());
-    g_assert(!icount_enabled());
-
-    rcu_register_thread();
-    tcg_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    /* process any pending work */
-    cpu->exit_request = 1;
-
-    do {
-        if (cpu_can_run(cpu)) {
-            int r;
-            qemu_mutex_unlock_iothread();
-            r = tcg_cpu_exec(cpu);
-            qemu_mutex_lock_iothread();
-            switch (r) {
-            case EXCP_DEBUG:
-                cpu_handle_guest_debug(cpu);
-                break;
-            case EXCP_HALTED:
-                /* during start-up the vCPU is reset and the thread is
-                 * kicked several times. If we don't ensure we go back
-                 * to sleep in the halted state we won't cleanly
-                 * start-up when the vCPU is enabled.
-                 *
-                 * cpu->halted should ensure we sleep in wait_io_event
-                 */
-                g_assert(cpu->halted);
-                break;
-            case EXCP_ATOMIC:
-                qemu_mutex_unlock_iothread();
-                cpu_exec_step_atomic(cpu);
-                qemu_mutex_lock_iothread();
-            default:
-                /* Ignore everything else? */
-                break;
-            }
-        }
-
-        atomic_mb_set(&cpu->exit_request, 0);
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    qemu_tcg_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
-static void qemu_cpu_kick_thread(CPUState *cpu)
+void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
     int err;
@@ -993,44 +353,20 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
         fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
         exit(1);
     }
-#else /* _WIN32 */
-    if (!qemu_cpu_is_self(cpu)) {
-        if (whpx_enabled()) {
-            whpx_vcpu_kick(cpu);
-        } else if (!QueueUserAPC(dummy_apc_func, cpu->hThread, 0)) {
-            fprintf(stderr, "%s: QueueUserAPC failed with error %lu\n",
-                    __func__, GetLastError());
-            exit(1);
-        }
-    }
 #endif
 }
 
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
-    if (tcg_enabled()) {
-        if (qemu_tcg_mttcg_enabled()) {
-            cpu_exit(cpu);
-        } else {
-            qemu_cpu_kick_rr_cpus();
-        }
-    } else {
-        if (hax_enabled()) {
-            /*
-             * FIXME: race condition with the exit_request check in
-             * hax_vcpu_hax_exec
-             */
-            cpu->exit_request = 1;
-        }
-        qemu_cpu_kick_thread(cpu);
-    }
+    assert(accel_int.kick_vcpu_thread != NULL);
+    accel_int.kick_vcpu_thread(cpu);
 }
 
 void qemu_cpu_kick_self(void)
 {
     assert(current_cpu);
-    qemu_cpu_kick_thread(current_cpu);
+    cpus_kick_thread(current_cpu);
 }
 
 bool qemu_cpu_is_self(CPUState *cpu)
@@ -1080,6 +416,21 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
     qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
 }
 
+/* signal CPU creation */
+void cpu_thread_signal_created(CPUState *cpu)
+{
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+}
+
+/* signal CPU destruction */
+void cpu_thread_signal_destroyed(CPUState *cpu)
+{
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+}
+
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
@@ -1155,140 +506,18 @@ void cpu_remove_sync(CPUState *cpu)
     qemu_mutex_lock_iothread();
 }
 
-/* For temporary buffers for forming a name */
-#define VCPU_THREAD_NAME_SIZE 16
-
-static void qemu_tcg_init_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *single_tcg_halt_cond;
-    static QemuThread *single_tcg_cpu_thread;
-    static int tcg_region_inited;
-
-    assert(tcg_enabled());
-    /*
-     * Initialize TCG regions--once. Now is a good time, because:
-     * (1) TCG's init context, prologue and target globals have been set up.
-     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
-     *     -accel flag is processed, so the check doesn't work then).
-     */
-    if (!tcg_region_inited) {
-        tcg_region_inited = 1;
-        tcg_region_init();
-    }
-
-    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
-        cpu->thread = g_malloc0(sizeof(QemuThread));
-        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-        qemu_cond_init(cpu->halt_cond);
-
-        if (qemu_tcg_mttcg_enabled()) {
-            /* create a thread per vCPU with TCG (MTTCG) */
-            parallel_cpus = true;
-            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
-                 cpu->cpu_index);
-
-            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
-
-        } else {
-            /* share a single thread for all cpus with TCG */
-            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
-            qemu_thread_create(cpu->thread, thread_name,
-                               qemu_tcg_rr_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
-
-            single_tcg_halt_cond = cpu->halt_cond;
-            single_tcg_cpu_thread = cpu->thread;
-        }
-#ifdef _WIN32
-        cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-    } else {
-        /* For non-MTTCG cases we share the thread */
-        cpu->thread = single_tcg_cpu_thread;
-        cpu->halt_cond = single_tcg_halt_cond;
-        cpu->thread_id = first_cpu->thread_id;
-        cpu->can_do_io = 1;
-        cpu->created = true;
-    }
-}
-
-static void qemu_hax_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-#ifdef _WIN32
-    cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-}
-
-static void qemu_kvm_start_vcpu(CPUState *cpu)
+void cpus_register_accel_interface(CpusAccelInterface *i)
 {
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-}
-
-static void qemu_hvf_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    /* HVF currently does not support TCG, and only runs in
-     * unrestricted-guest mode. */
-    assert(hvf_enabled());
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
+    assert(i != NULL);
+    assert(i->create_vcpu_thread != NULL);
+    assert(i->kick_vcpu_thread != NULL);
 
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-}
-
-static void qemu_whpx_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-#ifdef _WIN32
-    cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-}
+    assert(i->cpu_synchronize_post_reset != NULL);
+    assert(i->cpu_synchronize_post_init != NULL);
+    assert(i->cpu_synchronize_state != NULL);
+    assert(i->cpu_synchronize_pre_loadvm != NULL);
 
-static void qemu_dummy_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
-                       QEMU_THREAD_JOINABLE);
+    accel_int = *i;
 }
 
 void qemu_init_vcpu(CPUState *cpu)
@@ -1308,19 +537,8 @@ void qemu_init_vcpu(CPUState *cpu)
         cpu_address_space_init(cpu, 0, "cpu-memory", cpu->memory);
     }
 
-    if (kvm_enabled()) {
-        qemu_kvm_start_vcpu(cpu);
-    } else if (hax_enabled()) {
-        qemu_hax_start_vcpu(cpu);
-    } else if (hvf_enabled()) {
-        qemu_hvf_start_vcpu(cpu);
-    } else if (tcg_enabled()) {
-        qemu_tcg_init_vcpu(cpu);
-    } else if (whpx_enabled()) {
-        qemu_whpx_start_vcpu(cpu);
-    } else {
-        qemu_dummy_start_vcpu(cpu);
-    }
+    assert(accel_int.create_vcpu_thread != NULL);
+    accel_int.create_vcpu_thread(cpu);
 
     while (!cpu->created) {
         qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
@@ -1498,3 +716,26 @@ void qmp_inject_nmi(Error **errp)
     nmi_monitor_handle(monitor_get_cpu_index(), errp);
 }
 
+void cpu_synchronize_state(CPUState *cpu)
+{
+    assert(accel_int.cpu_synchronize_state != NULL);
+    accel_int.cpu_synchronize_state(cpu);
+}
+
+void cpu_synchronize_post_reset(CPUState *cpu)
+{
+    assert(accel_int.cpu_synchronize_post_reset != NULL);
+    accel_int.cpu_synchronize_post_reset(cpu);
+}
+
+void cpu_synchronize_post_init(CPUState *cpu)
+{
+    assert(accel_int.cpu_synchronize_post_init != NULL);
+    accel_int.cpu_synchronize_post_init(cpu);
+}
+
+void cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    assert(accel_int.cpu_synchronize_pre_loadvm != NULL);
+    accel_int.cpu_synchronize_pre_loadvm(cpu);
+}
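qemu_init_vcpu() creates the vCPU thread through the registered create_vcpu_thread callback and then blocks until the new thread reports itself via cpu_thread_signal_created(). The handshake can be sketched with plain POSIX threads, assuming a pthread_mutex_t and a pthread_cond_t stand in for qemu_global_mutex and qemu_cpu_cond; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for qemu_global_mutex and qemu_cpu_cond */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cpu_cond = PTHREAD_COND_INITIALIZER;

typedef struct SketchCPU {
    pthread_t thread;
    bool created;
} SketchCPU;

/* Like cpu_thread_signal_created(): caller holds global_mutex */
static void sketch_signal_created(SketchCPU *cpu)
{
    cpu->created = true;
    pthread_cond_signal(&cpu_cond);
}

static void *sketch_vcpu_thread_fn(void *arg)
{
    SketchCPU *cpu = arg;

    pthread_mutex_lock(&global_mutex);
    /* ... per-accelerator vCPU initialization would run here ... */
    sketch_signal_created(cpu);
    pthread_mutex_unlock(&global_mutex);
    return NULL;
}

/* Like the tail of qemu_init_vcpu(): caller holds global_mutex */
static void sketch_init_vcpu(SketchCPU *cpu)
{
    if (pthread_create(&cpu->thread, NULL, sketch_vcpu_thread_fn, cpu) != 0) {
        abort();
    }
    /* pthread_cond_wait() releases global_mutex while sleeping */
    while (!cpu->created) {
        pthread_cond_wait(&cpu_cond, &global_mutex);
    }
}
```

The while loop around the wait matters: condition variables can wake spuriously, and the shared qemu_cpu_cond is also signalled on behalf of other vCPUs, so the waiter must re-check cpu->created.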
diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 5284d384fb..b601654a10 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -33,6 +33,7 @@
 #include "hw/qdev-properties.h"
 #include "trace-root.h"
 #include "qemu/plugin.h"
+#include "sysemu/hw_accel.h"
 
 CPUInterruptHandler cpu_interrupt_handler;
 
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 149de000a0..b9d76ebda8 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -4,7 +4,51 @@
 #include "qemu/timer.h"
 
 /* cpus.c */
+
+/* CPU execution threads */
+
+typedef struct CpusAccelInterface {
+    void (*create_vcpu_thread)(CPUState *cpu);
+    void (*kick_vcpu_thread)(CPUState *cpu);
+
+    void (*cpu_synchronize_post_reset)(CPUState *cpu);
+    void (*cpu_synchronize_post_init)(CPUState *cpu);
+    void (*cpu_synchronize_state)(CPUState *cpu);
+    void (*cpu_synchronize_pre_loadvm)(CPUState *cpu);
+} CpusAccelInterface;
+
+/* register accel-specific interface */
+void cpus_register_accel_interface(CpusAccelInterface *i);
+
+/*
+ * These are the registrable vcpu start functions for all accelerators,
+ * usable as the create_vcpu_thread hook of CpusAccelInterface.
+ */
+void qemu_dummy_start_vcpu(CPUState *cpu);
+void qemu_tcg_init_vcpu(CPUState *cpu);
+void qemu_kvm_start_vcpu(CPUState *cpu);
+void qemu_hax_start_vcpu(CPUState *cpu);
+void qemu_hvf_start_vcpu(CPUState *cpu);
+void qemu_whpx_start_vcpu(CPUState *cpu);
+/* end of vcpu start functions for accelerators */
+
+/* interface available for cpus accelerator threads */
+
+/* For temporary buffers for forming a name */
+#define VCPU_THREAD_NAME_SIZE 16
+
+void cpus_kick_thread(CPUState *cpu);
+bool cpu_thread_is_idle(CPUState *cpu);
 bool all_cpu_threads_idle(void);
+bool cpu_can_run(CPUState *cpu);
+void qemu_wait_io_event_common(CPUState *cpu);
+void qemu_wait_io_event(CPUState *cpu);
+void cpu_thread_signal_created(CPUState *cpu);
+void cpu_thread_signal_destroyed(CPUState *cpu);
+void cpu_handle_guest_debug(CPUState *cpu);
+
+/* end interface for cpus accelerator threads */
+
 bool qemu_in_vcpu_thread(void);
 void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
diff --git a/include/sysemu/hvf.h b/include/sysemu/hvf.h
index d211e808e9..cdd4172b24 100644
--- a/include/sysemu/hvf.h
+++ b/include/sysemu/hvf.h
@@ -86,7 +86,6 @@ int hvf_smp_cpu_exec(CPUState *);
 void hvf_cpu_synchronize_state(CPUState *);
 void hvf_cpu_synchronize_post_reset(CPUState *);
 void hvf_cpu_synchronize_post_init(CPUState *);
-void _hvf_cpu_synchronize_post_init(CPUState *, run_on_cpu_data);
 
 void hvf_vcpu_destroy(CPUState *);
 void hvf_raise_event(CPUState *);
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index 0ec2372477..336740e10a 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -1,5 +1,5 @@
 /*
- * QEMU Hardware accelertors support
+ * QEMU Hardware accelerators support
  *
  * Copyright 2016 Google, Inc.
  *
@@ -16,56 +16,9 @@
 #include "sysemu/kvm.h"
 #include "sysemu/whpx.h"
 
-static inline void cpu_synchronize_state(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_state(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_state(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_state(cpu);
-    }
-}
-
-static inline void cpu_synchronize_post_reset(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_reset(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_reset(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_reset(cpu);
-    }
-}
-
-static inline void cpu_synchronize_post_init(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_init(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_init(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_init(cpu);
-    }
-}
-
-static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_pre_loadvm(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_pre_loadvm(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_pre_loadvm(cpu);
-    }
-}
+void cpu_synchronize_state(CPUState *cpu);
+void cpu_synchronize_post_reset(CPUState *cpu);
+void cpu_synchronize_post_init(CPUState *cpu);
+void cpu_synchronize_pre_loadvm(CPUState *cpu);
 
 #endif /* QEMU_HW_ACCEL_H */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 3b2250471c..3d84c940db 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -218,7 +218,7 @@ int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
 int kvm_cpu_exec(CPUState *cpu);
-int kvm_destroy_vcpu(CPUState *cpu);
+void kvm_destroy_vcpu(CPUState *cpu);
 
 /**
  * kvm_arm_supports_user_irq
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 45be5dc0ed..3bbd72b7cd 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,4 +43,5 @@ stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
 stub-obj-y += ramfb.o
 stub-obj-y += fw_cfg.o
+stub-obj-y += cpu-synchronize-state.o
 stub-obj-$(CONFIG_SOFTMMU) += semihost.o
diff --git a/stubs/cpu-synchronize-state.c b/stubs/cpu-synchronize-state.c
new file mode 100644
index 0000000000..3112fe439d
--- /dev/null
+++ b/stubs/cpu-synchronize-state.c
@@ -0,0 +1,15 @@
+#include "qemu/osdep.h"
+#include "sysemu/hw_accel.h"
+
+void cpu_synchronize_state(CPUState *cpu)
+{
+}
+void cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+void cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+void cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 48e0c28434..a847064e68 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -9,14 +9,15 @@ obj-y += machine.o arch_memory_mapping.o arch_dump.o monitor.o
 obj-$(CONFIG_KVM) += kvm.o
 obj-$(CONFIG_HYPERV) += hyperv.o
 obj-$(call lnot,$(CONFIG_HYPERV)) += hyperv-stub.o
+obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-cpus-interface.o
 ifeq ($(CONFIG_WIN32),y)
-obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-windows.o
+obj-$(CONFIG_HAX) += hax-windows.o
 endif
 ifeq ($(CONFIG_POSIX),y)
-obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
+obj-$(CONFIG_HAX) += hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
-obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_WHPX) += whpx-all.o whpx-cpus-interface.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
index f9c83fff25..56c99bffd2 100644
--- a/target/i386/hax-all.c
+++ b/target/i386/hax-all.c
@@ -32,9 +32,10 @@
 #include "sysemu/accel.h"
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
-#include "qemu/main-loop.h"
 #include "hw/boards.h"
 
+#include "hax-cpus-interface.h"
+
 #define DEBUG_HAX 0
 
 #define DPRINTF(fmt, ...) \
@@ -361,6 +362,9 @@ static int hax_accel_init(MachineState *ms)
                 !ret ? "working" : "not working",
                 !ret ? "fast virt" : "emulation");
     }
+    if (ret == 0) {
+        cpus_register_accel_interface(&hax_cpus_interface);
+    }
     return ret;
 }
 
diff --git a/target/i386/hax-cpus-interface.c b/target/i386/hax-cpus-interface.c
new file mode 100644
index 0000000000..85cbfb4ae8
--- /dev/null
+++ b/target/i386/hax-cpus-interface.c
@@ -0,0 +1,85 @@
+/*
+ * QEMU HAX support
+ *
+ * Copyright IBM, Corp. 2008
+ *           Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Glauber Costa     <gcosta@redhat.com>
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong<yunhong.jiang@intel.com>
+ *  Xin Xiaohui<xiaohui.xin@intel.com>
+ *  Zhang Xiantao<xiantao.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "hax-i386.h"
+#include "sysemu/runstate.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "hax-cpus-interface.h"
+
+static void *hax_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    hax_init_vcpu(cpu);
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = hax_smp_cpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void hax_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, hax_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+#ifdef _WIN32
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+}
+
+CpusAccelInterface hax_cpus_interface = {
+    .create_vcpu_thread = hax_start_vcpu_thread,
+    .kick_vcpu_thread = hax_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = hax_cpu_synchronize_post_reset,
+    .cpu_synchronize_post_init = hax_cpu_synchronize_post_init,
+    .cpu_synchronize_state = hax_cpu_synchronize_state,
+    .cpu_synchronize_pre_loadvm = hax_cpu_synchronize_pre_loadvm,
+};
diff --git a/target/i386/hax-cpus-interface.h b/target/i386/hax-cpus-interface.h
new file mode 100644
index 0000000000..a06f3df752
--- /dev/null
+++ b/target/i386/hax-cpus-interface.h
@@ -0,0 +1,8 @@
+#ifndef HAX_CPUS_INTERFACE_H
+#define HAX_CPUS_INTERFACE_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccelInterface hax_cpus_interface;
+
+#endif /* HAX_CPUS_INTERFACE_H */
diff --git a/target/i386/hax-i386.h b/target/i386/hax-i386.h
index 54e9d8b057..667139c7af 100644
--- a/target/i386/hax-i386.h
+++ b/target/i386/hax-i386.h
@@ -61,6 +61,8 @@ int hax_inject_interrupt(CPUArchState *env, int vector);
 struct hax_vm *hax_vm_create(struct hax_state *hax);
 int hax_vcpu_run(struct hax_vcpu_state *vcpu);
 int hax_vcpu_create(int id);
+void hax_kick_vcpu_thread(CPUState *cpu);
+
 int hax_sync_vcpu_state(CPUArchState *env, struct vcpu_state_t *state,
                         int set);
 int hax_sync_msr(CPUArchState *env, struct hax_msr_data *msrs, int set);
diff --git a/target/i386/hax-posix.c b/target/i386/hax-posix.c
index 3bad89f133..ea956ddfc1 100644
--- a/target/i386/hax-posix.c
+++ b/target/i386/hax-posix.c
@@ -16,6 +16,8 @@
 
 #include "target/i386/hax-i386.h"
 
+#include "sysemu/cpus.h"
+
 hax_fd hax_mod_open(void)
 {
     int fd = open("/dev/HAX", O_RDWR);
@@ -292,3 +294,13 @@ int hax_inject_interrupt(CPUArchState *env, int vector)
 
     return ioctl(fd, HAX_VCPU_IOCTL_INTERRUPT, &vector);
 }
+
+void hax_kick_vcpu_thread(CPUState *cpu)
+{
+    /*
+     * FIXME: race condition with the exit_request check in
+     * hax_vcpu_hax_exec
+     */
+    cpu->exit_request = 1;
+    cpus_kick_thread(cpu);
+}
diff --git a/target/i386/hax-windows.c b/target/i386/hax-windows.c
index 863c2bcc19..469b48e608 100644
--- a/target/i386/hax-windows.c
+++ b/target/i386/hax-windows.c
@@ -463,3 +463,23 @@ int hax_inject_interrupt(CPUArchState *env, int vector)
         return 0;
     }
 }
+
+static void CALLBACK dummy_apc_func(ULONG_PTR unused)
+{
+}
+
+void hax_kick_vcpu_thread(CPUState *cpu)
+{
+    /*
+     * FIXME: race condition with the exit_request check in
+     * hax_vcpu_hax_exec
+     */
+    cpu->exit_request = 1;
+    if (!qemu_cpu_is_self(cpu)) {
+        if (!QueueUserAPC(dummy_apc_func, cpu->hThread, 0)) {
+            fprintf(stderr, "%s: QueueUserAPC failed with error %lu\n",
+                    __func__, GetLastError());
+            exit(1);
+        }
+    }
+}
diff --git a/target/i386/hvf/Makefile.objs b/target/i386/hvf/Makefile.objs
index 927b86bc67..bdbc2c0227 100644
--- a/target/i386/hvf/Makefile.objs
+++ b/target/i386/hvf/Makefile.objs
@@ -1,2 +1,2 @@
-obj-y += hvf.o
+obj-y += hvf.o hvf-cpus-interface.o
 obj-y += x86.o x86_cpuid.o x86_decode.o x86_descr.o x86_emu.o x86_flags.o x86_mmu.o x86hvf.o x86_task.o
diff --git a/target/i386/hvf/hvf-cpus-interface.c b/target/i386/hvf/hvf-cpus-interface.c
new file mode 100644
index 0000000000..54bfe307c1
--- /dev/null
+++ b/target/i386/hvf/hvf-cpus-interface.c
@@ -0,0 +1,92 @@
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/hvf.h"
+#include "sysemu/runstate.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "hvf-cpus-interface.h"
+
+/*
+ * The HVF-specific vCPU thread function. This one should only run when the host
+ * CPU supports the VMX "unrestricted guest" feature.
+ */
+static void *hvf_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    int r;
+
+    assert(hvf_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    hvf_init_vcpu(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = hvf_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    hvf_vcpu_destroy(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void hvf_kick_vcpu_thread(CPUState *cpu)
+{
+    cpus_kick_thread(cpu);
+}
+
+static void hvf_cpu_synchronize_noop(CPUState *cpu)
+{
+}
+
+static void hvf_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    /*
+     * HVF currently does not support TCG, and only runs in
+     * unrestricted-guest mode.
+     */
+    assert(hvf_enabled());
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
+CpusAccelInterface hvf_cpus_interface = {
+    .create_vcpu_thread = hvf_start_vcpu_thread,
+    .kick_vcpu_thread = hvf_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = hvf_cpu_synchronize_noop,
+    .cpu_synchronize_post_init = hvf_cpu_synchronize_noop,
+    .cpu_synchronize_state = hvf_cpu_synchronize_noop,
+    .cpu_synchronize_pre_loadvm = hvf_cpu_synchronize_noop,
+};
diff --git a/target/i386/hvf/hvf-cpus-interface.h b/target/i386/hvf/hvf-cpus-interface.h
new file mode 100644
index 0000000000..6ea38742e5
--- /dev/null
+++ b/target/i386/hvf/hvf-cpus-interface.h
@@ -0,0 +1,8 @@
+#ifndef HVF_CPUS_INTERFACE_H
+#define HVF_CPUS_INTERFACE_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccelInterface hvf_cpus_interface;
+
+#endif /* HVF_CPUS_INTERFACE_H */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index d72543dc31..75cdf88cec 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -72,6 +72,8 @@
 #include "sysemu/accel.h"
 #include "target/i386/cpu.h"
 
+#include "hvf-cpus-interface.h"
+
 HVFState *hvf_state;
 
 static void assert_hvf_ok(hv_return_t ret)
@@ -312,7 +314,7 @@ void hvf_cpu_synchronize_post_reset(CPUState *cpu_state)
     run_on_cpu(cpu_state, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 }
 
-void _hvf_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+static void _hvf_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
 {
     CPUState *cpu_state = cpu;
     hvf_put_registers(cpu_state);
@@ -979,6 +981,7 @@ static int hvf_accel_init(MachineState *ms)
     hvf_state = s;
     cpu_interrupt_handler = hvf_handle_interrupt;
     memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    cpus_register_accel_interface(&hvf_cpus_interface);
     return 0;
 }
 
diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
index c78baac6df..daac9858ae 100644
--- a/target/i386/whpx-all.c
+++ b/target/i386/whpx-all.c
@@ -24,6 +24,8 @@
 #include "migration/blocker.h"
 #include "whp-dispatch.h"
 
+#include "whpx-cpus-interface.h"
+
 #include <WinHvPlatform.h>
 #include <WinHvEmulation.h>
 
@@ -1575,6 +1577,7 @@ static int whpx_accel_init(MachineState *ms)
     whpx_memory_init();
 
     cpu_interrupt_handler = whpx_handle_interrupt;
+    cpus_register_accel_interface(&whpx_cpus_interface);
 
     printf("Windows Hypervisor Platform accelerator is operational\n");
     return 0;
diff --git a/target/i386/whpx-cpus-interface.c b/target/i386/whpx-cpus-interface.c
new file mode 100644
index 0000000000..d2a4ef0699
--- /dev/null
+++ b/target/i386/whpx-cpus-interface.c
@@ -0,0 +1,96 @@
+/*
+ * QEMU Windows Hypervisor Platform accelerator (WHPX)
+ *
+ * Copyright Microsoft Corp. 2017
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/kvm_int.h"
+#include "qemu/main-loop.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "sysemu/whpx.h"
+#include "whpx-cpus-interface.h"
+
+#include <WinHvPlatform.h>
+#include <WinHvEmulation.h>
+
+static void *whpx_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = whpx_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "whpx_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = whpx_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        while (cpu_thread_is_idle(cpu)) {
+            qemu_cond_wait_iothread(cpu->halt_cond);
+        }
+        qemu_wait_io_event_common(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    whpx_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void whpx_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, whpx_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+#ifdef _WIN32
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+}
+
+static void whpx_kick_vcpu_thread(CPUState *cpu)
+{
+    if (!qemu_cpu_is_self(cpu)) {
+        whpx_vcpu_kick(cpu);
+    }
+}
+
+CpusAccelInterface whpx_cpus_interface = {
+    .create_vcpu_thread = whpx_start_vcpu_thread,
+    .kick_vcpu_thread = whpx_kick_vcpu_thread,
+
+    .cpu_synchronize_post_reset = whpx_cpu_synchronize_post_reset,
+    .cpu_synchronize_post_init = whpx_cpu_synchronize_post_init,
+    .cpu_synchronize_state = whpx_cpu_synchronize_state,
+    .cpu_synchronize_pre_loadvm = whpx_cpu_synchronize_pre_loadvm,
+};
diff --git a/target/i386/whpx-cpus-interface.h b/target/i386/whpx-cpus-interface.h
new file mode 100644
index 0000000000..084e8b15b8
--- /dev/null
+++ b/target/i386/whpx-cpus-interface.h
@@ -0,0 +1,8 @@
+#ifndef WHPX_CPUS_INTERFACE_H
+#define WHPX_CPUS_INTERFACE_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccelInterface whpx_cpus_interface;
+
+#endif /* WHPX_CPUS_INTERFACE_H */
-- 
2.16.4


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-21 18:54 ` [RFC 1/3] cpu-throttle: new module, extracted from cpus.c Claudio Fontana
@ 2020-05-22  6:07   ` Thomas Huth
  2020-05-22  8:15     ` Claudio Fontana
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Huth @ 2020-05-22  6:07 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Eduardo Habkost, Alex Bennée,
	open list:X86 HAXM CPUs, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov, Colin Xu,
	Wenchao Wang, Paolo Bonzini, Sunil Muthuswamy,
	Philippe Mathieu-Daudé,
	Richard Henderson

> From: "Claudio Fontana" <cfontana@suse.de>
> Sent: Thursday, May 21, 2020 8:54:05 PM
> 
> this is a first step in the refactoring of cpus.c.

Could you maybe extend the commit message in the next version a little bit? ... say something about *what* you are moving to a separate file (and maybe why it is ok to move it), etc.?

 Thanks,
  Thomas


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-22  6:07   ` Thomas Huth
@ 2020-05-22  8:15     ` Claudio Fontana
  2020-05-22 10:26       ` Alex Bennée
  0 siblings, 1 reply; 11+ messages in thread
From: Claudio Fontana @ 2020-05-22  8:15 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Laurent Vivier, Peter Maydell, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcelo Tosatti, open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Colin Xu, Paolo Bonzini,
	Sunil Muthuswamy, Richard Henderson, Alex Bennée,
	Wenchao Wang

On 5/22/20 8:07 AM, Thomas Huth wrote:
>> From: "Claudio Fontana" <cfontana@suse.de>
>> Sent: Thursday, May 21, 2020 8:54:05 PM
>>
>> this is a first step in the refactoring of cpus.c.
> 
> Could you maybe extend the commit message in the next version a little bit? ... say something about *what* you are moving to a separate file (and maybe why it is ok to move it), etc.?
> 
>  Thanks,
>   Thomas
> 
> 

Hello Thomas,

Thanks for taking a look; I will add an explanatory message.

I was thinking something along the lines of:

"
Move the vcpu throttling functionality into its own module.
It contains the controls to adjust and inspect the vcpu throttling settings,
to start (set) and stop vcpu throttling, and the throttling function itself,
which is run periodically on vcpus to make them take a nap.
Execution of the throttling function on all vcpus is triggered by a timer
registered at module initialization.

No functionality change.
"

Thanks,

Claudio


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-22  8:15     ` Claudio Fontana
@ 2020-05-22 10:26       ` Alex Bennée
  2020-05-22 10:54         ` Claudio Fontana
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2020-05-22 10:26 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Marcelo Tosatti, open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Colin Xu, Paolo Bonzini,
	Sunil Muthuswamy, Richard Henderson, Philippe Mathieu-Daudé,
	Wenchao Wang


Claudio Fontana <cfontana@suse.de> writes:

> Hello Thomas,
>
> Thanks for taking a look; I will add an explanatory message.
>
> I was thinking something along the lines of:
>
> "
> Move the vcpu throttling functionality into its own module.
> It contains the controls to adjust and inspect the vcpu throttling settings,
> to start (set) and stop vcpu throttling, and the throttling function itself,
> which is run periodically on vcpus to make them take a nap.
> Execution of the throttling function on all vcpus is triggered by a timer
> registered at module initialization.
>
> No functionality change.
> "

Is vcpu throttling a TCG only feature?

-- 
Alex Bennée


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-22 10:26       ` Alex Bennée
@ 2020-05-22 10:54         ` Claudio Fontana
  2020-05-22 11:18           ` Alex Bennée
  0 siblings, 1 reply; 11+ messages in thread
From: Claudio Fontana @ 2020-05-22 10:54 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Marcelo Tosatti, open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Colin Xu, Paolo Bonzini,
	Sunil Muthuswamy, Richard Henderson, Philippe Mathieu-Daudé,
	Wenchao Wang

On 5/22/20 12:26 PM, Alex Bennée wrote:
> 
> Is vcpu throttling a TCG only feature?
> 

No, are you suggesting we only refactor code out of cpus.c based on whether it's tcg or not?

Ciao,

Claudio


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-22 10:54         ` Claudio Fontana
@ 2020-05-22 11:18           ` Alex Bennée
  2020-05-22 11:23             ` Claudio Fontana
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2020-05-22 11:18 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Marcelo Tosatti, open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Colin Xu, Paolo Bonzini,
	Sunil Muthuswamy, Richard Henderson, Philippe Mathieu-Daudé,
	Wenchao Wang


Claudio Fontana <cfontana@suse.de> writes:

> On 5/22/20 12:26 PM, Alex Bennée wrote:
>> 
>> Is vcpu throttling a TCG only feature?
>> 
>
> No, are you suggesting we only refactor code out of cpus.c based on
> whether it's tcg or not?

No, but we should make it clear in the commit message that it is used
by both. I must admit I thought it was only a TCG feature, which just
goes to show what I know ;-)

>
> Ciao,
>
> Claudio


-- 
Alex Bennée


* Re: [RFC 1/3] cpu-throttle: new module, extracted from cpus.c
  2020-05-22 11:18           ` Alex Bennée
@ 2020-05-22 11:23             ` Claudio Fontana
  0 siblings, 0 replies; 11+ messages in thread
From: Claudio Fontana @ 2020-05-22 11:23 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Marcelo Tosatti, open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Colin Xu, Paolo Bonzini,
	Sunil Muthuswamy, Richard Henderson, Philippe Mathieu-Daudé,
	Wenchao Wang

On 5/22/20 1:18 PM, Alex Bennée wrote:
> 
> Claudio Fontana <cfontana@suse.de> writes:
> 
>> On 5/22/20 12:26 PM, Alex Bennée wrote:
>>>
>>> Is vcpu throttling a TCG only feature?
>>>
>>
>> No, are you suggesting we only refactor code out of cpus.c based on
>> whether it's tcg or not?
> 
> No, but we should make it clear in the commit message that it is used
> by both. I must admit I thought it was only a TCG feature, which just
> goes to show what I know ;-)
> 

Ah, you are right, thanks. I could mention its users: currently
migration convergence and, on macOS, the Cocoa UI.

Ciao,

Claudio 


* Re: [RFC 2/3] cpu-timers: new module extracted from cpus.c
  2020-05-21 18:54 ` [RFC 2/3] cpu-timers: new module " Claudio Fontana
@ 2020-05-22 13:49   ` Claudio Fontana
  0 siblings, 0 replies; 11+ messages in thread
From: Claudio Fontana @ 2020-05-22 13:49 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	open list:All patches CC here, Roman Bolshakov,
	open list:X86 HAXM CPUs, Wenchao Wang, Sunil Muthuswamy,
	Richard Henderson, Colin Xu


Here the obvious next step would be to split icount into its own module
and make it TCG-only.

The issue I am facing is that qtest actually requires it, since it
"uses" icount for its qtest_clock_warp functionality.

Does anybody have a good idea here that might save some time?

Thanks,

Claudio

On 5/21/20 8:54 PM, Claudio Fontana wrote:
> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  MAINTAINERS                 |   1 +
>  Makefile.target             |   1 +
>  accel/qtest.c               |   3 +-
>  accel/tcg/cpu-exec.c        |  43 ++-
>  accel/tcg/tcg-all.c         |   7 +-
>  accel/tcg/translate-all.c   |   3 +-
>  cpu-timers.c                | 776 ++++++++++++++++++++++++++++++++++++++++++++
>  cpus.c                      | 731 +----------------------------------------
>  docs/replay.txt             |   6 +-
>  exec.c                      |   4 -
>  hw/core/ptimer.c            |   6 +-
>  hw/i386/x86.c               |   1 +
>  include/exec/cpu-all.h      |   4 +
>  include/exec/exec-all.h     |   4 +-
>  include/qemu/timer.h        |  20 --
>  include/sysemu/cpu-timers.h |  73 +++++
>  include/sysemu/cpus.h       |  12 +-
>  include/sysemu/replay.h     |   4 +-
>  qtest.c                     |   2 +-
>  replay/replay.c             |   6 +-
>  softmmu/vl.c                |   8 +-
>  stubs/clock-warp.c          |   4 +-
>  stubs/cpu-get-clock.c       |   2 +-
>  stubs/cpu-get-icount.c      |  14 +-
>  target/alpha/translate.c    |   3 +-
>  target/arm/helper.c         |   7 +-
>  target/riscv/csr.c          |   8 +-
>  tests/ptimer-test-stubs.c   |   6 +
>  tests/test-timed-average.c  |   2 +-
>  util/main-loop.c            |   4 +-
>  util/qemu-timer.c           |   9 +-
>  31 files changed, 965 insertions(+), 809 deletions(-)
>  create mode 100644 cpu-timers.c
>  create mode 100644 include/sysemu/cpu-timers.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 35864a275a..1b3b17fda8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2143,6 +2143,7 @@ M: Paolo Bonzini <pbonzini@redhat.com>
>  S: Maintained
>  F: cpus.c
>  F: cpu-throttle.c
> +F: cpu-timers.c
>  F: include/qemu/main-loop.h
>  F: include/sysemu/runstate.h
>  F: util/main-loop.c
> diff --git a/Makefile.target b/Makefile.target
> index 60cfa2a78b..1d40237375 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -155,6 +155,7 @@ ifdef CONFIG_SOFTMMU
>  obj-y += arch_init.o
>  obj-y += cpus.o
>  obj-y += cpu-throttle.o
> +obj-y += cpu-timers.o
>  obj-y += gdbstub.o
>  obj-y += balloon.o
>  obj-y += ioport.o
> diff --git a/accel/qtest.c b/accel/qtest.c
> index 5b88f55921..ef9ee0941a 100644
> --- a/accel/qtest.c
> +++ b/accel/qtest.c
> @@ -19,13 +19,14 @@
>  #include "sysemu/accel.h"
>  #include "sysemu/qtest.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  
>  static int qtest_init_accel(MachineState *ms)
>  {
>      QemuOpts *opts = qemu_opts_create(qemu_find_opts("icount"), NULL, 0,
>                                        &error_abort);
>      qemu_opt_set(opts, "shift", "0", &error_abort);
> -    configure_icount(opts, &error_abort);
> +    icount_configure(opts, &error_abort);
>      qemu_opts_del(opts);
>      return 0;
>  }
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index d95c4848a4..82155c1db3 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -19,6 +19,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
> +#include "qemu/qemu-print.h"
>  #include "cpu.h"
>  #include "trace.h"
>  #include "disas/disas.h"
> @@ -36,6 +37,8 @@
>  #include "hw/i386/apic.h"
>  #endif
>  #include "sysemu/cpus.h"
> +#include "exec/cpu-all.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/replay.h"
>  
>  /* -icount align implementation. */
> @@ -56,6 +59,9 @@ typedef struct SyncClocks {
>  #define MAX_DELAY_PRINT_RATE 2000000000LL
>  #define MAX_NB_PRINTS 100
>  
> +static int64_t max_delay;
> +static int64_t max_advance;
> +
>  static void align_clocks(SyncClocks *sc, CPUState *cpu)
>  {
>      int64_t cpu_icount;
> @@ -65,7 +71,7 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
>      }
>  
>      cpu_icount = cpu->icount_extra + cpu_neg(cpu)->icount_decr.u16.low;
> -    sc->diff_clk += cpu_icount_to_ns(sc->last_cpu_icount - cpu_icount);
> +    sc->diff_clk += icount_to_ns(sc->last_cpu_icount - cpu_icount);
>      sc->last_cpu_icount = cpu_icount;
>  
>      if (sc->diff_clk > VM_CLOCK_ADVANCE) {
> @@ -98,9 +104,9 @@ static void print_delay(const SyncClocks *sc)
>              (-sc->diff_clk / (float)1000000000LL <
>               (threshold_delay - THRESHOLD_REDUCE))) {
>              threshold_delay = (-sc->diff_clk / 1000000000LL) + 1;
> -            printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
> -                   threshold_delay - 1,
> -                   threshold_delay);
> +            qemu_printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
> +                        threshold_delay - 1,
> +                        threshold_delay);
>              nb_prints++;
>              last_realtime_clock = sc->realtime_clock;
>          }
> @@ -597,7 +603,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
>  
>      /* Finally, check if we need to exit to the main loop.  */
>      if (unlikely(atomic_read(&cpu->exit_request))
> -        || (use_icount
> +        || (icount_enabled()
>              && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
>          atomic_set(&cpu->exit_request, 0);
>          if (cpu->exception_index == -1) {
> @@ -638,10 +644,10 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>      }
>  
>      /* Instruction counter expired.  */
> -    assert(use_icount);
> +    assert(icount_enabled());
>  #ifndef CONFIG_USER_ONLY
>      /* Ensure global icount has gone forward */
> -    cpu_update_icount(cpu);
> +    icount_update(cpu);
>      /* Refill decrementer and continue execution.  */
>      insns_left = MIN(0xffff, cpu->icount_budget);
>      cpu_neg(cpu)->icount_decr.u16.low = insns_left;
> @@ -741,3 +747,26 @@ int cpu_exec(CPUState *cpu)
>  
>      return ret;
>  }
> +
> +#ifndef CONFIG_USER_ONLY
> +
> +void dump_drift_info(void)
> +{
> +    if (!icount_enabled()) {
> +        return;
> +    }
> +
> +    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
> +                (cpu_get_clock() - icount_get()) / SCALE_MS);
> +    if (icount_align_option) {
> +        qemu_printf("Max guest delay     %"PRIi64" ms\n",
> +                    -max_delay / SCALE_MS);
> +        qemu_printf("Max guest advance   %"PRIi64" ms\n",
> +                    max_advance / SCALE_MS);
> +    } else {
> +        qemu_printf("Max guest delay     NA\n");
> +        qemu_printf("Max guest advance   NA\n");
> +    }
> +}
> +
> +#endif /* !CONFIG_USER_ONLY */
> diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
> index 3b4fda5640..e27385d051 100644
> --- a/accel/tcg/tcg-all.c
> +++ b/accel/tcg/tcg-all.c
> @@ -29,6 +29,7 @@
>  #include "qom/object.h"
>  #include "cpu.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "qemu/main-loop.h"
>  #include "tcg/tcg.h"
>  #include "qapi/error.h"
> @@ -65,7 +66,7 @@ static void tcg_handle_interrupt(CPUState *cpu, int mask)
>          qemu_cpu_kick(cpu);
>      } else {
>          atomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
> -        if (use_icount &&
> +        if (icount_enabled() &&
>              !cpu->can_do_io
>              && (mask & ~old_mask) != 0) {
>              cpu_abort(cpu, "Raised interrupt while not in I/O function");
> @@ -104,7 +105,7 @@ static bool check_tcg_memory_orders_compatible(void)
>  
>  static bool default_mttcg_enabled(void)
>  {
> -    if (use_icount || TCG_OVERSIZED_GUEST) {
> +    if (icount_enabled() || TCG_OVERSIZED_GUEST) {
>          return false;
>      } else {
>  #ifdef TARGET_SUPPORTS_MTTCG
> @@ -146,7 +147,7 @@ static void tcg_set_thread(Object *obj, const char *value, Error **errp)
>      if (strcmp(value, "multi") == 0) {
>          if (TCG_OVERSIZED_GUEST) {
>              error_setg(errp, "No MTTCG when guest word size > hosts");
> -        } else if (use_icount) {
> +        } else if (icount_enabled()) {
>              error_setg(errp, "No MTTCG when icount is enabled");
>          } else {
>  #ifndef TARGET_SUPPORTS_MTTCG
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 42ce1dfcff..479edeb2ea 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -57,6 +57,7 @@
>  #include "qemu/main-loop.h"
>  #include "exec/log.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/tcg.h"
>  
>  /* #define DEBUG_TB_INVALIDATE */
> @@ -369,7 +370,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>  
>   found:
>      if (reset_icount && (tb_cflags(tb) & CF_USE_ICOUNT)) {
> -        assert(use_icount);
> +        assert(icount_enabled());
>          /* Reset the cycle counter to the start of the block
>             and shift if to the number of actually executed instructions */
>          cpu_neg(cpu)->icount_decr.u16.low += num_insns - i;
> diff --git a/cpu-timers.c b/cpu-timers.c
> new file mode 100644
> index 0000000000..20fea07625
> --- /dev/null
> +++ b/cpu-timers.c
> @@ -0,0 +1,776 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/cutils.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "exec/exec-all.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/qtest.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/option.h"
> +#include "qemu/seqlock.h"
> +#include "sysemu/replay.h"
> +#include "sysemu/runstate.h"
> +#include "hw/core/cpu.h"
> +#include "sysemu/cpu-timers.h"
> +#include "sysemu/cpu-throttle.h"
> +
> +typedef struct TimersState {
> +    /* Protected by BQL.  */
> +    int64_t cpu_ticks_prev;
> +    int64_t cpu_ticks_offset;
> +
> +    /*
> +     * Protect fields that can be respectively read outside the
> +     * BQL, and written from multiple threads.
> +     */
> +    QemuSeqLock vm_clock_seqlock;
> +    QemuSpin vm_clock_lock;
> +
> +    int16_t cpu_ticks_enabled;
> +
> +    /* Conversion factor from emulated instructions to virtual clock ticks.  */
> +    int16_t icount_time_shift;
> +
> +    /* Compensate for varying guest execution speed.  */
> +    int64_t qemu_icount_bias;
> +
> +    int64_t vm_clock_warp_start;
> +    int64_t cpu_clock_offset;
> +
> +    /* Only written by TCG thread */
> +    int64_t qemu_icount;
> +
> +    /* for adjusting icount */
> +    QEMUTimer *icount_rt_timer;
> +    QEMUTimer *icount_vm_timer;
> +    QEMUTimer *icount_warp_timer;
> +} TimersState;
> +
> +static TimersState timers_state;
> +
> +/*
> + * ICOUNT: Instruction Counter
> + */
> +static bool icount_sleep = true;
> +/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
> +#define MAX_ICOUNT_SHIFT 10
> +
> +/*
> + * 0 = Do not count executed instructions.
> + * 1 = Fixed conversion of insn to ns via "shift" option
> + * 2 = Runtime adaptive algorithm to compute shift
> + */
> +static int use_icount;
> +
> +int icount_enabled(void)
> +{
> +    return use_icount;
> +}
> +
> +static void icount_enable_precise(void)
> +{
> +    use_icount = 1;
> +}
> +
> +static void icount_enable_adaptive(void)
> +{
> +    use_icount = 2;
> +}
> +
> +/*
> + * The current number of executed instructions is based on what we
> + * originally budgeted minus the current state of the decrementing
> + * icount counters in extra/u16.low.
> + */
> +static int64_t icount_get_executed(CPUState *cpu)
> +{
> +    return (cpu->icount_budget -
> +            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
> +}
> +
> +/*
> + * Update the global shared timer_state.qemu_icount to take into
> + * account executed instructions. This is done by the TCG vCPU
> + * thread so the main-loop can see time has moved forward.
> + */
> +static void icount_update_locked(CPUState *cpu)
> +{
> +    int64_t executed = icount_get_executed(cpu);
> +    cpu->icount_budget -= executed;
> +
> +    atomic_set_i64(&timers_state.qemu_icount,
> +                   timers_state.qemu_icount + executed);
> +}
> +
> +/*
> + * Update the global shared timer_state.qemu_icount to take into
> + * account executed instructions. This is done by the TCG vCPU
> + * thread so the main-loop can see time has moved forward.
> + */
> +void icount_update(CPUState *cpu)
> +{
> +    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +    icount_update_locked(cpu);
> +    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                         &timers_state.vm_clock_lock);
> +}
> +
> +static int64_t icount_get_raw_locked(void)
> +{
> +    CPUState *cpu = current_cpu;
> +
> +    if (cpu && cpu->running) {
> +        if (!cpu->can_do_io) {
> +            error_report("Bad icount read");
> +            exit(1);
> +        }
> +        /* Take into account what has run */
> +        icount_update_locked(cpu);
> +    }
> +    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
> +    return atomic_read_i64(&timers_state.qemu_icount);
> +}
> +
> +static int64_t icount_get_locked(void)
> +{
> +    int64_t icount = icount_get_raw_locked();
> +    return atomic_read_i64(&timers_state.qemu_icount_bias) +
> +        icount_to_ns(icount);
> +}
> +
> +int64_t icount_get_raw(void)
> +{
> +    int64_t icount;
> +    unsigned start;
> +
> +    do {
> +        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> +        icount = icount_get_raw_locked();
> +    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> +
> +    return icount;
> +}
> +
> +/* Return the virtual CPU time, based on the instruction counter.  */
> +int64_t icount_get(void)
> +{
> +    int64_t icount;
> +    unsigned start;
> +
> +    do {
> +        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> +        icount = icount_get_locked();
> +    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> +
> +    return icount;
> +}
> +
> +int64_t icount_to_ns(int64_t icount)
> +{
> +    return icount << atomic_read(&timers_state.icount_time_shift);
> +}
> +
> +/*
> + * Correlation between real and virtual time is always going to be
> + * fairly approximate, so ignore small variation.
> + * When the guest is idle real and virtual time will be aligned in
> + * the IO wait loop.
> + */
> +#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
> +
> +static int64_t cpu_get_clock_locked(void);
> +
> +static void icount_adjust(void)
> +{
> +    int64_t cur_time;
> +    int64_t cur_icount;
> +    int64_t delta;
> +
> +    /* Protected by TimersState mutex.  */
> +    static int64_t last_delta;
> +
> +    /* If the VM is not running, then do nothing.  */
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
> +    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +    cur_time = cpu_get_clock_locked();
> +    cur_icount = icount_get_locked();
> +
> +    delta = cur_icount - cur_time;
> +    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
> +    if (delta > 0
> +        && last_delta + ICOUNT_WOBBLE < delta * 2
> +        && timers_state.icount_time_shift > 0) {
> +        /* The guest is getting too far ahead.  Slow time down.  */
> +        atomic_set(&timers_state.icount_time_shift,
> +                   timers_state.icount_time_shift - 1);
> +    }
> +    if (delta < 0
> +        && last_delta - ICOUNT_WOBBLE > delta * 2
> +        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
> +        /* The guest is getting too far behind.  Speed time up.  */
> +        atomic_set(&timers_state.icount_time_shift,
> +                   timers_state.icount_time_shift + 1);
> +    }
> +    last_delta = delta;
> +    atomic_set_i64(&timers_state.qemu_icount_bias,
> +                   cur_icount - (timers_state.qemu_icount
> +                                 << timers_state.icount_time_shift));
> +    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                         &timers_state.vm_clock_lock);
> +}
> +
> +static void icount_adjust_rt(void *opaque)
> +{
> +    timer_mod(timers_state.icount_rt_timer,
> +              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
> +    icount_adjust();
> +}
> +
> +static void icount_adjust_vm(void *opaque)
> +{
> +    timer_mod(timers_state.icount_vm_timer,
> +                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                   NANOSECONDS_PER_SECOND / 10);
> +    icount_adjust();
> +}
> +
> +int64_t icount_round(int64_t count)
> +{
> +    int shift = atomic_read(&timers_state.icount_time_shift);
> +    return (count + (1 << shift) - 1) >> shift;
> +}
> +
> +static void icount_warp_rt(void)
> +{
> +    unsigned seq;
> +    int64_t warp_start;
> +
> +    /*
> +     * The icount_warp_timer is rescheduled soon after vm_clock_warp_start
> +     * changes from -1 to another value, so the race here is okay.
> +     */
> +    do {
> +        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> +        warp_start = timers_state.vm_clock_warp_start;
> +    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
> +
> +    if (warp_start == -1) {
> +        return;
> +    }
> +
> +    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +    if (runstate_is_running()) {
> +        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
> +                                            cpu_get_clock_locked());
> +        int64_t warp_delta;
> +
> +        warp_delta = clock - timers_state.vm_clock_warp_start;
> +        if (icount_enabled() == 2) {
> +            /*
> +             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
> +             * far ahead of real time.
> +             */
> +            int64_t cur_icount = icount_get_locked();
> +            int64_t delta = clock - cur_icount;
> +            warp_delta = MIN(warp_delta, delta);
> +        }
> +        atomic_set_i64(&timers_state.qemu_icount_bias,
> +                       timers_state.qemu_icount_bias + warp_delta);
> +    }
> +    timers_state.vm_clock_warp_start = -1;
> +    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +
> +    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
> +        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +    }
> +}
> +
> +static void icount_timer_cb(void *opaque)
> +{
> +    /*
> +     * No need for a checkpoint because the timer already synchronizes
> +     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
> +     */
> +    icount_warp_rt();
> +}
> +
> +void icount_start_warp_timer(void)
> +{
> +    int64_t clock;
> +    int64_t deadline;
> +
> +    if (!icount_enabled()) {
> +        return;
> +    }
> +
> +    /*
> +     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
> +     * do not fire, so computing the deadline does not make sense.
> +     */
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
> +    if (replay_mode != REPLAY_MODE_PLAY) {
> +        if (!all_cpu_threads_idle()) {
> +            return;
> +        }
> +
> +        if (qtest_enabled()) {
> +            /* When testing, qtest commands advance icount.  */
> +            return;
> +        }
> +
> +        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
> +    } else {
> +        /* warp clock deterministically in record/replay mode */
> +        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
> +            /*
> +             * vCPU is sleeping and warp can't be started.
> +             * It is probably a race condition: notification sent
> +             * to vCPU was processed in advance and vCPU went to sleep.
> +             * Therefore we have to wake it up for doing something.
> +             */
> +            if (replay_has_checkpoint()) {
> +                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +            }
> +            return;
> +        }
> +    }
> +
> +    /* We want to use the earliest deadline from ALL vm_clocks */
> +    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
> +    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
> +                                          ~QEMU_TIMER_ATTR_EXTERNAL);
> +    if (deadline < 0) {
> +        static bool notified;
> +        if (!icount_sleep && !notified) {
> +            warn_report("icount sleep disabled and no active timers");
> +            notified = true;
> +        }
> +        return;
> +    }
> +
> +    if (deadline > 0) {
> +        /*
> +         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
> +         * sleep.  Otherwise, the CPU might be waiting for a future timer
> +         * interrupt to wake it up, but the interrupt never comes because
> +         * the vCPU isn't running any insns and thus doesn't advance the
> +         * QEMU_CLOCK_VIRTUAL.
> +         */
> +        if (!icount_sleep) {
> +            /*
> +             * We never let VCPUs sleep in no sleep icount mode.
> +             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
> +             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
> +             * It is useful when we want a deterministic execution time,
> +             * isolated from host latencies.
> +             */
> +            seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                               &timers_state.vm_clock_lock);
> +            atomic_set_i64(&timers_state.qemu_icount_bias,
> +                           timers_state.qemu_icount_bias + deadline);
> +            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                                 &timers_state.vm_clock_lock);
> +            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +        } else {
> +            /*
> +             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
> +             * "real" time, (related to the time left until the next event) has
> +             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
> +             * This avoids that the warps are visible externally; for example,
> +             * you will not be sending network packets continuously instead of
> +             * every 100ms.
> +             */
> +            seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                               &timers_state.vm_clock_lock);
> +            if (timers_state.vm_clock_warp_start == -1
> +                || timers_state.vm_clock_warp_start > clock) {
> +                timers_state.vm_clock_warp_start = clock;
> +            }
> +            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                                 &timers_state.vm_clock_lock);
> +            timer_mod_anticipate(timers_state.icount_warp_timer,
> +                                 clock + deadline);
> +        }
> +    } else if (deadline == 0) {
> +        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +    }
> +}
> +
> +void icount_account_warp_timer(void)
> +{
> +    if (!use_icount || !icount_sleep) {
> +        return;
> +    }
> +
> +    /*
> +     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
> +     * do not fire, so computing the deadline does not make sense.
> +     */
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
> +    /* warp clock deterministically in record/replay mode */
> +    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
> +        return;
> +    }
> +
> +    timer_del(timers_state.icount_warp_timer);
> +    icount_warp_rt();
> +}
> +
> +void icount_configure(QemuOpts *opts, Error **errp)
> +{
> +    const char *option = qemu_opt_get(opts, "shift");
> +    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
> +    bool align = qemu_opt_get_bool(opts, "align", false);
> +    long time_shift = -1;
> +
> +    if (!option && qemu_opt_get(opts, "align")) {
> +        error_setg(errp, "Please specify shift option when using align");
> +        return;
> +    }
> +
> +    if (align && !sleep) {
> +        error_setg(errp, "align=on and sleep=off are incompatible");
> +        return;
> +    }
> +
> +    if (strcmp(option, "auto") != 0) {
> +        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
> +            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
> +            error_setg(errp, "icount: Invalid shift value");
> +            return;
> +        }
> +    } else if (icount_align_option) {
> +        error_setg(errp, "shift=auto and align=on are incompatible");
> +        return;
> +    } else if (!icount_sleep) {
> +        error_setg(errp, "shift=auto and sleep=off are incompatible");
> +        return;
> +    }
> +
> +    icount_sleep = sleep;
> +    if (icount_sleep) {
> +        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
> +                                         icount_timer_cb, NULL);
> +    }
> +
> +    icount_align_option = align;
> +
> +    if (time_shift >= 0) {
> +        timers_state.icount_time_shift = time_shift;
> +        icount_enable_precise();
> +        return;
> +    }
> +
> +    icount_enable_adaptive();
> +
> +    /*
> +     * 125MIPS seems a reasonable initial guess at the guest speed.
> +     * It will be corrected fairly quickly anyway.
> +     */
> +    timers_state.icount_time_shift = 3;
> +
> +    /*
> +     * Have both realtime and virtual time triggers for speed adjustment.
> +     * The realtime trigger catches emulated time passing too slowly,
> +     * the virtual time trigger catches emulated time passing too fast.
> +     * Realtime triggers occur even when idle, so use them less frequently
> +     * than VM triggers.
> +     */
> +    timers_state.vm_clock_warp_start = -1;
> +    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> +                                   icount_adjust_rt, NULL);
> +    timer_mod(timers_state.icount_rt_timer,
> +                   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
> +    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> +                                        icount_adjust_vm, NULL);
> +    timer_mod(timers_state.icount_vm_timer,
> +                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                   NANOSECONDS_PER_SECOND / 10);
> +}
> +
> +/* clock and ticks */
> +
> +static int64_t cpu_get_ticks_locked(void)
> +{
> +    int64_t ticks = timers_state.cpu_ticks_offset;
> +    if (timers_state.cpu_ticks_enabled) {
> +        ticks += cpu_get_host_ticks();
> +    }
> +
> +    if (timers_state.cpu_ticks_prev > ticks) {
> +        /* Non increasing ticks may happen if the host uses software suspend. */
> +        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
> +        ticks = timers_state.cpu_ticks_prev;
> +    }
> +
> +    timers_state.cpu_ticks_prev = ticks;
> +    return ticks;
> +}
> +
> +/*
> + * return the time elapsed in VM between vm_start and vm_stop.  Unless
> + * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> + * counter.
> + */
> +int64_t cpu_get_ticks(void)
> +{
> +    int64_t ticks;
> +
> +    if (icount_enabled()) {
> +        return icount_get();
> +    }
> +
> +    qemu_spin_lock(&timers_state.vm_clock_lock);
> +    ticks = cpu_get_ticks_locked();
> +    qemu_spin_unlock(&timers_state.vm_clock_lock);
> +    return ticks;
> +}
> +
> +static int64_t cpu_get_clock_locked(void)
> +{
> +    int64_t time;
> +
> +    time = timers_state.cpu_clock_offset;
> +    if (timers_state.cpu_ticks_enabled) {
> +        time += get_clock();
> +    }
> +
> +    return time;
> +}
> +
> +/*
> + * Return the monotonic time elapsed in VM, i.e.,
> + * the time between vm_start and vm_stop
> + */
> +int64_t cpu_get_clock(void)
> +{
> +    int64_t ti;
> +    unsigned start;
> +
> +    do {
> +        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> +        ti = cpu_get_clock_locked();
> +    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> +
> +    return ti;
> +}
> +
> +/*
> + * enable cpu_get_ticks()
> + * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
> + */
> +void cpu_enable_ticks(void)
> +{
> +    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +    if (!timers_state.cpu_ticks_enabled) {
> +        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
> +        timers_state.cpu_clock_offset -= get_clock();
> +        timers_state.cpu_ticks_enabled = 1;
> +    }
> +    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +}
> +
> +/*
> + * disable cpu_get_ticks() : the clock is stopped. You must not call
> + * cpu_get_ticks() after that.
> + * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
> + */
> +void cpu_disable_ticks(void)
> +{
> +    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                       &timers_state.vm_clock_lock);
> +    if (timers_state.cpu_ticks_enabled) {
> +        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
> +        timers_state.cpu_clock_offset = cpu_get_clock_locked();
> +        timers_state.cpu_ticks_enabled = 0;
> +    }
> +    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                         &timers_state.vm_clock_lock);
> +}
> +
> +void qtest_clock_warp(int64_t dest)
> +{
> +    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +    AioContext *aio_context;
> +    assert(qtest_enabled());
> +    aio_context = qemu_get_aio_context();
> +    while (clock < dest) {
> +        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
> +                                                      QEMU_TIMER_ATTR_ALL);
> +        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
> +
> +        seqlock_write_lock(&timers_state.vm_clock_seqlock,
> +                           &timers_state.vm_clock_lock);
> +        atomic_set_i64(&timers_state.qemu_icount_bias,
> +                       timers_state.qemu_icount_bias + warp);
> +        seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> +                             &timers_state.vm_clock_lock);
> +
> +        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
> +        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
> +        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +    }
> +    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +}
> +
> +static bool icount_state_needed(void *opaque)
> +{
> +    return icount_enabled();
> +}
> +
> +static bool warp_timer_state_needed(void *opaque)
> +{
> +    TimersState *s = opaque;
> +    return s->icount_warp_timer != NULL;
> +}
> +
> +static bool adjust_timers_state_needed(void *opaque)
> +{
> +    TimersState *s = opaque;
> +    return s->icount_rt_timer != NULL;
> +}
> +
> +/*
> + * Subsection for warp timer migration is optional, because may not be created
> + */
> +static const VMStateDescription icount_vmstate_warp_timer = {
> +    .name = "timer/icount/warp_timer",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = warp_timer_state_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(vm_clock_warp_start, TimersState),
> +        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription icount_vmstate_adjust_timers = {
> +    .name = "timer/icount/timers",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = adjust_timers_state_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
> +        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +/*
> + * This is a subsection for icount migration.
> + */
> +static const VMStateDescription icount_vmstate_timers = {
> +    .name = "timer/icount",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = icount_state_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(qemu_icount_bias, TimersState),
> +        VMSTATE_INT64(qemu_icount, TimersState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (const VMStateDescription * []) {
> +        &icount_vmstate_warp_timer,
> +        &icount_vmstate_adjust_timers,
> +        NULL
> +    }
> +};
> +
> +static const VMStateDescription vmstate_timers = {
> +    .name = "timer",
> +    .version_id = 2,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(cpu_ticks_offset, TimersState),
> +        VMSTATE_UNUSED(8),
> +        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (const VMStateDescription * []) {
> +        &icount_vmstate_timers,
> +        NULL
> +    }
> +};
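If I read the VMState rules right, a subsection is only put on the wire when its .needed hook returns true, and a nested subsection is only considered when its parent was sent. So a non-icount run still migrates just the top-level "timer" section. The section names and fields look unchanged by the move, which is what matters for compatibility. Here is a toy model of that selection rule. ToySection and ToyTimers are hypothetical stand-ins, not the migration API, and versioning is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    bool icount;       /* stands in for icount_enabled() */
    void *warp_timer;  /* stands in for icount_warp_timer */
    void *rt_timer;    /* stands in for icount_rt_timer */
} ToyTimers;

typedef struct ToySection {
    const char *name;
    bool (*needed)(const ToyTimers *s);           /* NULL: always sent */
    const struct ToySection *const *subsections;  /* NULL-terminated */
} ToySection;

static bool toy_need_icount(const ToyTimers *s) { return s->icount; }
static bool toy_need_warp(const ToyTimers *s) { return s->warp_timer != NULL; }
static bool toy_need_adjust(const ToyTimers *s) { return s->rt_timer != NULL; }

static const ToySection toy_warp_sec =
    { "timer/icount/warp_timer", toy_need_warp, NULL };
static const ToySection toy_adjust_sec =
    { "timer/icount/timers", toy_need_adjust, NULL };
static const ToySection *const toy_icount_subs[] =
    { &toy_warp_sec, &toy_adjust_sec, NULL };
static const ToySection toy_icount_sec =
    { "timer/icount", toy_need_icount, toy_icount_subs };
static const ToySection *const toy_timer_subs[] = { &toy_icount_sec, NULL };
static const ToySection toy_timer_sec = { "timer", NULL, toy_timer_subs };

/* Collect, in stream order, the names of the sections that would be sent. */
static size_t toy_emit(const ToySection *sec, const ToyTimers *s,
                       const char **out, size_t n)
{
    assert(sec != NULL);
    if (sec->needed && !sec->needed(s)) {
        return n;                 /* skipped, and so are its subsections */
    }
    out[n++] = sec->name;
    for (const ToySection *const *sub = sec->subsections; sub && *sub; sub++) {
        n = toy_emit(*sub, s, out, n);
    }
    return n;
}
```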
> +
> +static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
> +{
> +}
> +
> +void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
> +{
> +    if (!icount_enabled() || type != QEMU_CLOCK_VIRTUAL) {
> +        qemu_notify_event();
> +        return;
> +    }
> +
> +    if (qemu_in_vcpu_thread()) {
> +        /*
> +         * A CPU is currently running; kick it back out to the
> +         * tcg_cpu_exec() loop so it will recalculate its
> +         * icount deadline immediately.
> +         */
> +        qemu_cpu_kick(current_cpu);
> +    } else if (first_cpu) {
> +        /*
> +         * qemu_cpu_kick is not enough to kick a halted CPU out of
> +         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
> +         * causes cpu_thread_is_idle to return false.  This way,
> +         * handle_icount_deadline can run.
> +         * If we have no CPUs at all for some reason, we don't
> +         * need to do anything.
> +         */
> +        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
> +    }
> +}
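Once icount is on, the notify path has three outcomes, which may be easier to see as a pure decision function. This sketch only returns which action would be taken. The enum and function names are my own; the comments map each value to the call made in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TOY_NOTIFY_MAIN_LOOP,   /* qemu_notify_event() */
    TOY_KICK_CURRENT_CPU,   /* qemu_cpu_kick(current_cpu) */
    TOY_QUEUE_ON_FIRST_CPU, /* async_run_on_cpu(first_cpu, do_nothing, ...) */
    TOY_DO_NOTHING          /* no vCPUs exist at all */
} ToyNotifyAction;

static ToyNotifyAction toy_timer_notify_action(bool icount, bool virtual_clock,
                                               bool in_vcpu_thread,
                                               bool have_cpu)
{
    if (!icount || !virtual_clock) {
        return TOY_NOTIFY_MAIN_LOOP;
    }
    if (in_vcpu_thread) {
        /* Force a return to the exec loop to recompute the deadline. */
        return TOY_KICK_CURRENT_CPU;
    }
    /* A kick alone would not wake a halted CPU; queued work does. */
    return have_cpu ? TOY_QUEUE_ON_FIRST_CPU : TOY_DO_NOTHING;
}
```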
> +
> +/* initialize this module and the cpu throttle for convenience as well */
> +void cpu_timers_init(void)
> +{
> +    seqlock_init(&timers_state.vm_clock_seqlock);
> +    qemu_spin_init(&timers_state.vm_clock_lock);
> +    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> +
> +    cpu_throttle_init();
> +}
> diff --git a/cpus.c b/cpus.c
> index 3a46a4fc2b..7e9f545be8 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -58,11 +58,10 @@
>  #include "hw/nmi.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/runstate.h"
> +#include "sysemu/cpu-timers.h"
>  #include "hw/boards.h"
>  #include "hw/hw.h"
>  
> -#include "sysemu/cpu-throttle.h"
> -
>  #ifdef CONFIG_LINUX
>  
>  #include <sys/prctl.h>
> @@ -83,9 +82,6 @@
>  
>  static QemuMutex qemu_global_mutex;
>  
> -int64_t max_delay;
> -int64_t max_advance;
> -
>  bool cpu_is_stopped(CPUState *cpu)
>  {
>      return cpu->stopped || !runstate_is_running();
> @@ -106,7 +102,7 @@ static bool cpu_thread_is_idle(CPUState *cpu)
>      return true;
>  }
>  
> -static bool all_cpu_threads_idle(void)
> +bool all_cpu_threads_idle(void)
>  {
>      CPUState *cpu;
>  
> @@ -118,668 +114,8 @@ static bool all_cpu_threads_idle(void)
>      return true;
>  }
>  
> -/***********************************************************/
> -/* guest cycle counter */
> -
> -/* Protected by TimersState seqlock */
> -
> -static bool icount_sleep = true;
> -/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
> -#define MAX_ICOUNT_SHIFT 10
> -
> -typedef struct TimersState {
> -    /* Protected by BQL.  */
> -    int64_t cpu_ticks_prev;
> -    int64_t cpu_ticks_offset;
> -
> -    /* Protect fields that can be respectively read outside the
> -     * BQL, and written from multiple threads.
> -     */
> -    QemuSeqLock vm_clock_seqlock;
> -    QemuSpin vm_clock_lock;
> -
> -    int16_t cpu_ticks_enabled;
> -
> -    /* Conversion factor from emulated instructions to virtual clock ticks.  */
> -    int16_t icount_time_shift;
> -
> -    /* Compensate for varying guest execution speed.  */
> -    int64_t qemu_icount_bias;
> -
> -    int64_t vm_clock_warp_start;
> -    int64_t cpu_clock_offset;
> -
> -    /* Only written by TCG thread */
> -    int64_t qemu_icount;
> -
> -    /* for adjusting icount */
> -    QEMUTimer *icount_rt_timer;
> -    QEMUTimer *icount_vm_timer;
> -    QEMUTimer *icount_warp_timer;
> -} TimersState;
> -
> -static TimersState timers_state;
>  bool mttcg_enabled;
>  
> -
> -/* The current number of executed instructions is based on what we
> - * originally budgeted minus the current state of the decrementing
> - * icount counters in extra/u16.low.
> - */
> -static int64_t cpu_get_icount_executed(CPUState *cpu)
> -{
> -    return (cpu->icount_budget -
> -            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
> -}
> -
> -/*
> - * Update the global shared timer_state.qemu_icount to take into
> - * account executed instructions. This is done by the TCG vCPU
> - * thread so the main-loop can see time has moved forward.
> - */
> -static void cpu_update_icount_locked(CPUState *cpu)
> -{
> -    int64_t executed = cpu_get_icount_executed(cpu);
> -    cpu->icount_budget -= executed;
> -
> -    atomic_set_i64(&timers_state.qemu_icount,
> -                   timers_state.qemu_icount + executed);
> -}
> -
> -/*
> - * Update the global shared timer_state.qemu_icount to take into
> - * account executed instructions. This is done by the TCG vCPU
> - * thread so the main-loop can see time has moved forward.
> - */
> -void cpu_update_icount(CPUState *cpu)
> -{
> -    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -    cpu_update_icount_locked(cpu);
> -    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                         &timers_state.vm_clock_lock);
> -}
> -
> -static int64_t cpu_get_icount_raw_locked(void)
> -{
> -    CPUState *cpu = current_cpu;
> -
> -    if (cpu && cpu->running) {
> -        if (!cpu->can_do_io) {
> -            error_report("Bad icount read");
> -            exit(1);
> -        }
> -        /* Take into account what has run */
> -        cpu_update_icount_locked(cpu);
> -    }
> -    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
> -    return atomic_read_i64(&timers_state.qemu_icount);
> -}
> -
> -static int64_t cpu_get_icount_locked(void)
> -{
> -    int64_t icount = cpu_get_icount_raw_locked();
> -    return atomic_read_i64(&timers_state.qemu_icount_bias) +
> -        cpu_icount_to_ns(icount);
> -}
> -
> -int64_t cpu_get_icount_raw(void)
> -{
> -    int64_t icount;
> -    unsigned start;
> -
> -    do {
> -        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> -        icount = cpu_get_icount_raw_locked();
> -    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> -
> -    return icount;
> -}
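cpu_get_icount_raw() and its siblings use the usual seqlock read pattern: sample the sequence, read the data, and retry if a writer was active (odd count) or the count moved. Below is a minimal single-writer sketch with C11 atomics. It assumes writers are already serialized, which QEMU does with vm_clock_lock. The types and functions are hypothetical and are not qemu/seqlock.h. The data field is atomic for the same reason given in the "needs atomic64 to avoid UB" comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uint sequence;    /* odd while a write is in progress */
    _Atomic int64_t icount;  /* relaxed atomic: no torn reads, no UB */
} ToySeqState;

/* Caller must already exclude other writers. */
static void toy_write_icount(ToySeqState *s, int64_t value)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->icount, value, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

static int64_t toy_read_icount(ToySeqState *s)
{
    unsigned start;
    int64_t value;

    do {
        /* read_begin: wait until no write is in progress */
        do {
            start = atomic_load_explicit(&s->sequence, memory_order_acquire);
        } while (start & 1);
        value = atomic_load_explicit(&s->icount, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        /* read_retry: did a writer run while we were reading? */
    } while (atomic_load_explicit(&s->sequence, memory_order_relaxed) != start);

    return value;
}
```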
> -
> -/* Return the virtual CPU time, based on the instruction counter.  */
> -int64_t cpu_get_icount(void)
> -{
> -    int64_t icount;
> -    unsigned start;
> -
> -    do {
> -        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> -        icount = cpu_get_icount_locked();
> -    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> -
> -    return icount;
> -}
> -
> -int64_t cpu_icount_to_ns(int64_t icount)
> -{
> -    return icount << atomic_read(&timers_state.icount_time_shift);
> -}
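For reference when reviewing the shift handling: cpu_icount_to_ns() charges 2^shift virtual ns per instruction. That is where the "125MIPS" (shift 3) and "1MIPS" (MAX_ICOUNT_SHIFT 10) comments come from. A quick arithmetic sketch, using toy names only:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ICOUNT_SHIFT 10   /* mirrors MAX_ICOUNT_SHIFT above */

/* Virtual nanoseconds charged for 'icount' guest instructions. */
static int64_t toy_icount_to_ns(int64_t icount, int shift)
{
    assert(shift >= 0 && shift <= TOY_MAX_ICOUNT_SHIFT);
    return icount << shift;
}

/* Implied guest speed in MIPS: 1000 ns per us / 2^shift ns per insn. */
static double toy_shift_to_mips(int shift)
{
    assert(shift >= 0 && shift <= TOY_MAX_ICOUNT_SHIFT);
    return 1000.0 / (double)(1 << shift);
}
```

Shift 10 works out to about 0.98 MIPS, so the "1MIPS" floor in the comment is an approximation.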
> -
> -static int64_t cpu_get_ticks_locked(void)
> -{
> -    int64_t ticks = timers_state.cpu_ticks_offset;
> -    if (timers_state.cpu_ticks_enabled) {
> -        ticks += cpu_get_host_ticks();
> -    }
> -
> -    if (timers_state.cpu_ticks_prev > ticks) {
> -        /* Non increasing ticks may happen if the host uses software suspend.  */
> -        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
> -        ticks = timers_state.cpu_ticks_prev;
> -    }
> -
> -    timers_state.cpu_ticks_prev = ticks;
> -    return ticks;
> -}
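The clamp in cpu_get_ticks_locked() is easy to miss. If the host counter goes backwards (the comment mentions software suspend), the difference is folded into cpu_ticks_offset so the guest-visible value never decreases. The sketch below reproduces that logic with the host counter passed in explicitly, which makes it testable. ToyTicks is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t prev;    /* cpu_ticks_prev */
    int64_t offset;  /* cpu_ticks_offset */
    bool enabled;    /* cpu_ticks_enabled */
} ToyTicks;

static int64_t toy_get_ticks(ToyTicks *t, int64_t host_ticks)
{
    int64_t ticks = t->offset;

    if (t->enabled) {
        ticks += host_ticks;
    }
    if (t->prev > ticks) {
        /* Host counter went backwards: absorb the jump into the offset. */
        t->offset += t->prev - ticks;
        ticks = t->prev;
    }
    assert(ticks >= t->prev);
    t->prev = ticks;
    return ticks;
}
```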
> -
> -/* return the time elapsed in VM between vm_start and vm_stop.  Unless
> - * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> - * counter.
> - */
> -int64_t cpu_get_ticks(void)
> -{
> -    int64_t ticks;
> -
> -    if (use_icount) {
> -        return cpu_get_icount();
> -    }
> -
> -    qemu_spin_lock(&timers_state.vm_clock_lock);
> -    ticks = cpu_get_ticks_locked();
> -    qemu_spin_unlock(&timers_state.vm_clock_lock);
> -    return ticks;
> -}
> -
> -static int64_t cpu_get_clock_locked(void)
> -{
> -    int64_t time;
> -
> -    time = timers_state.cpu_clock_offset;
> -    if (timers_state.cpu_ticks_enabled) {
> -        time += get_clock();
> -    }
> -
> -    return time;
> -}
> -
> -/* Return the monotonic time elapsed in VM, i.e.,
> - * the time between vm_start and vm_stop
> - */
> -int64_t cpu_get_clock(void)
> -{
> -    int64_t ti;
> -    unsigned start;
> -
> -    do {
> -        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> -        ti = cpu_get_clock_locked();
> -    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
> -
> -    return ti;
> -}
> -
> -/* enable cpu_get_ticks()
> - * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
> - */
> -void cpu_enable_ticks(void)
> -{
> -    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -    if (!timers_state.cpu_ticks_enabled) {
> -        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
> -        timers_state.cpu_clock_offset -= get_clock();
> -        timers_state.cpu_ticks_enabled = 1;
> -    }
> -    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -}
> -
> -/* disable cpu_get_ticks() : the clock is stopped. You must not call
> - * cpu_get_ticks() after that.
> - * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
> - */
> -void cpu_disable_ticks(void)
> -{
> -    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -    if (timers_state.cpu_ticks_enabled) {
> -        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
> -        timers_state.cpu_clock_offset = cpu_get_clock_locked();
> -        timers_state.cpu_ticks_enabled = 0;
> -    }
> -    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                         &timers_state.vm_clock_lock);
> -}
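cpu_enable_ticks() and cpu_disable_ticks() implement "VM time excludes stopped periods" purely through offset arithmetic. Enabling subtracts the current host time, and disabling freezes the current reading into the offset, as the cpu_clock_offset path does via cpu_get_clock_locked(). Below is a sketch of that bookkeeping with an explicit host time. The names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t offset;  /* cpu_clock_offset */
    bool enabled;    /* cpu_ticks_enabled */
} ToyVmClock;

static int64_t toy_clock_read(const ToyVmClock *c, int64_t host_now)
{
    return c->enabled ? c->offset + host_now : c->offset;
}

static void toy_clock_enable(ToyVmClock *c, int64_t host_now)
{
    if (!c->enabled) {
        c->offset -= host_now;   /* later reads add host_now back */
        c->enabled = true;
    }
}

static void toy_clock_disable(ToyVmClock *c, int64_t host_now)
{
    if (c->enabled) {
        c->offset = toy_clock_read(c, host_now);  /* freeze elapsed time */
        c->enabled = false;
    }
    assert(!c->enabled);
}
```

Running from 100 to 150, stopping, and then resuming at 300 gives 70 at host time 320. The 150 units spent stopped are not counted.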
> -
> -/* Correlation between real and virtual time is always going to be
> -   fairly approximate, so ignore small variation.
> -   When the guest is idle real and virtual time will be aligned in
> -   the IO wait loop.  */
> -#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
> -
> -static void icount_adjust(void)
> -{
> -    int64_t cur_time;
> -    int64_t cur_icount;
> -    int64_t delta;
> -
> -    /* Protected by TimersState mutex.  */
> -    static int64_t last_delta;
> -
> -    /* If the VM is not running, then do nothing.  */
> -    if (!runstate_is_running()) {
> -        return;
> -    }
> -
> -    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -    cur_time = cpu_get_clock_locked();
> -    cur_icount = cpu_get_icount_locked();
> -
> -    delta = cur_icount - cur_time;
> -    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
> -    if (delta > 0
> -        && last_delta + ICOUNT_WOBBLE < delta * 2
> -        && timers_state.icount_time_shift > 0) {
> -        /* The guest is getting too far ahead.  Slow time down.  */
> -        atomic_set(&timers_state.icount_time_shift,
> -                   timers_state.icount_time_shift - 1);
> -    }
> -    if (delta < 0
> -        && last_delta - ICOUNT_WOBBLE > delta * 2
> -        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
> -        /* The guest is getting too far behind.  Speed time up.  */
> -        atomic_set(&timers_state.icount_time_shift,
> -                   timers_state.icount_time_shift + 1);
> -    }
> -    last_delta = delta;
> -    atomic_set_i64(&timers_state.qemu_icount_bias,
> -                   cur_icount - (timers_state.qemu_icount
> -                                 << timers_state.icount_time_shift));
> -    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                         &timers_state.vm_clock_lock);
> -}
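To check that the moved icount_adjust() keeps its behaviour, it helps to see the feedback step in isolation. It compares virtual and real time, nudges the shift by one outside the ICOUNT_WOBBLE band, and then rebases qemu_icount_bias with the new shift so the virtual clock stays continuous. This is a sketch only. ToyIcount is hypothetical, and real_ns stands in for cpu_get_clock_locked().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NS_PER_SEC 1000000000LL
#define TOY_WOBBLE (TOY_NS_PER_SEC / 10)   /* ICOUNT_WOBBLE */
#define TOY_MAX_SHIFT 10                   /* MAX_ICOUNT_SHIFT */

typedef struct {
    int shift;          /* icount_time_shift */
    int64_t bias;       /* qemu_icount_bias, in ns */
    int64_t icount;     /* qemu_icount, in instructions */
    int64_t last_delta;
} ToyIcount;

static int64_t toy_virtual_ns(const ToyIcount *s)
{
    return s->bias + (s->icount << s->shift);
}

static void toy_icount_adjust(ToyIcount *s, int64_t real_ns)
{
    int64_t cur_icount = toy_virtual_ns(s);
    int64_t delta = cur_icount - real_ns;

    if (delta > 0 && s->last_delta + TOY_WOBBLE < delta * 2 && s->shift > 0) {
        s->shift--;                 /* guest ahead: slow virtual time down */
    }
    if (delta < 0 && s->last_delta - TOY_WOBBLE > delta * 2 &&
        s->shift < TOY_MAX_SHIFT) {
        s->shift++;                 /* guest behind: speed virtual time up */
    }
    s->last_delta = delta;
    /* Rebase so the shift change does not make the clock jump. */
    s->bias = cur_icount - (s->icount << s->shift);
    assert(toy_virtual_ns(s) == cur_icount);
}
```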
> -
> -static void icount_adjust_rt(void *opaque)
> -{
> -    timer_mod(timers_state.icount_rt_timer,
> -              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
> -    icount_adjust();
> -}
> -
> -static void icount_adjust_vm(void *opaque)
> -{
> -    timer_mod(timers_state.icount_vm_timer,
> -                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> -                   NANOSECONDS_PER_SECOND / 10);
> -    icount_adjust();
> -}
> -
> -static int64_t qemu_icount_round(int64_t count)
> -{
> -    int shift = atomic_read(&timers_state.icount_time_shift);
> -    return (count + (1 << shift) - 1) >> shift;
> -}
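qemu_icount_round() (renamed icount_round() further down) is a ceiling division by 2^shift. As I understand its use in tcg_get_icount_limit(), the instruction budget derived from a ns deadline is rounded up, so it reaches the deadline rather than stopping just short of it. A sketch with a toy name:

```c
#include <assert.h>
#include <stdint.h>

/* Instructions needed to cover 'ns' virtual nanoseconds, rounded up. */
static int64_t toy_icount_round(int64_t ns, int shift)
{
    assert(ns >= 0 && shift >= 0 && shift <= 10);
    return (ns + (1 << shift) - 1) >> shift;
}
```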
> -
> -static void icount_warp_rt(void)
> -{
> -    unsigned seq;
> -    int64_t warp_start;
> -
> -    /* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
> -     * changes from -1 to another value, so the race here is okay.
> -     */
> -    do {
> -        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
> -        warp_start = timers_state.vm_clock_warp_start;
> -    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
> -
> -    if (warp_start == -1) {
> -        return;
> -    }
> -
> -    seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -    if (runstate_is_running()) {
> -        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
> -                                            cpu_get_clock_locked());
> -        int64_t warp_delta;
> -
> -        warp_delta = clock - timers_state.vm_clock_warp_start;
> -        if (use_icount == 2) {
> -            /*
> -             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
> -             * far ahead of real time.
> -             */
> -            int64_t cur_icount = cpu_get_icount_locked();
> -            int64_t delta = clock - cur_icount;
> -            warp_delta = MIN(warp_delta, delta);
> -        }
> -        atomic_set_i64(&timers_state.qemu_icount_bias,
> -                       timers_state.qemu_icount_bias + warp_delta);
> -    }
> -    timers_state.vm_clock_warp_start = -1;
> -    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                       &timers_state.vm_clock_lock);
> -
> -    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
> -        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -    }
> -}
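The accounting half of icount_warp_rt() works as follows. The real time elapsed since vm_clock_warp_start is added to the bias. In adaptive mode (use_icount == 2), the addition is capped so QEMU_CLOCK_VIRTUAL cannot run ahead of real time. Afterwards, warp_start is reset to -1. The sketch below covers only that arithmetic. It omits the runstate, replay, and notify steps, and its names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t bias;        /* qemu_icount_bias */
    int64_t warp_start;  /* vm_clock_warp_start, -1 when idle */
} ToyWarp;

/* Returns false if no warp was pending. */
static bool toy_warp_rt(ToyWarp *w, int64_t real_now, int64_t virtual_now,
                        bool adaptive)
{
    if (w->warp_start == -1) {
        return false;
    }
    int64_t warp_delta = real_now - w->warp_start;
    if (adaptive) {
        int64_t delta = real_now - virtual_now;
        if (delta < warp_delta) {
            warp_delta = delta;    /* MIN(warp_delta, delta) */
        }
    }
    w->bias += warp_delta;
    w->warp_start = -1;
    return true;
}
```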
> -
> -static void icount_timer_cb(void *opaque)
> -{
> -    /* No need for a checkpoint because the timer already synchronizes
> -     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
> -     */
> -    icount_warp_rt();
> -}
> -
> -void qtest_clock_warp(int64_t dest)
> -{
> -    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> -    AioContext *aio_context;
> -    assert(qtest_enabled());
> -    aio_context = qemu_get_aio_context();
> -    while (clock < dest) {
> -        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
> -                                                      QEMU_TIMER_ATTR_ALL);
> -        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
> -
> -        seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                           &timers_state.vm_clock_lock);
> -        atomic_set_i64(&timers_state.qemu_icount_bias,
> -                       timers_state.qemu_icount_bias + warp);
> -        seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                             &timers_state.vm_clock_lock);
> -
> -        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
> -        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
> -        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> -    }
> -    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -}
> -
> -void qemu_start_warp_timer(void)
> -{
> -    int64_t clock;
> -    int64_t deadline;
> -
> -    if (!use_icount) {
> -        return;
> -    }
> -
> -    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
> -     * do not fire, so computing the deadline does not make sense.
> -     */
> -    if (!runstate_is_running()) {
> -        return;
> -    }
> -
> -    if (replay_mode != REPLAY_MODE_PLAY) {
> -        if (!all_cpu_threads_idle()) {
> -            return;
> -        }
> -
> -        if (qtest_enabled()) {
> -            /* When testing, qtest commands advance icount.  */
> -            return;
> -        }
> -
> -        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
> -    } else {
> -        /* warp clock deterministically in record/replay mode */
> -        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
> -            /* vCPU is sleeping and warp can't be started.
> -               It is probably a race condition: notification sent
> -               to vCPU was processed in advance and vCPU went to sleep.
> -               Therefore we have to wake it up for doing someting. */
> -            if (replay_has_checkpoint()) {
> -                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -            }
> -            return;
> -        }
> -    }
> -
> -    /* We want to use the earliest deadline from ALL vm_clocks */
> -    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
> -    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
> -                                          ~QEMU_TIMER_ATTR_EXTERNAL);
> -    if (deadline < 0) {
> -        static bool notified;
> -        if (!icount_sleep && !notified) {
> -            warn_report("icount sleep disabled and no active timers");
> -            notified = true;
> -        }
> -        return;
> -    }
> -
> -    if (deadline > 0) {
> -        /*
> -         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
> -         * sleep.  Otherwise, the CPU might be waiting for a future timer
> -         * interrupt to wake it up, but the interrupt never comes because
> -         * the vCPU isn't running any insns and thus doesn't advance the
> -         * QEMU_CLOCK_VIRTUAL.
> -         */
> -        if (!icount_sleep) {
> -            /*
> -             * We never let VCPUs sleep in no sleep icount mode.
> -             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
> -             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
> -             * It is useful when we want a deterministic execution time,
> -             * isolated from host latencies.
> -             */
> -            seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                               &timers_state.vm_clock_lock);
> -            atomic_set_i64(&timers_state.qemu_icount_bias,
> -                           timers_state.qemu_icount_bias + deadline);
> -            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                                 &timers_state.vm_clock_lock);
> -            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -        } else {
> -            /*
> -             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
> -             * "real" time, (related to the time left until the next event) has
> -             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
> -             * This avoids that the warps are visible externally; for example,
> -             * you will not be sending network packets continuously instead of
> -             * every 100ms.
> -             */
> -            seqlock_write_lock(&timers_state.vm_clock_seqlock,
> -                               &timers_state.vm_clock_lock);
> -            if (timers_state.vm_clock_warp_start == -1
> -                || timers_state.vm_clock_warp_start > clock) {
> -                timers_state.vm_clock_warp_start = clock;
> -            }
> -            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
> -                                 &timers_state.vm_clock_lock);
> -            timer_mod_anticipate(timers_state.icount_warp_timer,
> -                                 clock + deadline);
> -        }
> -    } else if (deadline == 0) {
> -        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -    }
> -}
> -
> -static void qemu_account_warp_timer(void)
> -{
> -    if (!use_icount || !icount_sleep) {
> -        return;
> -    }
> -
> -    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
> -     * do not fire, so computing the deadline does not make sense.
> -     */
> -    if (!runstate_is_running()) {
> -        return;
> -    }
> -
> -    /* warp clock deterministically in record/replay mode */
> -    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
> -        return;
> -    }
> -
> -    timer_del(timers_state.icount_warp_timer);
> -    icount_warp_rt();
> -}
> -
> -static bool icount_state_needed(void *opaque)
> -{
> -    return use_icount;
> -}
> -
> -static bool warp_timer_state_needed(void *opaque)
> -{
> -    TimersState *s = opaque;
> -    return s->icount_warp_timer != NULL;
> -}
> -
> -static bool adjust_timers_state_needed(void *opaque)
> -{
> -    TimersState *s = opaque;
> -    return s->icount_rt_timer != NULL;
> -}
> -
> -/*
> - * Subsection for warp timer migration is optional, because may not be created
> - */
> -static const VMStateDescription icount_vmstate_warp_timer = {
> -    .name = "timer/icount/warp_timer",
> -    .version_id = 1,
> -    .minimum_version_id = 1,
> -    .needed = warp_timer_state_needed,
> -    .fields = (VMStateField[]) {
> -        VMSTATE_INT64(vm_clock_warp_start, TimersState),
> -        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
> -        VMSTATE_END_OF_LIST()
> -    }
> -};
> -
> -static const VMStateDescription icount_vmstate_adjust_timers = {
> -    .name = "timer/icount/timers",
> -    .version_id = 1,
> -    .minimum_version_id = 1,
> -    .needed = adjust_timers_state_needed,
> -    .fields = (VMStateField[]) {
> -        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
> -        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
> -        VMSTATE_END_OF_LIST()
> -    }
> -};
> -
> -/*
> - * This is a subsection for icount migration.
> - */
> -static const VMStateDescription icount_vmstate_timers = {
> -    .name = "timer/icount",
> -    .version_id = 1,
> -    .minimum_version_id = 1,
> -    .needed = icount_state_needed,
> -    .fields = (VMStateField[]) {
> -        VMSTATE_INT64(qemu_icount_bias, TimersState),
> -        VMSTATE_INT64(qemu_icount, TimersState),
> -        VMSTATE_END_OF_LIST()
> -    },
> -    .subsections = (const VMStateDescription*[]) {
> -        &icount_vmstate_warp_timer,
> -        &icount_vmstate_adjust_timers,
> -        NULL
> -    }
> -};
> -
> -static const VMStateDescription vmstate_timers = {
> -    .name = "timer",
> -    .version_id = 2,
> -    .minimum_version_id = 1,
> -    .fields = (VMStateField[]) {
> -        VMSTATE_INT64(cpu_ticks_offset, TimersState),
> -        VMSTATE_UNUSED(8),
> -        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
> -        VMSTATE_END_OF_LIST()
> -    },
> -    .subsections = (const VMStateDescription*[]) {
> -        &icount_vmstate_timers,
> -        NULL
> -    }
> -};
> -
> -void cpu_ticks_init(void)
> -{
> -    seqlock_init(&timers_state.vm_clock_seqlock);
> -    qemu_spin_init(&timers_state.vm_clock_lock);
> -    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> -    cpu_throttle_init();
> -}
> -
> -void configure_icount(QemuOpts *opts, Error **errp)
> -{
> -    const char *option = qemu_opt_get(opts, "shift");
> -    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
> -    bool align = qemu_opt_get_bool(opts, "align", false);
> -    long time_shift = -1;
> -
> -    if (!option && qemu_opt_get(opts, "align")) {
> -        error_setg(errp, "Please specify shift option when using align");
> -        return;
> -    }
> -
> -    if (align && !sleep) {
> -        error_setg(errp, "align=on and sleep=off are incompatible");
> -        return;
> -    }
> -
> -    if (strcmp(option, "auto") != 0) {
> -        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
> -            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
> -            error_setg(errp, "icount: Invalid shift value");
> -            return;
> -        }
> -    } else if (icount_align_option) {
> -        error_setg(errp, "shift=auto and align=on are incompatible");
> -        return;
> -    } else if (!icount_sleep) {
> -        error_setg(errp, "shift=auto and sleep=off are incompatible");
> -        return;
> -    }
> -
> -    icount_sleep = sleep;
> -    if (icount_sleep) {
> -        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
> -                                         icount_timer_cb, NULL);
> -    }
> -
> -    icount_align_option = align;
> -
> -    if (time_shift >= 0) {
> -        timers_state.icount_time_shift = time_shift;
> -        use_icount = 1;
> -        return;
> -    }
> -
> -    use_icount = 2;
> -
> -    /* 125MIPS seems a reasonable initial guess at the guest speed.
> -       It will be corrected fairly quickly anyway.  */
> -    timers_state.icount_time_shift = 3;
> -
> -    /* Have both realtime and virtual time triggers for speed adjustment.
> -       The realtime trigger catches emulated time passing too slowly,
> -       the virtual time trigger catches emulated time passing too fast.
> -       Realtime triggers occur even when idle, so use them less frequently
> -       than VM triggers.  */
> -    timers_state.vm_clock_warp_start = -1;
> -    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> -                                   icount_adjust_rt, NULL);
> -    timer_mod(timers_state.icount_rt_timer,
> -                   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
> -    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> -                                        icount_adjust_vm, NULL);
> -    timer_mod(timers_state.icount_vm_timer,
> -                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> -                   NANOSECONDS_PER_SECOND / 10);
> -}
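configure_icount() moves as-is, so this is not about the RFC itself, but two things stood out while reading it. The shift=auto branch tests the globals icount_align_option and icount_sleep, which are only assigned from the local align/sleep further down. Also, unless the caller guarantees that shift is set, a -icount without shift (and without align) would reach strcmp() with a NULL option. Below is a sketch of the same validation done on the locally parsed values. The function, its return-a-message convention, and treating a missing shift as "nothing to configure" are my assumptions. The error strings are copied from the hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ICOUNT_SHIFT 10

/* Returns NULL if accepted, else an error message.
 * *time_shift is left at -1 for shift=auto (adaptive mode). */
static const char *toy_check_icount_opts(const char *shift, bool sleep,
                                         bool align, long *time_shift)
{
    assert(time_shift != NULL);
    *time_shift = -1;
    if (!shift) {
        /* Assumption: no shift and no align means nothing to configure. */
        return align ? "Please specify shift option when using align" : NULL;
    }
    if (align && !sleep) {
        return "align=on and sleep=off are incompatible";
    }
    if (strcmp(shift, "auto") == 0) {
        if (align) {
            return "shift=auto and align=on are incompatible";
        }
        if (!sleep) {
            return "shift=auto and sleep=off are incompatible";
        }
        return NULL;
    }
    char *end;
    errno = 0;
    long v = strtol(shift, &end, 0);
    if (errno || end == shift || *end != '\0' ||
        v < 0 || v > TOY_MAX_ICOUNT_SHIFT) {
        return "icount: Invalid shift value";
    }
    *time_shift = v;            /* precise mode */
    return NULL;
}
```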
> -
>  /***********************************************************/
>  /* TCG vCPU kick timer
>   *
> @@ -824,35 +160,6 @@ static void qemu_cpu_kick_rr_cpus(void)
>      };
>  }
>  
> -static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
> -{
> -}
> -
> -void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
> -{
> -    if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
> -        qemu_notify_event();
> -        return;
> -    }
> -
> -    if (qemu_in_vcpu_thread()) {
> -        /* A CPU is currently running; kick it back out to the
> -         * tcg_cpu_exec() loop so it will recalculate its
> -         * icount deadline immediately.
> -         */
> -        qemu_cpu_kick(current_cpu);
> -    } else if (first_cpu) {
> -        /* qemu_cpu_kick is not enough to kick a halted CPU out of
> -         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
> -         * causes cpu_thread_is_idle to return false.  This way,
> -         * handle_icount_deadline can run.
> -         * If we have no CPUs at all for some reason, we don't
> -         * need to do anything.
> -         */
> -        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
> -    }
> -}
> -
>  static void kick_tcg_thread(void *opaque)
>  {
>      timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
> @@ -1254,7 +561,7 @@ static int64_t tcg_get_icount_limit(void)
>              deadline = INT32_MAX;
>          }
>  
> -        return qemu_icount_round(deadline);
> +        return icount_round(deadline);
>      } else {
>          return replay_get_instructions();
>      }
> @@ -1263,7 +570,7 @@ static int64_t tcg_get_icount_limit(void)
>  static void handle_icount_deadline(void)
>  {
>      assert(qemu_in_vcpu_thread());
> -    if (use_icount) {
> +    if (icount_enabled()) {
>          int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
>                                                        QEMU_TIMER_ATTR_ALL);
>  
> @@ -1277,7 +584,7 @@ static void handle_icount_deadline(void)
>  
>  static void prepare_icount_for_run(CPUState *cpu)
>  {
> -    if (use_icount) {
> +    if (icount_enabled()) {
>          int insns_left;
>  
>          /* These should always be cleared by process_icount_data after
> @@ -1298,9 +605,9 @@ static void prepare_icount_for_run(CPUState *cpu)
>  
>  static void process_icount_data(CPUState *cpu)
>  {
> -    if (use_icount) {
> +    if (icount_enabled()) {
>          /* Account for executed instructions */
> -        cpu_update_icount(cpu);
> +        icount_update(cpu);
>  
>          /* Reset the counters */
>          cpu_neg(cpu)->icount_decr.u16.low = 0;
> @@ -1401,7 +708,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>          replay_mutex_lock();
>          qemu_mutex_lock_iothread();
>          /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
> -        qemu_account_warp_timer();
> +        icount_account_warp_timer();
>  
>          /* Run the timers here.  This is much more efficient than
>           * waking up the I/O thread and waiting for completion.
> @@ -1459,7 +766,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>              atomic_mb_set(&cpu->exit_request, 0);
>          }
>  
> -        if (use_icount && all_cpu_threads_idle()) {
> +        if (icount_enabled() && all_cpu_threads_idle()) {
>              /*
>               * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
>               * in the main_loop, wake it up in order to start the warp timer.
> @@ -1612,7 +919,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      CPUState *cpu = arg;
>  
>      assert(tcg_enabled());
> -    g_assert(!use_icount);
> +    g_assert(!icount_enabled());
>  
>      rcu_register_thread();
>      tcg_register_thread();
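Most of the cpus.c hunks simply swap the use_icount global for icount_enabled(). The new header is not quoted here, so this is only a guess at one plausible shape. It keeps the 0/1/2 encoding documented for use_icount in exec.c behind small accessors. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* 0 = off, 1 = precise, 2 = adaptive (the use_icount encoding). */
static int toy_use_icount;

static inline bool toy_icount_enabled(void)
{
    return toy_use_icount != 0;
}

static inline bool toy_icount_adaptive(void)
{
    assert(toy_use_icount >= 0 && toy_use_icount <= 2);
    return toy_use_icount == 2;
}
```

An accessor like this would also give the "use_icount == 2" test in the warp code a name, rather than a magic number.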
> @@ -2191,21 +1498,3 @@ void qmp_inject_nmi(Error **errp)
>      nmi_monitor_handle(monitor_get_cpu_index(), errp);
>  }
>  
> -void dump_drift_info(void)
> -{
> -    if (!use_icount) {
> -        return;
> -    }
> -
> -    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
> -                (cpu_get_clock() - cpu_get_icount())/SCALE_MS);
> -    if (icount_align_option) {
> -        qemu_printf("Max guest delay     %"PRIi64" ms\n",
> -                    -max_delay / SCALE_MS);
> -        qemu_printf("Max guest advance   %"PRIi64" ms\n",
> -                    max_advance / SCALE_MS);
> -    } else {
> -        qemu_printf("Max guest delay     NA\n");
> -        qemu_printf("Max guest advance   NA\n");
> -    }
> -}
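For anyone moving dump_drift_info() along with the timers: the drift line is host clock minus icount clock in ns, divided by SCALE_MS (10^6). Because C integer division truncates toward zero, drift under 1 ms prints as 0. A sketch of the first line only, with host_ns and guest_ns standing in for cpu_get_clock() and cpu_get_icount():

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_SCALE_MS 1000000   /* ns per ms, the value of SCALE_MS */

static int toy_format_drift(char *buf, size_t len, int64_t host_ns,
                            int64_t guest_ns)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Host - Guest clock  %" PRIi64 " ms\n",
                    (host_ns - guest_ns) / TOY_SCALE_MS);
}
```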
> diff --git a/docs/replay.txt b/docs/replay.txt
> index 70c27edb36..8952e6d852 100644
> --- a/docs/replay.txt
> +++ b/docs/replay.txt
> @@ -184,11 +184,11 @@ is then incremented (which is called "warping" the virtual clock) as
>  soon as the timer fires or the CPUs need to go out of the idle state.
>  Two functions are used for this purpose; because these actions change
>  virtual machine state and must be deterministic, each of them creates a
> -checkpoint.  qemu_start_warp_timer checks if the CPUs are idle and if so
> -starts accounting real time to virtual clock.  qemu_account_warp_timer
> +checkpoint.  icount_start_warp_timer checks if the CPUs are idle and if so
> +starts accounting real time to the virtual clock.  icount_account_warp_timer
>  is called when the CPUs get an interrupt or when the warp timer fires,
>  and it warps the virtual clock by the amount of real time that has passed
> -since qemu_start_warp_timer.
> +since icount_start_warp_timer.
>  
>  Bottom halves
>  -------------
> diff --git a/exec.c b/exec.c
> index 5162f0d12f..db9a90469b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -104,10 +104,6 @@ uintptr_t qemu_host_page_size;
>  intptr_t qemu_host_page_mask;
>  
>  #if !defined(CONFIG_USER_ONLY)
> -/* 0 = Do not count executed instructions.
> -   1 = Precise instruction counting.
> -   2 = Adaptive rate instruction counting.  */
> -int use_icount;
>  
>  typedef struct PhysPageEntry PhysPageEntry;
>  
> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
> index b5a54e2536..6c9f33208a 100644
> --- a/hw/core/ptimer.c
> +++ b/hw/core/ptimer.c
> @@ -7,11 +7,11 @@
>   */
>  
>  #include "qemu/osdep.h"
> -#include "qemu/timer.h"
>  #include "hw/ptimer.h"
>  #include "migration/vmstate.h"
>  #include "qemu/host-utils.h"
>  #include "sysemu/replay.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/qtest.h"
>  #include "block/aio.h"
>  #include "sysemu/cpus.h"
> @@ -134,7 +134,7 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
>       * on the current generation of host machines.
>       */
>  
> -    if (s->enabled == 1 && (delta * period < 10000) && !use_icount) {
> +    if (s->enabled == 1 && (delta * period < 10000) && !icount_enabled()) {
>          period = 10000 / delta;
>          period_frac = 0;
>      }
> @@ -217,7 +217,7 @@ uint64_t ptimer_get_count(ptimer_state *s)
>              uint32_t period_frac = s->period_frac;
>              uint64_t period = s->period;
>  
> -            if (!oneshot && (s->delta * period < 10000) && !use_icount) {
> +            if (!oneshot && (s->delta * period < 10000) && !icount_enabled()) {
>                  period = 10000 / s->delta;
>                  period_frac = 0;
>              }
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 7a3bc7ab66..002b3cabc2 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -34,6 +34,7 @@
>  #include "sysemu/numa.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/cpu-timers.h"
>  #include "trace.h"
>  
>  #include "hw/i386/x86.h"
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index d14374bdd4..49eedd714d 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -409,8 +409,12 @@ static inline bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
>      return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK);
>  }
>  
> +#ifdef CONFIG_TCG
> +void dump_drift_info(void);
>  void dump_exec_info(void);
>  void dump_opcount_info(void);
> +#endif /* CONFIG_TCG */
> +
>  #endif /* !CONFIG_USER_ONLY */
>  
>  int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 8792bea07a..c1f51e37af 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -25,7 +25,7 @@
>  #ifdef CONFIG_TCG
>  #include "exec/cpu_ldst.h"
>  #endif
> -#include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  
>  /* allow to see translation results - the slowdown should be negligible, so we leave it */
>  #define DEBUG_DISAS
> @@ -489,7 +489,7 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
>  static inline uint32_t curr_cflags(void)
>  {
>      return (parallel_cpus ? CF_PARALLEL : 0)
> -         | (use_icount ? CF_USE_ICOUNT : 0);
> +         | (icount_enabled() ? CF_USE_ICOUNT : 0);
>  }
>  
>  /* TranslationBlock invalidate API */
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 6a8b48b5a9..c54b1b2813 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -224,13 +224,6 @@ void qemu_clock_notify(QEMUClockType type);
>   */
>  void qemu_clock_enable(QEMUClockType type, bool enabled);
>  
> -/**
> - * qemu_start_warp_timer:
> - *
> - * Starts a timer for virtual clock update
> - */
> -void qemu_start_warp_timer(void);
> -
>  /**
>   * qemu_clock_run_timers:
>   * @type: clock on which to operate
> @@ -791,12 +784,6 @@ static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2)
>   */
>  void init_clocks(QEMUTimerListNotifyCB *notify_cb);
>  
> -int64_t cpu_get_ticks(void);
> -/* Caller must hold BQL */
> -void cpu_enable_ticks(void);
> -/* Caller must hold BQL */
> -void cpu_disable_ticks(void);
> -
>  static inline int64_t get_max_clock_jump(void)
>  {
>      /* This should be small enough to prevent excessive interrupts from being
> @@ -850,13 +837,6 @@ static inline int64_t get_clock(void)
>  }
>  #endif
>  
> -/* icount */
> -int64_t cpu_get_icount_raw(void);
> -int64_t cpu_get_icount(void);
> -int64_t cpu_get_clock(void);
> -int64_t cpu_icount_to_ns(int64_t icount);
> -void    cpu_update_icount(CPUState *cpu);
> -
>  /*******************************************/
>  /* host CPU ticks (if available) */
>  
> diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
> new file mode 100644
> index 0000000000..3db579fde7
> --- /dev/null
> +++ b/include/sysemu/cpu-timers.h
> @@ -0,0 +1,73 @@
> +#ifndef SYSEMU_CPU_TIMERS_H
> +#define SYSEMU_CPU_TIMERS_H
> +
> +#include "qemu/timer.h"
> +
> +/* init the whole cpu timers API, including icount, ticks, and cpu_throttle */
> +void cpu_timers_init(void);
> +
> +/* icount - Instruction Counter API */
> +
> +/*
> + * Return the icount enablement state:
> + *
> + * 0 = Disabled - Do not count executed instructions.
> + * 1 = Enabled - Fixed conversion of insn to ns via "shift" option
> + * 2 = Enabled - Runtime adaptive algorithm to compute shift
> + */
> +int icount_enabled(void);
> +/*
> + * Update the icount with the executed instructions. Called by
> + * cpus-tcg vCPU thread so the main-loop can see time has moved forward.
> + */
> +void icount_update(CPUState *cpu);
> +
> +/* get raw icount value */
> +int64_t icount_get_raw(void);
> +
> +/* return the virtual CPU time in ns, based on the instruction counter. */
> +int64_t icount_get(void);
> +/*
> + * convert an instruction counter value to ns, based on the icount shift.
> + * This shift is set as a fixed value with the icount "shift" option
> + * (precise mode), or it is constantly approximated and corrected at
> + * runtime in adaptive mode.
> + */
> +int64_t icount_to_ns(int64_t icount);
> +
> +/* configure the icount options, including "shift" */
> +void icount_configure(QemuOpts *opts, Error **errp);
> +
> +/* used by tcg vcpu thread to calc icount budget */
> +int64_t icount_round(int64_t count);
> +
> +/* if the CPUs are idle, start accounting real time to virtual clock. */
> +void icount_start_warp_timer(void);
> +void icount_account_warp_timer(void);
> +
> +/*
> + * CPU Ticks and Clock
> + */
> +
> +/* Caller must hold BQL */
> +void cpu_enable_ticks(void);
> +/* Caller must hold BQL */
> +void cpu_disable_ticks(void);
> +
> +/*
> + * return the time elapsed in VM between vm_start and vm_stop.  Unless
> + * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> + * counter.
> + */
> +int64_t cpu_get_ticks(void);
> +
> +/*
> + * Returns the monotonic time elapsed in VM, i.e.,
> + * the time between vm_start and vm_stop
> + */
> +int64_t cpu_get_clock(void);
> +
> +void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
> +void qtest_clock_warp(int64_t dest);
> +
> +#endif /* SYSEMU_CPU_TIMERS_H */
> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
> index 3c1da6a018..149de000a0 100644
> --- a/include/sysemu/cpus.h
> +++ b/include/sysemu/cpus.h
> @@ -4,33 +4,23 @@
>  #include "qemu/timer.h"
>  
>  /* cpus.c */
> +bool all_cpu_threads_idle(void);
>  bool qemu_in_vcpu_thread(void);
>  void qemu_init_cpu_loop(void);
>  void resume_all_vcpus(void);
>  void pause_all_vcpus(void);
>  void cpu_stop_current(void);
> -void cpu_ticks_init(void);
>  
> -void configure_icount(QemuOpts *opts, Error **errp);
> -extern int use_icount;
>  extern int icount_align_option;
>  
> -/* drift information for info jit command */
> -extern int64_t max_delay;
> -extern int64_t max_advance;
> -void dump_drift_info(void);
> -
>  /* Unblock cpu */
>  void qemu_cpu_kick_self(void);
> -void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
>  
>  void cpu_synchronize_all_states(void);
>  void cpu_synchronize_all_post_reset(void);
>  void cpu_synchronize_all_post_init(void);
>  void cpu_synchronize_all_pre_loadvm(void);
>  
> -void qtest_clock_warp(int64_t dest);
> -
>  #ifndef CONFIG_USER_ONLY
>  /* vl.c */
>  /* *-user doesn't have configurable SMP topology */
> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> index 5471bb514d..a140d69a73 100644
> --- a/include/sysemu/replay.h
> +++ b/include/sysemu/replay.h
> @@ -109,12 +109,12 @@ int64_t replay_read_clock(ReplayClockKind kind);
>  #define REPLAY_CLOCK(clock, value)                                      \
>      (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
>          : replay_mode == REPLAY_MODE_RECORD                             \
> -            ? replay_save_clock((clock), (value), cpu_get_icount_raw()) \
> +            ? replay_save_clock((clock), (value), icount_get_raw()) \
>          : (value))
>  #define REPLAY_CLOCK_LOCKED(clock, value)                               \
>      (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
>          : replay_mode == REPLAY_MODE_RECORD                             \
> -            ? replay_save_clock((clock), (value), cpu_get_icount_raw_locked()) \
> +            ? replay_save_clock((clock), (value), icount_get_raw_locked()) \
>          : (value))
>  
>  /* Processing data from random generators */
> diff --git a/qtest.c b/qtest.c
> index 5672b75c35..a1b92853c9 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -21,7 +21,7 @@
>  #include "exec/memory.h"
>  #include "hw/irq.h"
>  #include "sysemu/accel.h"
> -#include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "qemu/config-file.h"
>  #include "qemu/option.h"
>  #include "qemu/error-report.h"
> diff --git a/replay/replay.c b/replay/replay.c
> index 706c7b4f4b..9896a3b6f5 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -11,10 +11,10 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/runstate.h"
>  #include "replay-internal.h"
> -#include "qemu/timer.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/option.h"
>  #include "sysemu/cpus.h"
> @@ -64,7 +64,7 @@ bool replay_next_event_is(int event)
>  
>  uint64_t replay_get_current_icount(void)
>  {
> -    return cpu_get_icount_raw();
> +    return icount_get_raw();
>  }
>  
>  int replay_get_instructions(void)
> @@ -345,7 +345,7 @@ void replay_start(void)
>          error_reportf_err(replay_blockers->data, "Record/replay: ");
>          exit(1);
>      }
> -    if (!use_icount) {
> +    if (!icount_enabled()) {
>          error_report("Please enable icount to use record/replay");
>          exit(1);
>      }
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index ae5451bc23..ed53cd1b62 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -73,6 +73,7 @@
>  #include "hw/audio/soundhw.h"
>  #include "audio/audio.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "migration/colo.h"
>  #include "migration/postcopy-ram.h"
>  #include "sysemu/kvm.h"
> @@ -2675,7 +2676,7 @@ static void user_register_global_props(void)
>  
>  static int do_configure_icount(void *opaque, QemuOpts *opts, Error **errp)
>  {
> -    configure_icount(opts, errp);
> +    icount_configure(opts, errp);
>      return 0;
>  }
>  
> @@ -2785,7 +2786,7 @@ static void configure_accelerators(const char *progname)
>          error_report("falling back to %s", ac->name);
>      }
>  
> -    if (use_icount && !(tcg_enabled() || qtest_enabled())) {
> +    if (icount_enabled() && !(tcg_enabled() || qtest_enabled())) {
>          error_report("-icount is not allowed with hardware virtualization");
>          exit(1);
>      }
> @@ -4233,7 +4234,8 @@ void qemu_init(int argc, char **argv, char **envp)
>      /* spice needs the timers to be initialized by this point */
>      qemu_spice_init();
>  
> -    cpu_ticks_init();
> +    /* initialize cpu timers and VCPU throttle modules */
> +    cpu_timers_init();
>  
>      if (default_net) {
>          QemuOptsList *net = qemu_find_opts("net");
> diff --git a/stubs/clock-warp.c b/stubs/clock-warp.c
> index b53e5dd94c..304da5091c 100644
> --- a/stubs/clock-warp.c
> +++ b/stubs/clock-warp.c
> @@ -1,7 +1,7 @@
>  #include "qemu/osdep.h"
> -#include "qemu/timer.h"
> +#include "sysemu/cpu-timers.h"
>  
> -void qemu_start_warp_timer(void)
> +void icount_start_warp_timer(void)
>  {
>  }
>  
> diff --git a/stubs/cpu-get-clock.c b/stubs/cpu-get-clock.c
> index 5a92810e87..6102338743 100644
> --- a/stubs/cpu-get-clock.c
> +++ b/stubs/cpu-get-clock.c
> @@ -1,5 +1,5 @@
>  #include "qemu/osdep.h"
> -#include "qemu/timer.h"
> +#include "sysemu/cpu-timers.h"
>  
>  int64_t cpu_get_clock(void)
>  {
> diff --git a/stubs/cpu-get-icount.c b/stubs/cpu-get-icount.c
> index b35f844638..23f9154ef3 100644
> --- a/stubs/cpu-get-icount.c
> +++ b/stubs/cpu-get-icount.c
> @@ -1,20 +1,22 @@
>  #include "qemu/osdep.h"
> -#include "qemu/timer.h"
> -#include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "qemu/main-loop.h"
>  
> -int use_icount;
> -
> -int64_t cpu_get_icount(void)
> +int64_t icount_get(void)
>  {
>      abort();
>  }
>  
> -int64_t cpu_get_icount_raw(void)
> +int64_t icount_get_raw(void)
>  {
>      abort();
>  }
>  
> +int icount_enabled(void)
> +{
> +    return 0;
> +}
> +
>  void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
>  {
>      qemu_notify_event();
> diff --git a/target/alpha/translate.c b/target/alpha/translate.c
> index 8870284f57..36be602179 100644
> --- a/target/alpha/translate.c
> +++ b/target/alpha/translate.c
> @@ -20,6 +20,7 @@
>  #include "qemu/osdep.h"
>  #include "cpu.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "disas/disas.h"
>  #include "qemu/host-utils.h"
>  #include "exec/exec-all.h"
> @@ -1329,7 +1330,7 @@ static DisasJumpType gen_mfpr(DisasContext *ctx, TCGv va, int regno)
>      case 249: /* VMTIME */
>          helper = gen_helper_get_vmtime;
>      do_helper:
> -        if (use_icount) {
> +        if (icount_enabled()) {
>              gen_io_start();
>              helper(va);
>              return DISAS_PC_STALE;
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index a92ae55672..c9f99f7952 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -24,6 +24,7 @@
>  #include "hw/irq.h"
>  #include "hw/semihosting/semihost.h"
>  #include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/tcg.h"
>  #include "qemu/range.h"
> @@ -1205,17 +1206,17 @@ static int64_t cycles_ns_per(uint64_t cycles)
>  
>  static bool instructions_supported(CPUARMState *env)
>  {
> -    return use_icount == 1 /* Precise instruction counting */;
> +    return icount_enabled() == 1; /* Precise instruction counting */
>  }
>  
>  static uint64_t instructions_get_count(CPUARMState *env)
>  {
> -    return (uint64_t)cpu_get_icount_raw();
> +    return (uint64_t)icount_get_raw();
>  }
>  
>  static int64_t instructions_ns_per(uint64_t icount)
>  {
> -    return cpu_icount_to_ns((int64_t)icount);
> +    return icount_to_ns((int64_t)icount);
>  }
>  #endif
>  
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 11d184cd16..6093f73e3a 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -194,8 +194,8 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
>  static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>  {
>  #if !defined(CONFIG_USER_ONLY)
> -    if (use_icount) {
> -        *val = cpu_get_icount();
> +    if (icount_enabled()) {
> +        *val = icount_get();
>      } else {
>          *val = cpu_get_host_ticks();
>      }
> @@ -209,8 +209,8 @@ static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>  static int read_instreth(CPURISCVState *env, int csrno, target_ulong *val)
>  {
>  #if !defined(CONFIG_USER_ONLY)
> -    if (use_icount) {
> -        *val = cpu_get_icount() >> 32;
> +    if (icount_enabled()) {
> +        *val = icount_get() >> 32;
>      } else {
>          *val = cpu_get_host_ticks() >> 32;
>      }
> diff --git a/tests/ptimer-test-stubs.c b/tests/ptimer-test-stubs.c
> index ed393d9082..320dcf99b7 100644
> --- a/tests/ptimer-test-stubs.c
> +++ b/tests/ptimer-test-stubs.c
> @@ -12,6 +12,7 @@
>  #include "qemu/main-loop.h"
>  #include "sysemu/replay.h"
>  #include "migration/vmstate.h"
> +#include "sysemu/cpu-timers.h"
>  
>  #include "ptimer-test.h"
>  
> @@ -126,3 +127,8 @@ void replay_bh_schedule_event(QEMUBH *bh)
>  {
>      bh->cb(bh->opaque);
>  }
> +
> +int icount_enabled(void)
> +{
> +    return 0;
> +}
> diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c
> index e2bcf5fe13..82c92500df 100644
> --- a/tests/test-timed-average.c
> +++ b/tests/test-timed-average.c
> @@ -11,7 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> -
> +#include "sysemu/cpu-timers.h"
>  #include "qemu/timed-average.h"
>  
>  /* This is the clock for QEMU_CLOCK_VIRTUAL */
> diff --git a/util/main-loop.c b/util/main-loop.c
> index eda63fe4e0..f1af697572 100644
> --- a/util/main-loop.c
> +++ b/util/main-loop.c
> @@ -27,7 +27,7 @@
>  #include "qemu/cutils.h"
>  #include "qemu/timer.h"
>  #include "sysemu/qtest.h"
> -#include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/replay.h"
>  #include "qemu/main-loop.h"
>  #include "block/aio.h"
> @@ -521,7 +521,7 @@ void main_loop_wait(int nonblocking)
>  
>      /* CPU thread can infinitely wait for event after
>         missing the warp */
> -    qemu_start_warp_timer();
> +    icount_start_warp_timer();
>      qemu_clock_run_all_timers();
>  }
>  
> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
> index b6575a2cd5..da2883f914 100644
> --- a/util/qemu-timer.c
> +++ b/util/qemu-timer.c
> @@ -26,6 +26,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/timer.h"
>  #include "qemu/lockable.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/cpus.h"
>  
> @@ -134,7 +135,7 @@ static void qemu_clock_init(QEMUClockType type, QEMUTimerListNotifyCB *notify_cb
>  
>  bool qemu_clock_use_for_deadline(QEMUClockType type)
>  {
> -    return !(use_icount && (type == QEMU_CLOCK_VIRTUAL));
> +    return !(icount_enabled() && (type == QEMU_CLOCK_VIRTUAL));
>  }
>  
>  void qemu_clock_notify(QEMUClockType type)
> @@ -417,7 +418,7 @@ static void timerlist_rearm(QEMUTimerList *timer_list)
>  {
>      /* Interrupt execution to force deadline recalculation.  */
>      if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL) {
> -        qemu_start_warp_timer();
> +        icount_start_warp_timer();
>      }
>      timerlist_notify(timer_list);
>  }
> @@ -647,8 +648,8 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
>          return get_clock();
>      default:
>      case QEMU_CLOCK_VIRTUAL:
> -        if (use_icount) {
> -            return cpu_get_icount();
> +        if (icount_enabled()) {
> +            return icount_get();
>          } else {
>              return cpu_get_clock();
>          }
> 
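The patch replaces the global use_icount with an accessor, icount_enabled(), declared in the new sysemu/cpu-timers.h. Below is a minimal standalone model of that pattern. It is not QEMU code. It assumes the fixed-shift ("precise", mode 1) case, where each instruction accounts for 2^shift ns of virtual time, and models icount_to_ns() as a left shift. icount_configure_sketch() is a hypothetical stand-in for icount_configure(), which in the patch takes QemuOpts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone sketch, not QEMU code. Mode values follow the comment in
 * sysemu/cpu-timers.h: 0 = disabled, 1 = fixed "shift", 2 = adaptive
 * (the adaptive recalculation of the shift is not modeled here).
 */
static int icount_mode;   /* file-local: callers cannot read it directly */
static int icount_shift;  /* hypothetical stand-in for the "shift" option */

int icount_enabled(void)
{
    return icount_mode;
}

/* Hypothetical helper standing in for icount_configure(); no QemuOpts. */
static void icount_configure_sketch(int mode, int shift)
{
    assert(mode >= 0 && mode <= 2);
    assert(shift >= 0 && shift < 63);
    icount_mode = mode;
    icount_shift = shift;
}

/* One instruction == 2^shift ns of virtual time; icount assumed >= 0. */
int64_t icount_to_ns(int64_t icount)
{
    return icount << icount_shift;
}
```

Callers such as curr_cflags() and qemu_clock_get_ns() then test icount_enabled() instead of reading a variable, which is what lets stubs/cpu-get-icount.c and tests/ptimer-test-stubs.c satisfy the dependency with a function that simply returns 0.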

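The docs/replay.txt hunk describes warping in two steps: icount_start_warp_timer() begins accounting real time when the CPUs are idle, and icount_account_warp_timer() advances the virtual clock by the real time elapsed since then. The simplified model below assumes that the caller passes in real time explicitly and that a warp is stored as a bias added to the instruction-derived time. The struct and function names are hypothetical. The locking, adaptive-mode adjustments and record/replay checkpoints of the real implementation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state, not QEMU's timers state. */
struct warp_model {
    int64_t bias_ns;        /* real time folded into the virtual clock */
    int64_t warp_start_ns;  /* real time when warping began, -1 if none */
};

static void warp_init(struct warp_model *m)
{
    m->bias_ns = 0;
    m->warp_start_ns = -1;
}

/* Start accounting real time, but only if every vCPU is idle. */
static void warp_start(struct warp_model *m, bool cpus_idle,
                       int64_t real_now_ns)
{
    if (cpus_idle && m->warp_start_ns == -1) {
        m->warp_start_ns = real_now_ns;
    }
}

/* On an interrupt or warp-timer expiry: warp by the elapsed real time. */
static void warp_account(struct warp_model *m, int64_t real_now_ns)
{
    if (m->warp_start_ns == -1) {
        return;                          /* no warp in progress */
    }
    assert(real_now_ns >= m->warp_start_ns);
    m->bias_ns += real_now_ns - m->warp_start_ns;
    m->warp_start_ns = -1;
}

/* Virtual time = instruction-derived time plus accumulated warps. */
static int64_t warp_virtual_ns(const struct warp_model *m, int64_t icount_ns)
{
    return icount_ns + m->bias_ns;
}
```

The -1 sentinel marks "no warp in progress", so a second warp_account() call without a new warp_start() leaves the virtual clock unchanged. This is also why the idle check matters: while any vCPU is running, virtual time advances only through executed instructions.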


Thread overview: 11+ messages
2020-05-21 18:54 [RFC 0/3] QEMU cpus.c refactoring Claudio Fontana
2020-05-21 18:54 ` [RFC 1/3] cpu-throttle: new module, extracted from cpus.c Claudio Fontana
2020-05-22  6:07   ` Thomas Huth
2020-05-22  8:15     ` Claudio Fontana
2020-05-22 10:26       ` Alex Bennée
2020-05-22 10:54         ` Claudio Fontana
2020-05-22 11:18           ` Alex Bennée
2020-05-22 11:23             ` Claudio Fontana
2020-05-21 18:54 ` [RFC 2/3] cpu-timers: new module " Claudio Fontana
2020-05-22 13:49   ` Claudio Fontana
2020-05-21 18:54 ` [RFC 3/3] cpus: implement cpus interfaces for per-accel threads Claudio Fontana
