* [RFC v3 0/8] QEMU cpus.c refactoring part2
@ 2020-08-03  9:05 Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 1/8] cpu-timers, icount: new modules Claudio Fontana
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

Motivation and higher level steps:

https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html

The biggest open item for me is, does it make sense to:


1) make icount TCG-only (building the icount module only under
CONFIG_TCG), as this series suggests, and provide a separate virtual
counter for qtest,


or


2) continue to keep icount functions and fields, including vmstate,
in all softmmu builds because of qtest current use of field
qemu_icount_bias to implement its virtual counter for qtest_clock_warp?


If I understand correctly, Paolo might be for 2) (?).
I would also welcome additional input from the community in any direction
(Alex, Peter, Philippe?)

----

RFC v2 -> v3:

* provided defaults for all methods.
  Only create_vcpu_thread is now a mandatory field. (Paolo)

* separated new CpusAccel patch from its first user, new patch nr. 2:
  "cpus: prepare new CpusAccel cpu accelerator interface"

* new CpusAccel methods: get_virtual_clock and get_elapsed_ticks.
  (Paolo)

  In this series, get_virtual_clock has a separate implementation
  between TCG/icount and qtest,
  while get_elapsed_ticks only returns a virtual counter for icount.

  Looking for more comments in this area.
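
To make the "defaults for all methods" point concrete, here is a minimal,
self-contained sketch of the idea, not the code from the series. The
CPUState stand-in, the sketch_* function names, the placeholder clock value
and the demo accelerators are all assumptions made for illustration; only
the CpusAccel name, the cpus_accel pointer and the create_vcpu_thread,
synchronize_state and get_virtual_clock fields come from the description
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's CPUState; only what the sketch needs. */
typedef struct CPUState {
    int thread_created;
    int synchronized;
} CPUState;

/* Accelerator ops table: only create_vcpu_thread must be provided. */
typedef struct CpusAccel {
    void (*create_vcpu_thread)(CPUState *cpu);  /* mandatory */
    void (*synchronize_state)(CPUState *cpu);   /* optional */
    int64_t (*get_virtual_clock)(void);         /* optional */
} CpusAccel;

/* A pointer to the registered table is kept, not a copy of it. */
static const CpusAccel *cpus_accel;

/* Placeholder value standing in for a generic fallback clock (assumption). */
static const int64_t sketch_fallback_clock_ns = 1000;

/* Hypothetical registration helper: rejects tables without the mandatory op. */
void sketch_register_accel(const CpusAccel *ca)
{
    assert(ca != NULL);
    assert(ca->create_vcpu_thread != NULL);
    cpus_accel = ca;
}

void sketch_create_vcpu(CPUState *cpu)
{
    assert(cpus_accel != NULL);
    cpus_accel->create_vcpu_thread(cpu);
}

/* Optional op: the default is to do nothing. */
void sketch_synchronize_state(CPUState *cpu)
{
    if (cpus_accel && cpus_accel->synchronize_state) {
        cpus_accel->synchronize_state(cpu);
    }
}

/* Optional op: the default is the generic fallback clock. */
int64_t sketch_get_virtual_clock(void)
{
    if (cpus_accel && cpus_accel->get_virtual_clock) {
        return cpus_accel->get_virtual_clock();
    }
    return sketch_fallback_clock_ns;
}

/* Two demo accelerators: one minimal, one overriding the virtual clock. */
static void demo_create_vcpu_thread(CPUState *cpu)
{
    cpu->thread_created = 1;
}

static int64_t demo_get_virtual_clock(void)
{
    return 42;
}

const CpusAccel demo_minimal = {
    .create_vcpu_thread = demo_create_vcpu_thread,
};

const CpusAccel demo_with_clock = {
    .create_vcpu_thread = demo_create_vcpu_thread,
    .get_virtual_clock = demo_get_virtual_clock,
};
```

With demo_minimal registered, sketch_get_virtual_clock() falls back to the
placeholder clock; with demo_with_clock registered, the accelerator's own
implementation is used instead.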

----

RFC v1 -> v2:

* split the cpus.c accelerator refactoring into 6 patches.

* other minor changes to be able to proceed step by step.

----

* Rebased on commit 255ae6e2158c743717bed76c9a2365ee4bcd326e,
"replay: notify the main loop when there are no instructions"

[SPLIT into part1 and part2]

----

v6 -> v7:

* rebased changes on top of Pavel Dovgalyuk changes to dma-helpers.c
  "icount: make dma reads deterministic"

----

v5 -> v6:

* rebased changes on top of Emilio G. Cota changes to cpus.c
  "cpu: convert queued work to a QSIMPLEQ"

* keep a pointer in cpus.c instead of a copy of CpusAccel
  (Alex)

----


v4 -> v5: rebase on latest master

* rebased changes on top of Roman's series to remove one of the extra states for hvf.
  (Is the result now functional for HVF?)

* rebased changes on top of icount changes and fixes to icount_configure and
  the new shift vmstate. (Markus)

v3 -> v4:

* overall: added copyright headers to all files that were missing them
  (used copyright and license of the module the stuff was extracted from).
  For the new interface files, added SUSE LLC.

* 1/4 (move softmmu only files from root):

  MAINTAINERS: moved softmmu/cpus.c to its final location (from patch 2)

* 2/4 (cpu-throttle):

  MAINTAINERS (to patch 1),
  copyright Fabrice Bellard and license from cpus.c

* 3/4 (cpu-timers, icount):

  - MAINTAINERS: add cpu-timers.c and icount.c to Paolo

  - break very long lines (patchew)

  - add copyright SUSE LLC, GPLv2 to cpu-timers.h

  - add copyright Fabrice Bellard and license from cpus.c to timers-state.h
    as it is lifted from cpus.c

  - vl.c: in configure_accelerators bail out if icount_enabled()
    and !tcg_enabled() as qtest does not enable icount anymore.

* 4/4 (accel stuff to accel):

  - add copyright SUSE LLC to files that mostly only consist of the
    new interface. Add whatever copyright was in the accelerator code
    if instead they mostly consist of accelerator code.

  - change a comment to mention the result of the AccelClass experiment

  - moved qtest accelerator into accel/qtest/ , make it like the others.

  - rename xxx-cpus-interface to xxx-cpus (remove "interface" from names)

  - rename accel_int to cpus_accel

  - rename CpusAccel functions from cpu_synchronize_* to synchronize_*


--------

v2 -> v3:

* turned into a 4 patch series, adding a first patch moving
  softmmu code currently in top_srcdir to softmmu/

* cpu-throttle: moved to softmmu/

* cpu-timers, icount:

  - moved to softmmu/

  - fixed assumption of qtest_enabled() => icount_enabled()
  causing the failure of check-qtest-arm goal, in test-arm-mptimer.c

  Fix is in hw/core/ptimer.c,

  where the artificial timeout rate limit should not be applied
  under qtest_enabled(), in a similar way to how it is not applied
  for icount_enabled().

* CpuAccelInterface: no change.
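
The ptimer fix described in the cpu-timers item above can be sketched as a
small standalone function. This is an illustration only: the function name
and the boolean parameters are hypothetical stand-ins for the
icount_enabled() and qtest_enabled() checks, and the zero-delta guard is an
addition for the sketch, not something taken from hw/core/ptimer.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the period to use for a timer reload. Very short timeouts
 * (delta * period below 10000) are stretched to protect the host, but
 * only when neither icount nor qtest is driving the virtual clock.
 */
uint64_t sketch_limit_period(uint64_t delta, uint64_t period,
                             bool icount_on, bool qtest_on)
{
    if (delta != 0 && delta * period < 10000 &&
        !icount_on && !qtest_on) {
        period = 10000 / delta;
    }
    return period;
}
```

Under qtest the period is left untouched, so a test that programs a short
timeout sees exactly the timeout it asked for.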


--------


v1 -> v2:

* 1/3 (cpu-throttle): provide a description in the commit message

* 2/3 (cpu-timers, icount): in this v2 separate icount from cpu-timers,
  as icount is actually TCG-specific. Only build it under CONFIG_TCG.

  To do this, qtest had to be detached from icount. To this end, a
  trivial global counter for qtest has been introduced.

* 3/3 (CpuAccelInterface): provided a description.

This is point 8) in that plan. The idea is to extract the unrelated parts
in cpus, and register interfaces from each single accelerator to the main
cpus module (cpus.c).

While doing this RFC, I noticed some assumptions about Windows being
either TCG or HAX (not considering WHPX) that might need to be revisited.
I added a comment there.

The thing builds successfully based on Linux cross-compilations for
windows/hax, windows/whpx, and I got a good build on Darwin/hvf.

Tests run successfully for tcg and kvm configurations, but I did not test on
windows or darwin.

Welcome your feedback and help on this,

Claudio

Claudio Fontana (8):
  cpu-timers, icount: new modules
  cpus: prepare new CpusAccel cpu accelerator interface
  cpus: extract out TCG-specific code to accel/tcg
  cpus: extract out qtest-specific code to accel/qtest
  cpus: extract out kvm-specific code to accel/kvm
  cpus: extract out hax-specific code to target/i386/
  cpus: extract out whpx-specific code to target/i386/
  cpus: extract out hvf-specific code to target/i386/hvf/

 MAINTAINERS                    |    5 +-
 accel/Makefile.objs            |    2 +-
 accel/kvm/Makefile.objs        |    2 +
 accel/kvm/kvm-all.c            |   14 +-
 accel/kvm/kvm-cpus.c           |   88 +++
 accel/kvm/kvm-cpus.h           |   17 +
 accel/qtest/Makefile.objs      |    2 +
 accel/qtest/qtest-cpus.c       |   91 +++
 accel/qtest/qtest-cpus.h       |   17 +
 accel/{ => qtest}/qtest.c      |   13 +-
 accel/stubs/kvm-stub.c         |    3 +-
 accel/tcg/Makefile.objs        |    1 +
 accel/tcg/cpu-exec.c           |   43 +-
 accel/tcg/tcg-all.c            |   19 +-
 accel/tcg/tcg-cpus.c           |  541 +++++++++++++
 accel/tcg/tcg-cpus.h           |   17 +
 accel/tcg/translate-all.c      |    3 +-
 dma-helpers.c                  |    4 +-
 docs/replay.txt                |    6 +-
 exec.c                         |    4 -
 hw/core/cpu.c                  |    1 +
 hw/core/ptimer.c               |    8 +-
 hw/i386/x86.c                  |    3 +-
 include/exec/cpu-all.h         |    4 +
 include/exec/exec-all.h        |    4 +-
 include/qemu/timer.h           |   24 +-
 include/sysemu/cpu-timers.h    |   84 ++
 include/sysemu/cpus.h          |   48 +-
 include/sysemu/hw_accel.h      |   69 +-
 include/sysemu/kvm.h           |    2 +-
 include/sysemu/qtest.h         |    2 +
 include/sysemu/replay.h        |    4 +-
 replay/replay.c                |    6 +-
 softmmu/Makefile.objs          |    2 +
 softmmu/cpu-timers.c           |  279 +++++++
 softmmu/cpus.c                 | 1661 +++-------------------------------------
 softmmu/icount.c               |  497 ++++++++++++
 softmmu/qtest.c                |   34 +-
 softmmu/timers-state.h         |   69 ++
 softmmu/vl.c                   |   11 +-
 stubs/Makefile.objs            |    6 +-
 stubs/clock-warp.c             |    7 -
 stubs/cpu-get-clock.c          |    3 +-
 stubs/cpu-get-icount.c         |   21 -
 stubs/cpu-synchronize-state.c  |   15 +
 stubs/cpus-get-virtual-clock.c |    8 +
 stubs/icount.c                 |   52 ++
 stubs/qemu-timer-notify-cb.c   |    8 +
 stubs/qtest.c                  |    5 +
 target/alpha/translate.c       |    3 +-
 target/arm/helper.c            |    7 +-
 target/i386/Makefile.objs      |    7 +-
 target/i386/hax-all.c          |    6 +-
 target/i386/hax-cpus.c         |   85 ++
 target/i386/hax-cpus.h         |   17 +
 target/i386/hax-i386.h         |    2 +
 target/i386/hax-posix.c        |   12 +
 target/i386/hax-windows.c      |   20 +
 target/i386/hvf/Makefile.objs  |    2 +-
 target/i386/hvf/hvf-cpus.c     |  131 ++++
 target/i386/hvf/hvf-cpus.h     |   17 +
 target/i386/hvf/hvf.c          |    3 +
 target/i386/whpx-all.c         |    3 +
 target/i386/whpx-cpus.c        |   96 +++
 target/i386/whpx-cpus.h        |   17 +
 target/riscv/csr.c             |    8 +-
 tests/ptimer-test-stubs.c      |    7 +-
 tests/test-timed-average.c     |    2 +-
 util/main-loop.c               |   12 +-
 util/qemu-timer.c              |   14 +-
 70 files changed, 2528 insertions(+), 1772 deletions(-)
 create mode 100644 accel/kvm/kvm-cpus.c
 create mode 100644 accel/kvm/kvm-cpus.h
 create mode 100644 accel/qtest/Makefile.objs
 create mode 100644 accel/qtest/qtest-cpus.c
 create mode 100644 accel/qtest/qtest-cpus.h
 rename accel/{ => qtest}/qtest.c (81%)
 create mode 100644 accel/tcg/tcg-cpus.c
 create mode 100644 accel/tcg/tcg-cpus.h
 create mode 100644 include/sysemu/cpu-timers.h
 create mode 100644 softmmu/cpu-timers.c
 create mode 100644 softmmu/icount.c
 create mode 100644 softmmu/timers-state.h
 delete mode 100644 stubs/clock-warp.c
 delete mode 100644 stubs/cpu-get-icount.c
 create mode 100644 stubs/cpu-synchronize-state.c
 create mode 100644 stubs/cpus-get-virtual-clock.c
 create mode 100644 stubs/icount.c
 create mode 100644 stubs/qemu-timer-notify-cb.c
 create mode 100644 target/i386/hax-cpus.c
 create mode 100644 target/i386/hax-cpus.h
 create mode 100644 target/i386/hvf/hvf-cpus.c
 create mode 100644 target/i386/hvf/hvf-cpus.h
 create mode 100644 target/i386/whpx-cpus.c
 create mode 100644 target/i386/whpx-cpus.h

-- 
2.16.4




* [RFC v3 1/8] cpu-timers, icount: new modules
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-04  8:13   ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

Refactoring of cpus.c continues with cpu timer state extraction.

cpu-timers: responsible for the softmmu cpu timers state,
            including cpu clocks and ticks.

icount: counts the TCG instructions executed. As such it is specific to
the TCG accelerator. Therefore, it is built only under CONFIG_TCG.

One complication is due to qtest, which uses an icount field to warp time
as part of qtest (qtest_clock_warp).

In order to solve this problem, provide a separate counter for qtest.

This requires fixing assumptions scattered in the code that
qtest_enabled() implies icount_enabled(), checking each specific case.
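
As an illustration of the separate qtest counter, a minimal sketch could
look like the following. It is a simplification under stated assumptions:
the sketch_* names are hypothetical, the counter is a plain variable
without the atomic accesses real code would need, and the warp only moves
the counter instead of also running the timers that expire on the way.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual time in ns, owned by qtest and independent of any icount state. */
static int64_t sketch_qtest_clock;

/* Read the qtest virtual clock. */
int64_t sketch_qtest_get_virtual_clock(void)
{
    return sketch_qtest_clock;
}

/* Warp the virtual clock forward to dest; it never moves backward. */
void sketch_qtest_clock_warp(int64_t dest)
{
    if (dest > sketch_qtest_clock) {
        sketch_qtest_clock = dest;
    }
}
```

Because the counter no longer lives in an icount field, code that needs to
know whether virtual time is being driven artificially has to check for
qtest and icount separately.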

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
 MAINTAINERS                  |   2 +
 accel/qtest.c                |   6 +-
 accel/tcg/cpu-exec.c         |  43 ++-
 accel/tcg/tcg-all.c          |   7 +-
 accel/tcg/translate-all.c    |   3 +-
 dma-helpers.c                |   4 +-
 docs/replay.txt              |   6 +-
 exec.c                       |   4 -
 hw/core/ptimer.c             |   8 +-
 hw/i386/x86.c                |   1 +
 include/exec/cpu-all.h       |   4 +
 include/exec/exec-all.h      |   4 +-
 include/qemu/timer.h         |  24 +-
 include/sysemu/cpu-timers.h  |  81 +++++
 include/sysemu/cpus.h        |  12 +-
 include/sysemu/qtest.h       |   2 +
 include/sysemu/replay.h      |   4 +-
 replay/replay.c              |   6 +-
 softmmu/Makefile.objs        |   2 +
 softmmu/cpu-timers.c         | 284 ++++++++++++++++
 softmmu/cpus.c               | 750 +------------------------------------------
 softmmu/icount.c             | 497 ++++++++++++++++++++++++++++
 softmmu/qtest.c              |  34 +-
 softmmu/timers-state.h       |  69 ++++
 softmmu/vl.c                 |  11 +-
 stubs/Makefile.objs          |   4 +-
 stubs/clock-warp.c           |   7 -
 stubs/cpu-get-clock.c        |   3 +-
 stubs/cpu-get-icount.c       |  21 --
 stubs/icount.c               |  52 +++
 stubs/qemu-timer-notify-cb.c |   8 +
 stubs/qtest.c                |   5 +
 target/alpha/translate.c     |   3 +-
 target/arm/helper.c          |   7 +-
 target/riscv/csr.c           |   8 +-
 tests/ptimer-test-stubs.c    |   7 +-
 tests/test-timed-average.c   |   2 +-
 util/main-loop.c             |  12 +-
 util/qemu-timer.c            |  14 +-
 39 files changed, 1158 insertions(+), 863 deletions(-)
 create mode 100644 include/sysemu/cpu-timers.h
 create mode 100644 softmmu/cpu-timers.c
 create mode 100644 softmmu/icount.c
 create mode 100644 softmmu/timers-state.h
 delete mode 100644 stubs/clock-warp.c
 delete mode 100644 stubs/cpu-get-icount.c
 create mode 100644 stubs/icount.c
 create mode 100644 stubs/qemu-timer-notify-cb.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0886eb3d2b..7dcc3ef4c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2285,6 +2285,8 @@ F: softmmu/vl.c
 F: softmmu/main.c
 F: softmmu/cpus.c
 F: softmmu/cpu-throttle.c
+F: softmmu/cpu-timers.c
+F: softmmu/icount.c
 F: qapi/run-state.json
 
 Human Monitor (HMP)
diff --git a/accel/qtest.c b/accel/qtest.c
index 5b88f55921..119d0f16a4 100644
--- a/accel/qtest.c
+++ b/accel/qtest.c
@@ -19,14 +19,10 @@
 #include "sysemu/accel.h"
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 
 static int qtest_init_accel(MachineState *ms)
 {
-    QemuOpts *opts = qemu_opts_create(qemu_find_opts("icount"), NULL, 0,
-                                      &error_abort);
-    qemu_opt_set(opts, "shift", "0", &error_abort);
-    configure_icount(opts, &error_abort);
-    qemu_opts_del(opts);
     return 0;
 }
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 66d38f9d85..b44e92b753 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "trace.h"
 #include "disas/disas.h"
@@ -36,6 +37,8 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/cpus.h"
+#include "exec/cpu-all.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 
 /* -icount align implementation. */
@@ -56,6 +59,9 @@ typedef struct SyncClocks {
 #define MAX_DELAY_PRINT_RATE 2000000000LL
 #define MAX_NB_PRINTS 100
 
+static int64_t max_delay;
+static int64_t max_advance;
+
 static void align_clocks(SyncClocks *sc, CPUState *cpu)
 {
     int64_t cpu_icount;
@@ -65,7 +71,7 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
     }
 
     cpu_icount = cpu->icount_extra + cpu_neg(cpu)->icount_decr.u16.low;
-    sc->diff_clk += cpu_icount_to_ns(sc->last_cpu_icount - cpu_icount);
+    sc->diff_clk += icount_to_ns(sc->last_cpu_icount - cpu_icount);
     sc->last_cpu_icount = cpu_icount;
 
     if (sc->diff_clk > VM_CLOCK_ADVANCE) {
@@ -98,9 +104,9 @@ static void print_delay(const SyncClocks *sc)
             (-sc->diff_clk / (float)1000000000LL <
              (threshold_delay - THRESHOLD_REDUCE))) {
             threshold_delay = (-sc->diff_clk / 1000000000LL) + 1;
-            printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
-                   threshold_delay - 1,
-                   threshold_delay);
+            qemu_printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
+                        threshold_delay - 1,
+                        threshold_delay);
             nb_prints++;
             last_realtime_clock = sc->realtime_clock;
         }
@@ -614,7 +620,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 
     /* Finally, check if we need to exit to the main loop.  */
     if (unlikely(atomic_read(&cpu->exit_request))
-        || (use_icount
+        || (icount_enabled()
             && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
         atomic_set(&cpu->exit_request, 0);
         if (cpu->exception_index == -1) {
@@ -655,10 +661,10 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
     }
 
     /* Instruction counter expired.  */
-    assert(use_icount);
+    assert(icount_enabled());
 #ifndef CONFIG_USER_ONLY
     /* Ensure global icount has gone forward */
-    cpu_update_icount(cpu);
+    icount_update(cpu);
     /* Refill decrementer and continue execution.  */
     insns_left = MIN(0xffff, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
@@ -758,3 +764,26 @@ int cpu_exec(CPUState *cpu)
 
     return ret;
 }
+
+#ifndef CONFIG_USER_ONLY
+
+void dump_drift_info(void)
+{
+    if (!icount_enabled()) {
+        return;
+    }
+
+    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
+                (cpu_get_clock() - icount_get()) / SCALE_MS);
+    if (icount_align_option) {
+        qemu_printf("Max guest delay     %"PRIi64" ms\n",
+                    -max_delay / SCALE_MS);
+        qemu_printf("Max guest advance   %"PRIi64" ms\n",
+                    max_advance / SCALE_MS);
+    } else {
+        qemu_printf("Max guest delay     NA\n");
+        qemu_printf("Max guest advance   NA\n");
+    }
+}
+
+#endif /* !CONFIG_USER_ONLY */
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index eace2c113b..f1feea20c8 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -29,6 +29,7 @@
 #include "qom/object.h"
 #include "cpu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/main-loop.h"
 #include "tcg/tcg.h"
 #include "qapi/error.h"
@@ -65,7 +66,7 @@ static void tcg_handle_interrupt(CPUState *cpu, int mask)
         qemu_cpu_kick(cpu);
     } else {
         atomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
-        if (use_icount &&
+        if (icount_enabled() &&
             !cpu->can_do_io
             && (mask & ~old_mask) != 0) {
             cpu_abort(cpu, "Raised interrupt while not in I/O function");
@@ -104,7 +105,7 @@ static bool check_tcg_memory_orders_compatible(void)
 
 static bool default_mttcg_enabled(void)
 {
-    if (use_icount || TCG_OVERSIZED_GUEST) {
+    if (icount_enabled() || TCG_OVERSIZED_GUEST) {
         return false;
     } else {
 #ifdef TARGET_SUPPORTS_MTTCG
@@ -146,7 +147,7 @@ static void tcg_set_thread(Object *obj, const char *value, Error **errp)
     if (strcmp(value, "multi") == 0) {
         if (TCG_OVERSIZED_GUEST) {
             error_setg(errp, "No MTTCG when guest word size > hosts");
-        } else if (use_icount) {
+        } else if (icount_enabled()) {
             error_setg(errp, "No MTTCG when icount is enabled");
         } else {
 #ifndef TARGET_SUPPORTS_MTTCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 2d83013633..c39ff7b047 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -57,6 +57,7 @@
 #include "qemu/main-loop.h"
 #include "exec/log.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/tcg.h"
 
 /* #define DEBUG_TB_INVALIDATE */
@@ -369,7 +370,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 
  found:
     if (reset_icount && (tb_cflags(tb) & CF_USE_ICOUNT)) {
-        assert(use_icount);
+        assert(icount_enabled());
         /* Reset the cycle counter to the start of the block
            and shift if to the number of actually executed instructions */
         cpu_neg(cpu)->icount_decr.u16.low += num_insns - i;
diff --git a/dma-helpers.c b/dma-helpers.c
index 2a77b5a9cb..240ef4d5b8 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -13,7 +13,7 @@
 #include "trace-root.h"
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/range.h"
 
 /* #define DEBUG_IOMMU */
@@ -151,7 +151,7 @@ static void dma_blk_cb(void *opaque, int ret)
          * from several sectors. This code splits all SGs into several
          * groups. SGs in every group do not overlap.
          */
-        if (mem && use_icount && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
+        if (mem && icount_enabled() && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
             int i;
             for (i = 0 ; i < dbs->iov.niov ; ++i) {
                 if (ranges_overlap((intptr_t)dbs->iov.iov[i].iov_base,
diff --git a/docs/replay.txt b/docs/replay.txt
index 70c27edb36..8952e6d852 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -184,11 +184,11 @@ is then incremented (which is called "warping" the virtual clock) as
 soon as the timer fires or the CPUs need to go out of the idle state.
 Two functions are used for this purpose; because these actions change
 virtual machine state and must be deterministic, each of them creates a
-checkpoint.  qemu_start_warp_timer checks if the CPUs are idle and if so
-starts accounting real time to virtual clock.  qemu_account_warp_timer
+checkpoint.  icount_start_warp_timer checks if the CPUs are idle and if so
+starts accounting real time to virtual clock.  icount_account_warp_timer
 is called when the CPUs get an interrupt or when the warp timer fires,
 and it warps the virtual clock by the amount of real time that has passed
-since qemu_start_warp_timer.
+since icount_start_warp_timer.
 
 Bottom halves
 -------------
diff --git a/exec.c b/exec.c
index 6f381f98e2..a89ffa93c1 100644
--- a/exec.c
+++ b/exec.c
@@ -102,10 +102,6 @@ uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
 
 #if !defined(CONFIG_USER_ONLY)
-/* 0 = Do not count executed instructions.
-   1 = Precise instruction counting.
-   2 = Adaptive rate instruction counting.  */
-int use_icount;
 
 typedef struct PhysPageEntry PhysPageEntry;
 
diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index b5a54e2536..c6d2beb1da 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -7,11 +7,11 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
 #include "hw/ptimer.h"
 #include "migration/vmstate.h"
 #include "qemu/host-utils.h"
 #include "sysemu/replay.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/qtest.h"
 #include "block/aio.h"
 #include "sysemu/cpus.h"
@@ -134,7 +134,8 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
      * on the current generation of host machines.
      */
 
-    if (s->enabled == 1 && (delta * period < 10000) && !use_icount) {
+    if (s->enabled == 1 && (delta * period < 10000) &&
+        !icount_enabled() && !qtest_enabled()) {
         period = 10000 / delta;
         period_frac = 0;
     }
@@ -217,7 +218,8 @@ uint64_t ptimer_get_count(ptimer_state *s)
             uint32_t period_frac = s->period_frac;
             uint64_t period = s->period;
 
-            if (!oneshot && (s->delta * period < 10000) && !use_icount) {
+            if (!oneshot && (s->delta * period < 10000) &&
+                !icount_enabled() && !qtest_enabled()) {
                 period = 10000 / s->delta;
                 period_frac = 0;
             }
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 67bee1bcb8..58cf2229d5 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -34,6 +34,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/replay.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/cpu-timers.h"
 #include "trace.h"
 
 #include "hw/i386/x86.h"
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index fc403d456b..25b6005a91 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -407,8 +407,12 @@ static inline bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
     return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK);
 }
 
+#ifdef CONFIG_TCG
+void dump_drift_info(void);
 void dump_exec_info(void);
 void dump_opcount_info(void);
+#endif /* CONFIG_TCG */
+
 #endif /* !CONFIG_USER_ONLY */
 
 /* Returns: 0 on success, -1 on error */
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3cf88272df..e019b505a5 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -25,7 +25,7 @@
 #ifdef CONFIG_TCG
 #include "exec/cpu_ldst.h"
 #endif
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -497,7 +497,7 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
 static inline uint32_t curr_cflags(void)
 {
     return (parallel_cpus ? CF_PARALLEL : 0)
-         | (use_icount ? CF_USE_ICOUNT : 0);
+         | (icount_enabled() ? CF_USE_ICOUNT : 0);
 }
 
 /* TranslationBlock invalidate API */
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 6a8b48b5a9..2f7afc1f68 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -166,8 +166,8 @@ bool qemu_clock_expired(QEMUClockType type);
  *
  * Determine whether a clock should be used for deadline
  * calculations. Some clocks, for instance vm_clock with
- * use_icount set, do not count in nanoseconds. Such clocks
- * are not used for deadline calculations, and are presumed
+ * icount_enabled() set, do not count in nanoseconds.
+ * Such clocks are not used for deadline calculations, and are presumed
  * to interrupt any poll using qemu_notify/aio_notify
  * etc.
  *
@@ -224,13 +224,6 @@ void qemu_clock_notify(QEMUClockType type);
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
-/**
- * qemu_start_warp_timer:
- *
- * Starts a timer for virtual clock update
- */
-void qemu_start_warp_timer(void);
-
 /**
  * qemu_clock_run_timers:
  * @type: clock on which to operate
@@ -791,12 +784,6 @@ static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2)
  */
 void init_clocks(QEMUTimerListNotifyCB *notify_cb);
 
-int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
-void cpu_enable_ticks(void);
-/* Caller must hold BQL */
-void cpu_disable_ticks(void);
-
 static inline int64_t get_max_clock_jump(void)
 {
     /* This should be small enough to prevent excessive interrupts from being
@@ -850,13 +837,6 @@ static inline int64_t get_clock(void)
 }
 #endif
 
-/* icount */
-int64_t cpu_get_icount_raw(void);
-int64_t cpu_get_icount(void);
-int64_t cpu_get_clock(void);
-int64_t cpu_icount_to_ns(int64_t icount);
-void    cpu_update_icount(CPUState *cpu);
-
 /*******************************************/
 /* host CPU ticks (if available) */
 
diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
new file mode 100644
index 0000000000..07d724672f
--- /dev/null
+++ b/include/sysemu/cpu-timers.h
@@ -0,0 +1,81 @@
+/*
+ * CPU timers state API
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef SYSEMU_CPU_TIMERS_H
+#define SYSEMU_CPU_TIMERS_H
+
+#include "qemu/timer.h"
+
+/* init the whole cpu timers API, including icount, ticks, and cpu_throttle */
+void cpu_timers_init(void);
+
+/* icount - Instruction Counter API */
+
+/*
+ * Return the icount enablement state:
+ *
+ * 0 = Disabled - Do not count executed instructions.
+ * 1 = Enabled - Fixed conversion of insn to ns via "shift" option
+ * 2 = Enabled - Runtime adaptive algorithm to compute shift
+ */
+int icount_enabled(void);
+/*
+ * Update the icount with the executed instructions. Called by
+ * cpus-tcg vCPU thread so the main-loop can see time has moved forward.
+ */
+void icount_update(CPUState *cpu);
+
+/* get raw icount value */
+int64_t icount_get_raw(void);
+
+/* return the virtual CPU time in ns, based on the instruction counter. */
+int64_t icount_get(void);
+/*
+ * convert an instruction counter value to ns, based on the icount shift.
+ * This shift is set as a fixed value with the icount "shift" option
+ * (precise mode), or it is constantly approximated and corrected at
+ * runtime in adaptive mode.
+ */
+int64_t icount_to_ns(int64_t icount);
+
+/* configure the icount options, including "shift" */
+void icount_configure(QemuOpts *opts, Error **errp);
+
+/* used by tcg vcpu thread to calc icount budget */
+int64_t icount_round(int64_t count);
+
+/* if the CPUs are idle, start accounting real time to virtual clock. */
+void icount_start_warp_timer(void);
+void icount_account_warp_timer(void);
+
+/*
+ * CPU Ticks and Clock
+ */
+
+/* Caller must hold BQL */
+void cpu_enable_ticks(void);
+/* Caller must hold BQL */
+void cpu_disable_ticks(void);
+
+/*
+ * return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * counter.
+ */
+int64_t cpu_get_ticks(void);
+
+/*
+ * Returns the monotonic time elapsed in VM, i.e.,
+ * the time between vm_start and vm_stop
+ */
+int64_t cpu_get_clock(void);
+
+void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
+
+#endif /* SYSEMU_CPU_TIMERS_H */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3c1da6a018..149de000a0 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -4,33 +4,23 @@
 #include "qemu/timer.h"
 
 /* cpus.c */
+bool all_cpu_threads_idle(void);
 bool qemu_in_vcpu_thread(void);
 void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
-void cpu_ticks_init(void);
 
-void configure_icount(QemuOpts *opts, Error **errp);
-extern int use_icount;
 extern int icount_align_option;
 
-/* drift information for info jit command */
-extern int64_t max_delay;
-extern int64_t max_advance;
-void dump_drift_info(void);
-
 /* Unblock cpu */
 void qemu_cpu_kick_self(void);
-void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
 void cpu_synchronize_all_pre_loadvm(void);
 
-void qtest_clock_warp(int64_t dest);
-
 #ifndef CONFIG_USER_ONLY
 /* vl.c */
 /* *-user doesn't have configurable SMP topology */
diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
index eedd3664f0..4c53537ef3 100644
--- a/include/sysemu/qtest.h
+++ b/include/sysemu/qtest.h
@@ -30,4 +30,6 @@ void qtest_server_set_send_handler(void (*send)(void *, const char *),
                                  void *opaque);
 void qtest_server_inproc_recv(void *opaque, const char *buf);
 
+int64_t qtest_get_virtual_clock(void);
+
 #endif
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 5471bb514d..a140d69a73 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -109,12 +109,12 @@ int64_t replay_read_clock(ReplayClockKind kind);
 #define REPLAY_CLOCK(clock, value)                                      \
     (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
         : replay_mode == REPLAY_MODE_RECORD                             \
-            ? replay_save_clock((clock), (value), cpu_get_icount_raw()) \
+            ? replay_save_clock((clock), (value), icount_get_raw()) \
         : (value))
 #define REPLAY_CLOCK_LOCKED(clock, value)                               \
     (replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))       \
         : replay_mode == REPLAY_MODE_RECORD                             \
-            ? replay_save_clock((clock), (value), cpu_get_icount_raw_locked()) \
+            ? replay_save_clock((clock), (value), icount_get_raw_locked()) \
         : (value))
 
 /* Processing data from random generators */
diff --git a/replay/replay.c b/replay/replay.c
index 83ed9e0e24..4c1457b07e 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -11,10 +11,10 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "sysemu/runstate.h"
 #include "replay-internal.h"
-#include "qemu/timer.h"
 #include "qemu/main-loop.h"
 #include "qemu/option.h"
 #include "sysemu/cpus.h"
@@ -64,7 +64,7 @@ bool replay_next_event_is(int event)
 
 uint64_t replay_get_current_icount(void)
 {
-    return cpu_get_icount_raw();
+    return icount_get_raw();
 }
 
 int replay_get_instructions(void)
@@ -345,7 +345,7 @@ void replay_start(void)
         error_reportf_err(replay_blockers->data, "Record/replay: ");
         exit(1);
     }
-    if (!use_icount) {
+    if (!icount_enabled()) {
         error_report("Please enable icount to use record/replay");
         exit(1);
     }
diff --git a/softmmu/Makefile.objs b/softmmu/Makefile.objs
index a414a74c50..9c0125f37b 100644
--- a/softmmu/Makefile.objs
+++ b/softmmu/Makefile.objs
@@ -7,6 +7,8 @@ obj-y += balloon.o
 obj-y += ioport.o
 obj-y += memory.o
 obj-y += memory_mapping.o
+obj-y += cpu-timers.o
+obj-$(CONFIG_TCG) += icount.o
 
 obj-y += qtest.o
 
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
new file mode 100644
index 0000000000..64addb315d
--- /dev/null
+++ b/softmmu/cpu-timers.c
@@ -0,0 +1,284 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/cutils.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
+#include "sysemu/qtest.h"
+#include "qemu/main-loop.h"
+#include "qemu/option.h"
+#include "qemu/seqlock.h"
+#include "sysemu/replay.h"
+#include "sysemu/runstate.h"
+#include "hw/core/cpu.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/cpu-throttle.h"
+#include "timers-state.h"
+
+/* clock and ticks */
+
+static int64_t cpu_get_ticks_locked(void)
+{
+    int64_t ticks = timers_state.cpu_ticks_offset;
+    if (timers_state.cpu_ticks_enabled) {
+        ticks += cpu_get_host_ticks();
+    }
+
+    if (timers_state.cpu_ticks_prev > ticks) {
+        /* Non increasing ticks may happen if the host uses software suspend. */
+        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
+        ticks = timers_state.cpu_ticks_prev;
+    }
+
+    timers_state.cpu_ticks_prev = ticks;
+    return ticks;
+}
+
+/*
+ * Return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * counter.
+ */
+int64_t cpu_get_ticks(void)
+{
+    int64_t ticks;
+
+    if (icount_enabled()) {
+        return icount_get();
+    }
+
+    qemu_spin_lock(&timers_state.vm_clock_lock);
+    ticks = cpu_get_ticks_locked();
+    qemu_spin_unlock(&timers_state.vm_clock_lock);
+    return ticks;
+}
+
+int64_t cpu_get_clock_locked(void)
+{
+    int64_t time;
+
+    time = timers_state.cpu_clock_offset;
+    if (timers_state.cpu_ticks_enabled) {
+        time += get_clock();
+    }
+
+    return time;
+}
+
+/*
+ * Return the monotonic time elapsed in VM, i.e.,
+ * the time between vm_start and vm_stop
+ */
+int64_t cpu_get_clock(void)
+{
+    int64_t ti;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        ti = cpu_get_clock_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return ti;
+}
+
+/*
+ * enable cpu_get_ticks()
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
+ */
+void cpu_enable_ticks(void)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (!timers_state.cpu_ticks_enabled) {
+        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
+        timers_state.cpu_clock_offset -= get_clock();
+        timers_state.cpu_ticks_enabled = 1;
+    }
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+/*
+ * disable cpu_get_ticks() : the clock is stopped. You must not call
+ * cpu_get_ticks() after that.
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
+ */
+void cpu_disable_ticks(void)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (timers_state.cpu_ticks_enabled) {
+        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
+        timers_state.cpu_clock_offset = cpu_get_clock_locked();
+        timers_state.cpu_ticks_enabled = 0;
+    }
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+static bool icount_state_needed(void *opaque)
+{
+    return icount_enabled();
+}
+
+static bool icount_shift_state_needed(void *opaque)
+{
+    return icount_enabled() == 2;
+}
+
+static bool warp_timer_state_needed(void *opaque)
+{
+    TimersState *s = opaque;
+    return s->icount_warp_timer != NULL;
+}
+
+static bool adjust_timers_state_needed(void *opaque)
+{
+    TimersState *s = opaque;
+    return s->icount_rt_timer != NULL;
+}
+
+/*
+ * The warp timer subsection is optional: the timer may not have been created
+ */
+static const VMStateDescription icount_vmstate_warp_timer = {
+    .name = "timer/icount/warp_timer",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = warp_timer_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(vm_clock_warp_start, TimersState),
+        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription icount_vmstate_adjust_timers = {
+    .name = "timer/icount/timers",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = adjust_timers_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
+        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription icount_vmstate_shift = {
+    .name = "timer/icount/shift",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = icount_shift_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT16(icount_time_shift, TimersState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+/*
+ * This is a subsection for icount migration.
+ */
+static const VMStateDescription icount_vmstate_timers = {
+    .name = "timer/icount",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = icount_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(qemu_icount_bias, TimersState),
+        VMSTATE_INT64(qemu_icount, TimersState),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &icount_vmstate_warp_timer,
+        &icount_vmstate_adjust_timers,
+        &icount_vmstate_shift,
+        NULL
+    }
+};
+
+static const VMStateDescription vmstate_timers = {
+    .name = "timer",
+    .version_id = 2,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(cpu_ticks_offset, TimersState),
+        VMSTATE_UNUSED(8),
+        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &icount_vmstate_timers,
+        NULL
+    }
+};
+
+static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
+{
+}
+
+void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
+{
+    if (!icount_enabled() || type != QEMU_CLOCK_VIRTUAL) {
+        qemu_notify_event();
+        return;
+    }
+
+    if (qemu_in_vcpu_thread()) {
+        /*
+         * A CPU is currently running; kick it back out to the
+         * tcg_cpu_exec() loop so it will recalculate its
+         * icount deadline immediately.
+         */
+        qemu_cpu_kick(current_cpu);
+    } else if (first_cpu) {
+        /*
+         * qemu_cpu_kick is not enough to kick a halted CPU out of
+         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
+         * causes cpu_thread_is_idle to return false.  This way,
+         * handle_icount_deadline can run.
+         * If we have no CPUs at all for some reason, we don't
+         * need to do anything.
+         */
+        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
+    }
+}
+
+TimersState timers_state;
+
+/* initialize timers state and the cpu throttle for convenience */
+void cpu_timers_init(void)
+{
+    seqlock_init(&timers_state.vm_clock_seqlock);
+    qemu_spin_init(&timers_state.vm_clock_lock);
+    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+
+    cpu_throttle_init();
+}
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index a802e899ab..54fdb2761c 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -58,11 +58,10 @@
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
 #include "sysemu/runstate.h"
+#include "sysemu/cpu-timers.h"
 #include "hw/boards.h"
 #include "hw/hw.h"
 
-#include "sysemu/cpu-throttle.h"
-
 #ifdef CONFIG_LINUX
 
 #include <sys/prctl.h>
@@ -83,9 +82,6 @@
 
 static QemuMutex qemu_global_mutex;
 
-int64_t max_delay;
-int64_t max_advance;
-
 bool cpu_is_stopped(CPUState *cpu)
 {
     return cpu->stopped || !runstate_is_running();
@@ -116,7 +112,7 @@ static bool cpu_thread_is_idle(CPUState *cpu)
     return true;
 }
 
-static bool all_cpu_threads_idle(void)
+bool all_cpu_threads_idle(void)
 {
     CPUState *cpu;
 
@@ -128,688 +124,9 @@ static bool all_cpu_threads_idle(void)
     return true;
 }
 
-/***********************************************************/
-/* guest cycle counter */
-
-/* Protected by TimersState seqlock */
-
-static bool icount_sleep = true;
-/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
-#define MAX_ICOUNT_SHIFT 10
-
-typedef struct TimersState {
-    /* Protected by BQL.  */
-    int64_t cpu_ticks_prev;
-    int64_t cpu_ticks_offset;
-
-    /* Protect fields that can be respectively read outside the
-     * BQL, and written from multiple threads.
-     */
-    QemuSeqLock vm_clock_seqlock;
-    QemuSpin vm_clock_lock;
-
-    int16_t cpu_ticks_enabled;
-
-    /* Conversion factor from emulated instructions to virtual clock ticks.  */
-    int16_t icount_time_shift;
-
-    /* Compensate for varying guest execution speed.  */
-    int64_t qemu_icount_bias;
-
-    int64_t vm_clock_warp_start;
-    int64_t cpu_clock_offset;
-
-    /* Only written by TCG thread */
-    int64_t qemu_icount;
-
-    /* for adjusting icount */
-    QEMUTimer *icount_rt_timer;
-    QEMUTimer *icount_vm_timer;
-    QEMUTimer *icount_warp_timer;
-} TimersState;
-
-static TimersState timers_state;
 bool mttcg_enabled;
 
 
-/* The current number of executed instructions is based on what we
- * originally budgeted minus the current state of the decrementing
- * icount counters in extra/u16.low.
- */
-static int64_t cpu_get_icount_executed(CPUState *cpu)
-{
-    return (cpu->icount_budget -
-            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
-}
-
-/*
- * Update the global shared timer_state.qemu_icount to take into
- * account executed instructions. This is done by the TCG vCPU
- * thread so the main-loop can see time has moved forward.
- */
-static void cpu_update_icount_locked(CPUState *cpu)
-{
-    int64_t executed = cpu_get_icount_executed(cpu);
-    cpu->icount_budget -= executed;
-
-    atomic_set_i64(&timers_state.qemu_icount,
-                   timers_state.qemu_icount + executed);
-}
-
-/*
- * Update the global shared timer_state.qemu_icount to take into
- * account executed instructions. This is done by the TCG vCPU
- * thread so the main-loop can see time has moved forward.
- */
-void cpu_update_icount(CPUState *cpu)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    cpu_update_icount_locked(cpu);
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-static int64_t cpu_get_icount_raw_locked(void)
-{
-    CPUState *cpu = current_cpu;
-
-    if (cpu && cpu->running) {
-        if (!cpu->can_do_io) {
-            error_report("Bad icount read");
-            exit(1);
-        }
-        /* Take into account what has run */
-        cpu_update_icount_locked(cpu);
-    }
-    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
-    return atomic_read_i64(&timers_state.qemu_icount);
-}
-
-static int64_t cpu_get_icount_locked(void)
-{
-    int64_t icount = cpu_get_icount_raw_locked();
-    return atomic_read_i64(&timers_state.qemu_icount_bias) +
-        cpu_icount_to_ns(icount);
-}
-
-int64_t cpu_get_icount_raw(void)
-{
-    int64_t icount;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        icount = cpu_get_icount_raw_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return icount;
-}
-
-/* Return the virtual CPU time, based on the instruction counter.  */
-int64_t cpu_get_icount(void)
-{
-    int64_t icount;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        icount = cpu_get_icount_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return icount;
-}
-
-int64_t cpu_icount_to_ns(int64_t icount)
-{
-    return icount << atomic_read(&timers_state.icount_time_shift);
-}
-
-static int64_t cpu_get_ticks_locked(void)
-{
-    int64_t ticks = timers_state.cpu_ticks_offset;
-    if (timers_state.cpu_ticks_enabled) {
-        ticks += cpu_get_host_ticks();
-    }
-
-    if (timers_state.cpu_ticks_prev > ticks) {
-        /* Non increasing ticks may happen if the host uses software suspend.  */
-        timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
-        ticks = timers_state.cpu_ticks_prev;
-    }
-
-    timers_state.cpu_ticks_prev = ticks;
-    return ticks;
-}
-
-/* return the time elapsed in VM between vm_start and vm_stop.  Unless
- * icount is active, cpu_get_ticks() uses units of the host CPU cycle
- * counter.
- */
-int64_t cpu_get_ticks(void)
-{
-    int64_t ticks;
-
-    if (use_icount) {
-        return cpu_get_icount();
-    }
-
-    qemu_spin_lock(&timers_state.vm_clock_lock);
-    ticks = cpu_get_ticks_locked();
-    qemu_spin_unlock(&timers_state.vm_clock_lock);
-    return ticks;
-}
-
-static int64_t cpu_get_clock_locked(void)
-{
-    int64_t time;
-
-    time = timers_state.cpu_clock_offset;
-    if (timers_state.cpu_ticks_enabled) {
-        time += get_clock();
-    }
-
-    return time;
-}
-
-/* Return the monotonic time elapsed in VM, i.e.,
- * the time between vm_start and vm_stop
- */
-int64_t cpu_get_clock(void)
-{
-    int64_t ti;
-    unsigned start;
-
-    do {
-        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        ti = cpu_get_clock_locked();
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
-
-    return ti;
-}
-
-/* enable cpu_get_ticks()
- * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
- */
-void cpu_enable_ticks(void)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (!timers_state.cpu_ticks_enabled) {
-        timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
-        timers_state.cpu_clock_offset -= get_clock();
-        timers_state.cpu_ticks_enabled = 1;
-    }
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-}
-
-/* disable cpu_get_ticks() : the clock is stopped. You must not call
- * cpu_get_ticks() after that.
- * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
- */
-void cpu_disable_ticks(void)
-{
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (timers_state.cpu_ticks_enabled) {
-        timers_state.cpu_ticks_offset += cpu_get_host_ticks();
-        timers_state.cpu_clock_offset = cpu_get_clock_locked();
-        timers_state.cpu_ticks_enabled = 0;
-    }
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-/* Correlation between real and virtual time is always going to be
-   fairly approximate, so ignore small variation.
-   When the guest is idle real and virtual time will be aligned in
-   the IO wait loop.  */
-#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
-
-static void icount_adjust(void)
-{
-    int64_t cur_time;
-    int64_t cur_icount;
-    int64_t delta;
-
-    /* Protected by TimersState mutex.  */
-    static int64_t last_delta;
-
-    /* If the VM is not running, then do nothing.  */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    cur_time = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
-                                   cpu_get_clock_locked());
-    cur_icount = cpu_get_icount_locked();
-
-    delta = cur_icount - cur_time;
-    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
-    if (delta > 0
-        && last_delta + ICOUNT_WOBBLE < delta * 2
-        && timers_state.icount_time_shift > 0) {
-        /* The guest is getting too far ahead.  Slow time down.  */
-        atomic_set(&timers_state.icount_time_shift,
-                   timers_state.icount_time_shift - 1);
-    }
-    if (delta < 0
-        && last_delta - ICOUNT_WOBBLE > delta * 2
-        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
-        /* The guest is getting too far behind.  Speed time up.  */
-        atomic_set(&timers_state.icount_time_shift,
-                   timers_state.icount_time_shift + 1);
-    }
-    last_delta = delta;
-    atomic_set_i64(&timers_state.qemu_icount_bias,
-                   cur_icount - (timers_state.qemu_icount
-                                 << timers_state.icount_time_shift));
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                         &timers_state.vm_clock_lock);
-}
-
-static void icount_adjust_rt(void *opaque)
-{
-    timer_mod(timers_state.icount_rt_timer,
-              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
-    icount_adjust();
-}
-
-static void icount_adjust_vm(void *opaque)
-{
-    timer_mod(timers_state.icount_vm_timer,
-                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-                   NANOSECONDS_PER_SECOND / 10);
-    icount_adjust();
-}
-
-static int64_t qemu_icount_round(int64_t count)
-{
-    int shift = atomic_read(&timers_state.icount_time_shift);
-    return (count + (1 << shift) - 1) >> shift;
-}
-
-static void icount_warp_rt(void)
-{
-    unsigned seq;
-    int64_t warp_start;
-
-    /* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
-     * changes from -1 to another value, so the race here is okay.
-     */
-    do {
-        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
-        warp_start = timers_state.vm_clock_warp_start;
-    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
-
-    if (warp_start == -1) {
-        return;
-    }
-
-    seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-    if (runstate_is_running()) {
-        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
-                                            cpu_get_clock_locked());
-        int64_t warp_delta;
-
-        warp_delta = clock - timers_state.vm_clock_warp_start;
-        if (use_icount == 2) {
-            /*
-             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
-             * far ahead of real time.
-             */
-            int64_t cur_icount = cpu_get_icount_locked();
-            int64_t delta = clock - cur_icount;
-            warp_delta = MIN(warp_delta, delta);
-        }
-        atomic_set_i64(&timers_state.qemu_icount_bias,
-                       timers_state.qemu_icount_bias + warp_delta);
-    }
-    timers_state.vm_clock_warp_start = -1;
-    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                       &timers_state.vm_clock_lock);
-
-    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
-        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-    }
-}
-
-static void icount_timer_cb(void *opaque)
-{
-    /* No need for a checkpoint because the timer already synchronizes
-     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
-     */
-    icount_warp_rt();
-}
-
-void qtest_clock_warp(int64_t dest)
-{
-    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-    AioContext *aio_context;
-    assert(qtest_enabled());
-    aio_context = qemu_get_aio_context();
-    while (clock < dest) {
-        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                                      QEMU_TIMER_ATTR_ALL);
-        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
-
-        seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                           &timers_state.vm_clock_lock);
-        atomic_set_i64(&timers_state.qemu_icount_bias,
-                       timers_state.qemu_icount_bias + warp);
-        seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                             &timers_state.vm_clock_lock);
-
-        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
-        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
-        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-    }
-    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-}
-
-void qemu_start_warp_timer(void)
-{
-    int64_t clock;
-    int64_t deadline;
-
-    if (!use_icount) {
-        return;
-    }
-
-    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
-     * do not fire, so computing the deadline does not make sense.
-     */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        if (!all_cpu_threads_idle()) {
-            return;
-        }
-
-        if (qtest_enabled()) {
-            /* When testing, qtest commands advance icount.  */
-            return;
-        }
-
-        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
-    } else {
-        /* warp clock deterministically in record/replay mode */
-        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
-            /* vCPU is sleeping and warp can't be started.
-               It is probably a race condition: notification sent
-               to vCPU was processed in advance and vCPU went to sleep.
-               Therefore we have to wake it up for doing someting. */
-            if (replay_has_checkpoint()) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-            }
-            return;
-        }
-    }
-
-    /* We want to use the earliest deadline from ALL vm_clocks */
-    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
-    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                          ~QEMU_TIMER_ATTR_EXTERNAL);
-    if (deadline < 0) {
-        static bool notified;
-        if (!icount_sleep && !notified) {
-            warn_report("icount sleep disabled and no active timers");
-            notified = true;
-        }
-        return;
-    }
-
-    if (deadline > 0) {
-        /*
-         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
-         * sleep.  Otherwise, the CPU might be waiting for a future timer
-         * interrupt to wake it up, but the interrupt never comes because
-         * the vCPU isn't running any insns and thus doesn't advance the
-         * QEMU_CLOCK_VIRTUAL.
-         */
-        if (!icount_sleep) {
-            /*
-             * We never let VCPUs sleep in no sleep icount mode.
-             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
-             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
-             * It is useful when we want a deterministic execution time,
-             * isolated from host latencies.
-             */
-            seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                               &timers_state.vm_clock_lock);
-            atomic_set_i64(&timers_state.qemu_icount_bias,
-                           timers_state.qemu_icount_bias + deadline);
-            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                                 &timers_state.vm_clock_lock);
-            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-        } else {
-            /*
-             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
-             * "real" time, (related to the time left until the next event) has
-             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
-             * This avoids that the warps are visible externally; for example,
-             * you will not be sending network packets continuously instead of
-             * every 100ms.
-             */
-            seqlock_write_lock(&timers_state.vm_clock_seqlock,
-                               &timers_state.vm_clock_lock);
-            if (timers_state.vm_clock_warp_start == -1
-                || timers_state.vm_clock_warp_start > clock) {
-                timers_state.vm_clock_warp_start = clock;
-            }
-            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
-                                 &timers_state.vm_clock_lock);
-            timer_mod_anticipate(timers_state.icount_warp_timer,
-                                 clock + deadline);
-        }
-    } else if (deadline == 0) {
-        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-    }
-}
-
-static void qemu_account_warp_timer(void)
-{
-    if (!use_icount || !icount_sleep) {
-        return;
-    }
-
-    /* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
-     * do not fire, so computing the deadline does not make sense.
-     */
-    if (!runstate_is_running()) {
-        return;
-    }
-
-    /* warp clock deterministically in record/replay mode */
-    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
-        return;
-    }
-
-    timer_del(timers_state.icount_warp_timer);
-    icount_warp_rt();
-}
-
-static bool icount_state_needed(void *opaque)
-{
-    return use_icount;
-}
-
-static bool warp_timer_state_needed(void *opaque)
-{
-    TimersState *s = opaque;
-    return s->icount_warp_timer != NULL;
-}
-
-static bool adjust_timers_state_needed(void *opaque)
-{
-    TimersState *s = opaque;
-    return s->icount_rt_timer != NULL;
-}
-
-static bool shift_state_needed(void *opaque)
-{
-    return use_icount == 2;
-}
-
-/*
- * Subsection for warp timer migration is optional, because may not be created
- */
-static const VMStateDescription icount_vmstate_warp_timer = {
-    .name = "timer/icount/warp_timer",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = warp_timer_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(vm_clock_warp_start, TimersState),
-        VMSTATE_TIMER_PTR(icount_warp_timer, TimersState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-static const VMStateDescription icount_vmstate_adjust_timers = {
-    .name = "timer/icount/timers",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = adjust_timers_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_TIMER_PTR(icount_rt_timer, TimersState),
-        VMSTATE_TIMER_PTR(icount_vm_timer, TimersState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-static const VMStateDescription icount_vmstate_shift = {
-    .name = "timer/icount/shift",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = shift_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT16(icount_time_shift, TimersState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-/*
- * This is a subsection for icount migration.
- */
-static const VMStateDescription icount_vmstate_timers = {
-    .name = "timer/icount",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .needed = icount_state_needed,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(qemu_icount_bias, TimersState),
-        VMSTATE_INT64(qemu_icount, TimersState),
-        VMSTATE_END_OF_LIST()
-    },
-    .subsections = (const VMStateDescription*[]) {
-        &icount_vmstate_warp_timer,
-        &icount_vmstate_adjust_timers,
-        &icount_vmstate_shift,
-        NULL
-    }
-};
-
-static const VMStateDescription vmstate_timers = {
-    .name = "timer",
-    .version_id = 2,
-    .minimum_version_id = 1,
-    .fields = (VMStateField[]) {
-        VMSTATE_INT64(cpu_ticks_offset, TimersState),
-        VMSTATE_UNUSED(8),
-        VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
-        VMSTATE_END_OF_LIST()
-    },
-    .subsections = (const VMStateDescription*[]) {
-        &icount_vmstate_timers,
-        NULL
-    }
-};
-
-void cpu_ticks_init(void)
-{
-    seqlock_init(&timers_state.vm_clock_seqlock);
-    qemu_spin_init(&timers_state.vm_clock_lock);
-    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
-    cpu_throttle_init();
-}
-
-void configure_icount(QemuOpts *opts, Error **errp)
-{
-    const char *option = qemu_opt_get(opts, "shift");
-    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
-    bool align = qemu_opt_get_bool(opts, "align", false);
-    long time_shift = -1;
-
-    if (!option) {
-        if (qemu_opt_get(opts, "align") != NULL) {
-            error_setg(errp, "Please specify shift option when using align");
-        }
-        return;
-    }
-
-    if (align && !sleep) {
-        error_setg(errp, "align=on and sleep=off are incompatible");
-        return;
-    }
-
-    if (strcmp(option, "auto") != 0) {
-        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
-            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
-            error_setg(errp, "icount: Invalid shift value");
-            return;
-        }
-    } else if (icount_align_option) {
-        error_setg(errp, "shift=auto and align=on are incompatible");
-        return;
-    } else if (!icount_sleep) {
-        error_setg(errp, "shift=auto and sleep=off are incompatible");
-        return;
-    }
-
-    icount_sleep = sleep;
-    if (icount_sleep) {
-        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
-                                         icount_timer_cb, NULL);
-    }
-
-    icount_align_option = align;
-
-    if (time_shift >= 0) {
-        timers_state.icount_time_shift = time_shift;
-        use_icount = 1;
-        return;
-    }
-
-    use_icount = 2;
-
-    /* 125MIPS seems a reasonable initial guess at the guest speed.
-       It will be corrected fairly quickly anyway.  */
-    timers_state.icount_time_shift = 3;
-
-    /* Have both realtime and virtual time triggers for speed adjustment.
-       The realtime trigger catches emulated time passing too slowly,
-       the virtual time trigger catches emulated time passing too fast.
-       Realtime triggers occur even when idle, so use them less frequently
-       than VM triggers.  */
-    timers_state.vm_clock_warp_start = -1;
-    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
-                                   icount_adjust_rt, NULL);
-    timer_mod(timers_state.icount_rt_timer,
-                   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
-    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                        icount_adjust_vm, NULL);
-    timer_mod(timers_state.icount_vm_timer,
-                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-                   NANOSECONDS_PER_SECOND / 10);
-}
-
 /***********************************************************/
 /* TCG vCPU kick timer
  *
@@ -854,35 +171,6 @@ static void qemu_cpu_kick_rr_cpus(void)
     };
 }
 
-static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
-{
-}
-
-void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
-{
-    if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
-        qemu_notify_event();
-        return;
-    }
-
-    if (qemu_in_vcpu_thread()) {
-        /* A CPU is currently running; kick it back out to the
-         * tcg_cpu_exec() loop so it will recalculate its
-         * icount deadline immediately.
-         */
-        qemu_cpu_kick(current_cpu);
-    } else if (first_cpu) {
-        /* qemu_cpu_kick is not enough to kick a halted CPU out of
-         * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
-         * causes cpu_thread_is_idle to return false.  This way,
-         * handle_icount_deadline can run.
-         * If we have no CPUs at all for some reason, we don't
-         * need to do anything.
-         */
-        async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
-    }
-}
-
 static void kick_tcg_thread(void *opaque)
 {
     timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1272,7 +560,7 @@ static int64_t tcg_get_icount_limit(void)
             deadline = INT32_MAX;
         }
 
-        return qemu_icount_round(deadline);
+        return icount_round(deadline);
     } else {
         return replay_get_instructions();
     }
@@ -1288,7 +576,7 @@ static void notify_aio_contexts(void)
 static void handle_icount_deadline(void)
 {
     assert(qemu_in_vcpu_thread());
-    if (use_icount) {
+    if (icount_enabled()) {
         int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
                                                       QEMU_TIMER_ATTR_ALL);
 
@@ -1300,7 +588,7 @@ static void handle_icount_deadline(void)
 
 static void prepare_icount_for_run(CPUState *cpu)
 {
-    if (use_icount) {
+    if (icount_enabled()) {
         int insns_left;
 
         /* These should always be cleared by process_icount_data after
@@ -1325,9 +613,9 @@ static void prepare_icount_for_run(CPUState *cpu)
 
 static void process_icount_data(CPUState *cpu)
 {
-    if (use_icount) {
+    if (icount_enabled()) {
         /* Account for executed instructions */
-        cpu_update_icount(cpu);
+        icount_update(cpu);
 
         /* Reset the counters */
         cpu_neg(cpu)->icount_decr.u16.low = 0;
@@ -1428,7 +716,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
         replay_mutex_lock();
         qemu_mutex_lock_iothread();
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-        qemu_account_warp_timer();
+        icount_account_warp_timer();
 
         /* Run the timers here.  This is much more efficient than
          * waking up the I/O thread and waiting for completion.
@@ -1486,7 +774,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
             atomic_mb_set(&cpu->exit_request, 0);
         }
 
-        if (use_icount && all_cpu_threads_idle()) {
+        if (icount_enabled() && all_cpu_threads_idle()) {
             /*
              * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
              * in the main_loop, wake it up in order to start the warp timer.
@@ -1639,7 +927,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     CPUState *cpu = arg;
 
     assert(tcg_enabled());
-    g_assert(!use_icount);
+    g_assert(!icount_enabled());
 
     rcu_register_thread();
     tcg_register_thread();
@@ -2218,21 +1506,3 @@ void qmp_inject_nmi(Error **errp)
     nmi_monitor_handle(monitor_get_cpu_index(), errp);
 }
 
-void dump_drift_info(void)
-{
-    if (!use_icount) {
-        return;
-    }
-
-    qemu_printf("Host - Guest clock  %"PRIi64" ms\n",
-                (cpu_get_clock() - cpu_get_icount())/SCALE_MS);
-    if (icount_align_option) {
-        qemu_printf("Max guest delay     %"PRIi64" ms\n",
-                    -max_delay / SCALE_MS);
-        qemu_printf("Max guest advance   %"PRIi64" ms\n",
-                    max_advance / SCALE_MS);
-    } else {
-        qemu_printf("Max guest delay     NA\n");
-        qemu_printf("Max guest advance   NA\n");
-    }
-}
diff --git a/softmmu/icount.c b/softmmu/icount.c
new file mode 100644
index 0000000000..d4fe48c1f2
--- /dev/null
+++ b/softmmu/icount.c
@@ -0,0 +1,497 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/cutils.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
+#include "sysemu/qtest.h"
+#include "qemu/main-loop.h"
+#include "qemu/option.h"
+#include "qemu/seqlock.h"
+#include "sysemu/replay.h"
+#include "sysemu/runstate.h"
+#include "hw/core/cpu.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/cpu-throttle.h"
+#include "timers-state.h"
+
+/*
+ * ICOUNT: Instruction Counter
+ *
+ * this module is split off from cpu-timers because the icount part
+ * is TCG-specific, and does not need to be built for other accels.
+ */
+static bool icount_sleep = true;
+/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
+#define MAX_ICOUNT_SHIFT 10
+
+/*
+ * 0 = Do not count executed instructions.
+ * 1 = Fixed conversion of insn to ns via "shift" option
+ * 2 = Runtime adaptive algorithm to compute shift
+ */
+static int use_icount;
+
+int icount_enabled(void)
+{
+    return use_icount;
+}
+
+static void icount_enable_precise(void)
+{
+    use_icount = 1;
+}
+
+static void icount_enable_adaptive(void)
+{
+    use_icount = 2;
+}
+
+/*
+ * The current number of executed instructions is based on what we
+ * originally budgeted minus the current state of the decrementing
+ * icount counters in extra/u16.low.
+ */
+static int64_t icount_get_executed(CPUState *cpu)
+{
+    return (cpu->icount_budget -
+            (cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra));
+}
+
+/*
+ * Update the global shared timers_state.qemu_icount to take into
+ * account executed instructions. This is done by the TCG vCPU
+ * thread so the main-loop can see time has moved forward.
+ */
+static void icount_update_locked(CPUState *cpu)
+{
+    int64_t executed = icount_get_executed(cpu);
+    cpu->icount_budget -= executed;
+
+    atomic_set_i64(&timers_state.qemu_icount,
+                   timers_state.qemu_icount + executed);
+}
+
+/*
+ * Same as icount_update_locked(), but takes the vm_clock seqlock and
+ * spinlock itself. This is done by the TCG vCPU thread so the
+ * main-loop can see time has moved forward.
+ */
+void icount_update(CPUState *cpu)
+{
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    icount_update_locked(cpu);
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+static int64_t icount_get_raw_locked(void)
+{
+    CPUState *cpu = current_cpu;
+
+    if (cpu && cpu->running) {
+        if (!cpu->can_do_io) {
+            error_report("Bad icount read");
+            exit(1);
+        }
+        /* Take into account what has run */
+        icount_update_locked(cpu);
+    }
+    /* The read is protected by the seqlock, but needs atomic64 to avoid UB */
+    return atomic_read_i64(&timers_state.qemu_icount);
+}
+
+static int64_t icount_get_locked(void)
+{
+    int64_t icount = icount_get_raw_locked();
+    return atomic_read_i64(&timers_state.qemu_icount_bias) +
+        icount_to_ns(icount);
+}
+
+int64_t icount_get_raw(void)
+{
+    int64_t icount;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        icount = icount_get_raw_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return icount;
+}
+
+/* Return the virtual CPU time, based on the instruction counter.  */
+int64_t icount_get(void)
+{
+    int64_t icount;
+    unsigned start;
+
+    do {
+        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        icount = icount_get_locked();
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+    return icount;
+}
+
+int64_t icount_to_ns(int64_t icount)
+{
+    return icount << atomic_read(&timers_state.icount_time_shift);
+}
+
+/*
+ * Correlation between real and virtual time is always going to be
+ * fairly approximate, so ignore small variation.
+ * When the guest is idle real and virtual time will be aligned in
+ * the IO wait loop.
+ */
+#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
+
+static void icount_adjust(void)
+{
+    int64_t cur_time;
+    int64_t cur_icount;
+    int64_t delta;
+
+    /* Protected by TimersState mutex.  */
+    static int64_t last_delta;
+
+    /* If the VM is not running, then do nothing.  */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    cur_time = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
+                                   cpu_get_clock_locked());
+    cur_icount = icount_get_locked();
+
+    delta = cur_icount - cur_time;
+    /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  */
+    if (delta > 0
+        && last_delta + ICOUNT_WOBBLE < delta * 2
+        && timers_state.icount_time_shift > 0) {
+        /* The guest is getting too far ahead.  Slow time down.  */
+        atomic_set(&timers_state.icount_time_shift,
+                   timers_state.icount_time_shift - 1);
+    }
+    if (delta < 0
+        && last_delta - ICOUNT_WOBBLE > delta * 2
+        && timers_state.icount_time_shift < MAX_ICOUNT_SHIFT) {
+        /* The guest is getting too far behind.  Speed time up.  */
+        atomic_set(&timers_state.icount_time_shift,
+                   timers_state.icount_time_shift + 1);
+    }
+    last_delta = delta;
+    atomic_set_i64(&timers_state.qemu_icount_bias,
+                   cur_icount - (timers_state.qemu_icount
+                                 << timers_state.icount_time_shift));
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+}
+
+static void icount_adjust_rt(void *opaque)
+{
+    timer_mod(timers_state.icount_rt_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
+    icount_adjust();
+}
+
+static void icount_adjust_vm(void *opaque)
+{
+    timer_mod(timers_state.icount_vm_timer,
+                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                   NANOSECONDS_PER_SECOND / 10);
+    icount_adjust();
+}
+
+int64_t icount_round(int64_t count)
+{
+    int shift = atomic_read(&timers_state.icount_time_shift);
+    return (count + (1 << shift) - 1) >> shift;
+}
+
+static void icount_warp_rt(void)
+{
+    unsigned seq;
+    int64_t warp_start;
+
+    /*
+     * The icount_warp_timer is rescheduled soon after vm_clock_warp_start
+     * changes from -1 to another value, so the race here is okay.
+     */
+    do {
+        seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+        warp_start = timers_state.vm_clock_warp_start;
+    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
+
+    if (warp_start == -1) {
+        return;
+    }
+
+    seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                       &timers_state.vm_clock_lock);
+    if (runstate_is_running()) {
+        int64_t clock = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
+                                            cpu_get_clock_locked());
+        int64_t warp_delta;
+
+        warp_delta = clock - timers_state.vm_clock_warp_start;
+        if (icount_enabled() == 2) {
+            /*
+             * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
+             * far ahead of real time.
+             */
+            int64_t cur_icount = icount_get_locked();
+            int64_t delta = clock - cur_icount;
+            warp_delta = MIN(warp_delta, delta);
+        }
+        atomic_set_i64(&timers_state.qemu_icount_bias,
+                       timers_state.qemu_icount_bias + warp_delta);
+    }
+    timers_state.vm_clock_warp_start = -1;
+    seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                         &timers_state.vm_clock_lock);
+
+    if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
+        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+    }
+}
+
+static void icount_timer_cb(void *opaque)
+{
+    /*
+     * No need for a checkpoint because the timer already synchronizes
+     * with CHECKPOINT_CLOCK_VIRTUAL_RT.
+     */
+    icount_warp_rt();
+}
+
+void icount_start_warp_timer(void)
+{
+    int64_t clock;
+    int64_t deadline;
+
+    assert(icount_enabled());
+
+    /*
+     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
+     * do not fire, so computing the deadline does not make sense.
+     */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        if (!all_cpu_threads_idle()) {
+            return;
+        }
+
+        if (qtest_enabled()) {
+            /* When testing, qtest commands advance icount.  */
+            return;
+        }
+
+        replay_checkpoint(CHECKPOINT_CLOCK_WARP_START);
+    } else {
+        /* warp clock deterministically in record/replay mode */
+        if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
+            /*
+             * vCPU is sleeping and warp can't be started.
+             * It is probably a race condition: the notification sent
+             * to the vCPU was processed in advance and the vCPU went to
+             * sleep. Therefore we have to wake it up to do something.
+             */
+            if (replay_has_checkpoint()) {
+                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+            }
+            return;
+        }
+    }
+
+    /* We want to use the earliest deadline from ALL vm_clocks */
+    clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
+    deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                          ~QEMU_TIMER_ATTR_EXTERNAL);
+    if (deadline < 0) {
+        static bool notified;
+        if (!icount_sleep && !notified) {
+            warn_report("icount sleep disabled and no active timers");
+            notified = true;
+        }
+        return;
+    }
+
+    if (deadline > 0) {
+        /*
+         * Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
+         * sleep.  Otherwise, the CPU might be waiting for a future timer
+         * interrupt to wake it up, but the interrupt never comes because
+         * the vCPU isn't running any insns and thus doesn't advance the
+         * QEMU_CLOCK_VIRTUAL.
+         */
+        if (!icount_sleep) {
+            /*
+             * We never let VCPUs sleep in no sleep icount mode.
+             * If there is a pending QEMU_CLOCK_VIRTUAL timer we just advance
+             * to the next QEMU_CLOCK_VIRTUAL event and notify it.
+             * It is useful when we want a deterministic execution time,
+             * isolated from host latencies.
+             */
+            seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                               &timers_state.vm_clock_lock);
+            atomic_set_i64(&timers_state.qemu_icount_bias,
+                           timers_state.qemu_icount_bias + deadline);
+            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                                 &timers_state.vm_clock_lock);
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        } else {
+            /*
+             * We do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL after some
+             * "real" time (related to the time left until the next event) has
+             * passed. The QEMU_CLOCK_VIRTUAL_RT clock will do this.
+             * This keeps the warps from being visible externally; for
+             * example, you will not be sending network packets continuously
+             * instead of every 100ms.
+             */
+            seqlock_write_lock(&timers_state.vm_clock_seqlock,
+                               &timers_state.vm_clock_lock);
+            if (timers_state.vm_clock_warp_start == -1
+                || timers_state.vm_clock_warp_start > clock) {
+                timers_state.vm_clock_warp_start = clock;
+            }
+            seqlock_write_unlock(&timers_state.vm_clock_seqlock,
+                                 &timers_state.vm_clock_lock);
+            timer_mod_anticipate(timers_state.icount_warp_timer,
+                                 clock + deadline);
+        }
+    } else if (deadline == 0) {
+        qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+    }
+}
+
+void icount_account_warp_timer(void)
+{
+    if (!use_icount || !icount_sleep) {
+        return;
+    }
+
+    /*
+     * Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
+     * do not fire, so computing the deadline does not make sense.
+     */
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    /* warp clock deterministically in record/replay mode */
+    if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
+        return;
+    }
+
+    timer_del(timers_state.icount_warp_timer);
+    icount_warp_rt();
+}
+
+void icount_configure(QemuOpts *opts, Error **errp)
+{
+    const char *option = qemu_opt_get(opts, "shift");
+    bool sleep = qemu_opt_get_bool(opts, "sleep", true);
+    bool align = qemu_opt_get_bool(opts, "align", false);
+    long time_shift = -1;
+
+    if (!option) {
+        if (qemu_opt_get(opts, "align") != NULL) {
+            error_setg(errp, "Please specify shift option when using align");
+        }
+        return;
+    }
+
+    if (align && !sleep) {
+        error_setg(errp, "align=on and sleep=off are incompatible");
+        return;
+    }
+
+    if (strcmp(option, "auto") != 0) {
+        if (qemu_strtol(option, NULL, 0, &time_shift) < 0
+            || time_shift < 0 || time_shift > MAX_ICOUNT_SHIFT) {
+            error_setg(errp, "icount: Invalid shift value");
+            return;
+        }
+    } else if (align) {
+        error_setg(errp, "shift=auto and align=on are incompatible");
+        return;
+    } else if (!sleep) {
+        error_setg(errp, "shift=auto and sleep=off are incompatible");
+        return;
+    }
+
+    icount_sleep = sleep;
+    if (icount_sleep) {
+        timers_state.icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
+                                         icount_timer_cb, NULL);
+    }
+
+    icount_align_option = align;
+
+    if (time_shift >= 0) {
+        timers_state.icount_time_shift = time_shift;
+        icount_enable_precise();
+        return;
+    }
+
+    icount_enable_adaptive();
+
+    /*
+     * 125MIPS seems a reasonable initial guess at the guest speed.
+     * It will be corrected fairly quickly anyway.
+     */
+    timers_state.icount_time_shift = 3;
+
+    /*
+     * Have both realtime and virtual time triggers for speed adjustment.
+     * The realtime trigger catches emulated time passing too slowly,
+     * the virtual time trigger catches emulated time passing too fast.
+     * Realtime triggers occur even when idle, so use them less frequently
+     * than VM triggers.
+     */
+    timers_state.vm_clock_warp_start = -1;
+    timers_state.icount_rt_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
+                                   icount_adjust_rt, NULL);
+    timer_mod(timers_state.icount_rt_timer,
+                   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
+    timers_state.icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                        icount_adjust_vm, NULL);
+    timer_mod(timers_state.icount_vm_timer,
+                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                   NANOSECONDS_PER_SECOND / 10);
+}
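
Note (not part of the patch): the shift arithmetic behind icount_to_ns() and
icount_round() above can be checked in isolation. The following is a minimal
sketch, assuming only that one instruction costs 2^shift ns as the code in
icount.c suggests. The sketch_* names and SKETCH_MAX_ICOUNT_SHIFT are
hypothetical stand-ins and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors MAX_ICOUNT_SHIFT in the patch: 2^10 ns per insn, roughly 1 MIPS. */
#define SKETCH_MAX_ICOUNT_SHIFT 10

/* Hypothetical stand-in for icount_to_ns(): each insn costs 2^shift ns. */
static int64_t sketch_icount_to_ns(int64_t icount, int shift)
{
    assert(shift >= 0 && shift <= SKETCH_MAX_ICOUNT_SHIFT);
    return icount * ((int64_t)1 << shift);
}

/* Hypothetical stand-in for icount_round(): ns to insns, rounding up. */
static int64_t sketch_icount_round(int64_t ns, int shift)
{
    assert(shift >= 0 && shift <= SKETCH_MAX_ICOUNT_SHIFT);
    return (ns + ((int64_t)1 << shift) - 1) >> shift;
}
```

With shift = 3 (the initial guess in adaptive mode), 125,000,000 instructions
map to one second of virtual time, which matches the "125MIPS" comment in
icount_configure().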
diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index 5672b75c35..737779ea7f 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -21,7 +21,7 @@
 #include "exec/memory.h"
 #include "hw/irq.h"
 #include "sysemu/accel.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "qemu/config-file.h"
 #include "qemu/option.h"
 #include "qemu/error-report.h"
@@ -273,6 +273,38 @@ static void qtest_irq_handler(void *opaque, int n, int level)
     }
 }
 
+static int64_t qtest_clock_counter;
+
+int64_t qtest_get_virtual_clock(void)
+{
+    return atomic_read_i64(&qtest_clock_counter);
+}
+
+static void qtest_set_virtual_clock(int64_t count)
+{
+    atomic_set_i64(&qtest_clock_counter, count);
+}
+
+static void qtest_clock_warp(int64_t dest)
+{
+    int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+    AioContext *aio_context;
+    assert(qtest_enabled());
+    aio_context = qemu_get_aio_context();
+    while (clock < dest) {
+        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                                      QEMU_TIMER_ATTR_ALL);
+        int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
+
+        qtest_set_virtual_clock(qtest_get_virtual_clock() + warp);
+
+        qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+        timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
+        clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+    }
+    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+}
+
 static void qtest_process_command(CharBackend *chr, gchar **words)
 {
     const gchar *command;
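
Note (not part of the patch): the stepping loop in qtest_clock_warp() above
can be pictured with a toy model. This is only a sketch under simplifying
assumptions: a single pending timer, no AioContext, and -1 meaning "no
deadline". SketchClock and the sketch_* functions are hypothetical and are
not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Toy virtual clock with at most one pending timer (-1 means none). */
typedef struct SketchClock {
    int64_t now_ns;
    int64_t timer_expiry_ns;
    int timers_fired;
} SketchClock;

/* Time left until the pending timer fires, or -1 if there is no timer. */
static int64_t sketch_deadline_ns(const SketchClock *c)
{
    if (c->timer_expiry_ns < 0) {
        return -1;
    }
    return c->timer_expiry_ns > c->now_ns ? c->timer_expiry_ns - c->now_ns : 0;
}

/* Fire the pending timer if its expiry time has been reached. */
static void sketch_run_timers(SketchClock *c)
{
    if (c->timer_expiry_ns >= 0 && c->timer_expiry_ns <= c->now_ns) {
        c->timers_fired++;
        c->timer_expiry_ns = -1;
    }
}

/*
 * Advance the clock to dest in steps, never jumping past a pending timer.
 * Returns the number of warp steps taken.
 */
static int sketch_clock_warp(SketchClock *c, int64_t dest)
{
    int steps = 0;

    while (c->now_ns < dest) {
        int64_t remaining = dest - c->now_ns;
        int64_t deadline = sketch_deadline_ns(c);
        int64_t warp = (deadline >= 0 && deadline < remaining)
                       ? deadline : remaining;

        c->now_ns += warp;
        sketch_run_timers(c);
        steps++;
    }
    return steps;
}
```

Warping from 0 to 1000 ns with a timer at 300 ns takes two steps in this
model: one up to the timer, one for the remainder.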
diff --git a/softmmu/timers-state.h b/softmmu/timers-state.h
new file mode 100644
index 0000000000..db4e60f18f
--- /dev/null
+++ b/softmmu/timers-state.h
@@ -0,0 +1,69 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TIMERS_STATE_H
+#define TIMERS_STATE_H
+
+/* timers state, for sharing between icount and cpu-timers */
+
+typedef struct TimersState {
+    /* Protected by BQL.  */
+    int64_t cpu_ticks_prev;
+    int64_t cpu_ticks_offset;
+
+    /*
+     * Protect fields that can be respectively read outside the
+     * BQL, and written from multiple threads.
+     */
+    QemuSeqLock vm_clock_seqlock;
+    QemuSpin vm_clock_lock;
+
+    int16_t cpu_ticks_enabled;
+
+    /* Conversion factor from emulated instructions to virtual clock ticks.  */
+    int16_t icount_time_shift;
+
+    /* Compensate for varying guest execution speed.  */
+    int64_t qemu_icount_bias;
+
+    int64_t vm_clock_warp_start;
+    int64_t cpu_clock_offset;
+
+    /* Only written by TCG thread */
+    int64_t qemu_icount;
+
+    /* for adjusting icount */
+    QEMUTimer *icount_rt_timer;
+    QEMUTimer *icount_vm_timer;
+    QEMUTimer *icount_warp_timer;
+} TimersState;
+
+extern TimersState timers_state;
+
+/*
+ * icount needs this internal from cpu-timers when adjusting the icount shift.
+ */
+int64_t cpu_get_clock_locked(void);
+
+#endif /* TIMERS_STATE_H */
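
Note (not part of the patch): the read-retry loops around vm_clock_seqlock in
icount.c follow the usual sequence-lock pattern. The following is a minimal
sketch of that pattern using C11 atomics; it assumes a single writer and
leaves out the vm_clock_lock spinlock that serializes writers in the patch.
SketchSeqLock and the sketch_* functions are hypothetical and are not the
qemu/seqlock.h API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The sequence is odd while a write is in progress, even otherwise. */
typedef struct SketchSeqLock {
    atomic_uint sequence;
} SketchSeqLock;

static void sketch_write_begin(SketchSeqLock *sl)
{
    atomic_fetch_add(&sl->sequence, 1);   /* becomes odd */
}

static void sketch_write_end(SketchSeqLock *sl)
{
    atomic_fetch_add(&sl->sequence, 1);   /* becomes even again */
}

/* Clear the low bit so a read begun during a write is forced to retry. */
static unsigned sketch_read_begin(SketchSeqLock *sl)
{
    return atomic_load(&sl->sequence) & ~1u;
}

static bool sketch_read_retry(SketchSeqLock *sl, unsigned start)
{
    return atomic_load(&sl->sequence) != start;
}

/* Example protected field, standing in for something like qemu_icount_bias. */
static SketchSeqLock sketch_lock;
static _Atomic int64_t sketch_bias;

static int64_t sketch_read_bias(void)
{
    int64_t value;
    unsigned start;

    do {
        start = sketch_read_begin(&sketch_lock);
        value = atomic_load(&sketch_bias);
    } while (sketch_read_retry(&sketch_lock, start));

    return value;
}
```

Readers never block the writer; they simply repeat the read when the sequence
changed underneath them, which is why the fields in TimersState can be read
outside the BQL.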
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 4eb9d1f7fd..8e77df7bea 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -74,6 +74,7 @@
 #include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "migration/colo.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/kvm.h"
@@ -2692,7 +2693,7 @@ static void user_register_global_props(void)
 
 static int do_configure_icount(void *opaque, QemuOpts *opts, Error **errp)
 {
-    configure_icount(opts, errp);
+    icount_configure(opts, errp);
     return 0;
 }
 
@@ -2802,7 +2803,7 @@ static void configure_accelerators(const char *progname)
         error_report("falling back to %s", ac->name);
     }
 
-    if (use_icount && !(tcg_enabled() || qtest_enabled())) {
+    if (icount_enabled() && !tcg_enabled()) {
         error_report("-icount is not allowed with hardware virtualization");
         exit(1);
     }
@@ -4237,7 +4238,11 @@ void qemu_init(int argc, char **argv, char **envp)
         semihosting_arg_fallback(kernel_filename, kernel_cmdline);
     }
 
-    cpu_ticks_init();
+    /* initialize cpu timers and VCPU throttle modules */
+    cpu_timers_init();
+
+    /* spice needs the timers to be initialized by this point */
+    qemu_spice_init();
 
     if (default_net) {
         QemuOptsList *net = qemu_find_opts("net");
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index d42046afe4..e97ad407fa 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,7 +1,8 @@
 stub-obj-y += blk-commit-all.o
 stub-obj-y += cmos.o
 stub-obj-y += cpu-get-clock.o
-stub-obj-y += cpu-get-icount.o
+stub-obj-y += qemu-timer-notify-cb.o
+stub-obj-y += icount.o
 stub-obj-y += dump.o
 stub-obj-y += error-printf.o
 stub-obj-y += fdset.o
@@ -37,7 +38,6 @@ stub-obj-y += arch_type.o
 stub-obj-y += bdrv-next-monitor-owned.o
 stub-obj-y += blockdev-close-all-bdrv-states.o
 stub-obj-y += change-state-handler.o
-stub-obj-y += clock-warp.o
 stub-obj-y += fd-register.o
 stub-obj-y += fw_cfg.o
 stub-obj-y += get-vm-name.o
diff --git a/stubs/clock-warp.c b/stubs/clock-warp.c
deleted file mode 100644
index b53e5dd94c..0000000000
--- a/stubs/clock-warp.c
+++ /dev/null
@@ -1,7 +0,0 @@
-#include "qemu/osdep.h"
-#include "qemu/timer.h"
-
-void qemu_start_warp_timer(void)
-{
-}
-
diff --git a/stubs/cpu-get-clock.c b/stubs/cpu-get-clock.c
index 5a92810e87..9e92404816 100644
--- a/stubs/cpu-get-clock.c
+++ b/stubs/cpu-get-clock.c
@@ -1,5 +1,6 @@
 #include "qemu/osdep.h"
-#include "qemu/timer.h"
+#include "sysemu/cpu-timers.h"
+#include "qemu/main-loop.h"
 
 int64_t cpu_get_clock(void)
 {
diff --git a/stubs/cpu-get-icount.c b/stubs/cpu-get-icount.c
deleted file mode 100644
index b35f844638..0000000000
--- a/stubs/cpu-get-icount.c
+++ /dev/null
@@ -1,21 +0,0 @@
-#include "qemu/osdep.h"
-#include "qemu/timer.h"
-#include "sysemu/cpus.h"
-#include "qemu/main-loop.h"
-
-int use_icount;
-
-int64_t cpu_get_icount(void)
-{
-    abort();
-}
-
-int64_t cpu_get_icount_raw(void)
-{
-    abort();
-}
-
-void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
-{
-    qemu_notify_event();
-}
diff --git a/stubs/icount.c b/stubs/icount.c
new file mode 100644
index 0000000000..3b35001051
--- /dev/null
+++ b/stubs/icount.c
@@ -0,0 +1,52 @@
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/cpu-timers.h"
+
+/* icount - Instruction Counter API */
+
+/*
+ * Return the icount enablement state:
+ *
+ * 0 = Disabled - Do not count executed instructions.
+ */
+int icount_enabled(void)
+{
+    return 0;
+}
+void icount_update(CPUState *cpu)
+{
+    abort();
+}
+void icount_configure(QemuOpts *opts, Error **errp)
+{
+    /* signal error */
+    error_setg(errp, "cannot configure icount, TCG support not available");
+}
+int64_t icount_get_raw(void)
+{
+    abort();
+    return 0;
+}
+int64_t icount_get(void)
+{
+    abort();
+    return 0;
+}
+int64_t icount_to_ns(int64_t icount)
+{
+    abort();
+    return 0;
+}
+int64_t icount_round(int64_t count)
+{
+    abort();
+    return 0;
+}
+void icount_start_warp_timer(void)
+{
+    abort();
+}
+void icount_account_warp_timer(void)
+{
+    abort();
+}
diff --git a/stubs/qemu-timer-notify-cb.c b/stubs/qemu-timer-notify-cb.c
new file mode 100644
index 0000000000..845e46f8e0
--- /dev/null
+++ b/stubs/qemu-timer-notify-cb.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+#include "sysemu/cpu-timers.h"
+#include "qemu/main-loop.h"
+
+void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
+{
+    qemu_notify_event();
+}
diff --git a/stubs/qtest.c b/stubs/qtest.c
index 891eb954fb..4666a49d7d 100644
--- a/stubs/qtest.c
+++ b/stubs/qtest.c
@@ -18,3 +18,8 @@ bool qtest_driver(void)
 {
     return false;
 }
+
+int64_t qtest_get_virtual_clock(void)
+{
+    return 0;
+}
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 8870284f57..36be602179 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "disas/disas.h"
 #include "qemu/host-utils.h"
 #include "exec/exec-all.h"
@@ -1329,7 +1330,7 @@ static DisasJumpType gen_mfpr(DisasContext *ctx, TCGv va, int regno)
     case 249: /* VMTIME */
         helper = gen_helper_get_vmtime;
     do_helper:
-        if (use_icount) {
+        if (icount_enabled()) {
             gen_io_start();
             helper(va);
             return DISAS_PC_STALE;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8ef0fb478f..730da0ab6f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -24,6 +24,7 @@
 #include "hw/irq.h"
 #include "hw/semihosting/semihost.h"
 #include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/kvm.h"
 #include "sysemu/tcg.h"
 #include "qemu/range.h"
@@ -1206,17 +1207,17 @@ static int64_t cycles_ns_per(uint64_t cycles)
 
 static bool instructions_supported(CPUARMState *env)
 {
-    return use_icount == 1 /* Precise instruction counting */;
+    return icount_enabled() == 1; /* Precise instruction counting */
 }
 
 static uint64_t instructions_get_count(CPUARMState *env)
 {
-    return (uint64_t)cpu_get_icount_raw();
+    return (uint64_t)icount_get_raw();
 }
 
 static int64_t instructions_ns_per(uint64_t icount)
 {
-    return cpu_icount_to_ns((int64_t)icount);
+    return icount_to_ns((int64_t)icount);
 }
 #endif
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 6a96a01b1c..ab2b230991 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -242,8 +242,8 @@ static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-    if (use_icount) {
-        *val = cpu_get_icount();
+    if (icount_enabled()) {
+        *val = icount_get();
     } else {
         *val = cpu_get_host_ticks();
     }
@@ -257,8 +257,8 @@ static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 static int read_instreth(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-    if (use_icount) {
-        *val = cpu_get_icount() >> 32;
+    if (icount_enabled()) {
+        *val = icount_get() >> 32;
     } else {
         *val = cpu_get_host_ticks() >> 32;
     }
diff --git a/tests/ptimer-test-stubs.c b/tests/ptimer-test-stubs.c
index ed393d9082..b4447a3e44 100644
--- a/tests/ptimer-test-stubs.c
+++ b/tests/ptimer-test-stubs.c
@@ -12,6 +12,7 @@
 #include "qemu/main-loop.h"
 #include "sysemu/replay.h"
 #include "migration/vmstate.h"
+#include "sysemu/cpu-timers.h"
 
 #include "ptimer-test.h"
 
@@ -30,8 +31,10 @@ QEMUTimerListGroup main_loop_tlg;
 
 int64_t ptimer_test_time_ns;
 
-/* Do not artificially limit period - see hw/core/ptimer.c.  */
-int use_icount = 1;
+int icount_enabled(void)
+{
+    return 0;
+}
 bool qtest_allowed;
 
 void timer_init_full(QEMUTimer *ts,
diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c
index e2bcf5fe13..82c92500df 100644
--- a/tests/test-timed-average.c
+++ b/tests/test-timed-average.c
@@ -11,7 +11,7 @@
  */
 
 #include "qemu/osdep.h"
-
+#include "sysemu/cpu-timers.h"
 #include "qemu/timed-average.h"
 
 /* This is the clock for QEMU_CLOCK_VIRTUAL */
diff --git a/util/main-loop.c b/util/main-loop.c
index f69f055013..744b42fc54 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -27,7 +27,7 @@
 #include "qemu/cutils.h"
 #include "qemu/timer.h"
 #include "sysemu/qtest.h"
-#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "qemu/main-loop.h"
 #include "block/aio.h"
@@ -517,9 +517,13 @@ void main_loop_wait(int nonblocking)
     mlpoll.state = ret < 0 ? MAIN_LOOP_POLL_ERR : MAIN_LOOP_POLL_OK;
     notifier_list_notify(&main_loop_poll_notifiers, &mlpoll);
 
-    /* CPU thread can infinitely wait for event after
-       missing the warp */
-    qemu_start_warp_timer();
+    if (icount_enabled()) {
+        /*
+         * CPU thread can infinitely wait for event after
+         * missing the warp
+         */
+        icount_start_warp_timer();
+    }
     qemu_clock_run_all_timers();
 }
 
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index f62b4feecd..db51e68f25 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -26,8 +26,10 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 #include "qemu/lockable.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "sysemu/cpus.h"
+#include "sysemu/qtest.h"
 
 #ifdef CONFIG_POSIX
 #include <pthread.h>
@@ -134,7 +136,7 @@ static void qemu_clock_init(QEMUClockType type, QEMUTimerListNotifyCB *notify_cb
 
 bool qemu_clock_use_for_deadline(QEMUClockType type)
 {
-    return !(use_icount && (type == QEMU_CLOCK_VIRTUAL));
+    return !(icount_enabled() && (type == QEMU_CLOCK_VIRTUAL));
 }
 
 void qemu_clock_notify(QEMUClockType type)
@@ -416,8 +418,8 @@ static bool timer_mod_ns_locked(QEMUTimerList *timer_list,
 static void timerlist_rearm(QEMUTimerList *timer_list)
 {
     /* Interrupt execution to force deadline recalculation.  */
-    if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL) {
-        qemu_start_warp_timer();
+    if (icount_enabled() && timer_list->clock->type == QEMU_CLOCK_VIRTUAL) {
+        icount_start_warp_timer();
     }
     timerlist_notify(timer_list);
 }
@@ -633,8 +635,10 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
         return get_clock();
     default:
     case QEMU_CLOCK_VIRTUAL:
-        if (use_icount) {
-            return cpu_get_icount();
+        if (icount_enabled()) {
+            return icount_get();
+        } else if (qtest_enabled()) { /* for qtest_clock_warp */
+            return qtest_get_virtual_clock();
         } else {
             return cpu_get_clock();
         }
-- 
2.16.4




* [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 1/8] cpu-timers, icount: new modules Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-05  8:40   ` Claudio Fontana
                     ` (2 more replies)
  2020-08-03  9:05 ` [RFC v3 3/8] cpus: extract out TCG-specific code to accel/tcg Claudio Fontana
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

The new interface starts out unused; the next patches will start
using it.

It provides methods for each accelerator to start a vcpu, kick a vcpu,
synchronize state, get cpu virtual clock and elapsed ticks.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
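For readers skimming the thread, the dispatch pattern this patch introduces can be sketched in isolation. This is only a minimal standalone model, not the QEMU code: DemoCPU, DemoCpusAccel and every demo_* function are hypothetical names made up for illustration, and the default clock value is arbitrary. It shows one mandatory hook checked at registration time, and optional hooks that fall back to a default when left NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState; only what the sketch needs. */
typedef struct DemoCPU {
    int created;
    int kicked;   /* counts kicks delivered by the default path */
} DemoCPU;

/* Simplified model of the interface: one mandatory hook, the rest optional. */
typedef struct DemoCpusAccel {
    void (*create_vcpu_thread)(DemoCPU *cpu); /* MANDATORY */
    void (*kick_vcpu_thread)(DemoCPU *cpu);   /* optional */
    int64_t (*get_virtual_clock)(void);       /* optional */
} DemoCpusAccel;

/* NULL until an accelerator registers its table. */
static const DemoCpusAccel *demo_accel;

void demo_register_accel(const DemoCpusAccel *ca)
{
    assert(ca != NULL);
    assert(ca->create_vcpu_thread != NULL); /* mandatory */
    demo_accel = ca;
}

/* Defaults used when no accelerator hook is provided (values are arbitrary). */
static void demo_default_kick(DemoCPU *cpu)
{
    cpu->kicked += 1;
}

static int64_t demo_default_clock(void)
{
    return 1000;
}

void demo_cpu_kick(DemoCPU *cpu)
{
    if (demo_accel && demo_accel->kick_vcpu_thread) {
        demo_accel->kick_vcpu_thread(cpu);
    } else {
        demo_default_kick(cpu);
    }
}

int64_t demo_get_virtual_clock(void)
{
    if (demo_accel && demo_accel->get_virtual_clock) {
        return demo_accel->get_virtual_clock();
    }
    return demo_default_clock();
}

/* An accelerator that fills in only the mandatory hook. */
static void demo_create(DemoCPU *cpu)
{
    cpu->created = 1;
}

const DemoCpusAccel demo_minimal_accel = {
    .create_vcpu_thread = demo_create,
};
```

With demo_minimal_accel registered, demo_cpu_kick() and demo_get_virtual_clock() still take the default paths, which is the point of giving every non-mandatory method a default.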
 hw/core/cpu.c                  |   1 +
 hw/i386/x86.c                  |   2 +-
 include/sysemu/cpu-timers.h    |   9 +-
 include/sysemu/cpus.h          |  36 ++++++++
 include/sysemu/hw_accel.h      |  69 ++-------------
 softmmu/cpu-timers.c           |   9 +-
 softmmu/cpus.c                 | 194 ++++++++++++++++++++++++++++++++---------
 stubs/Makefile.objs            |   2 +
 stubs/cpu-synchronize-state.c  |  15 ++++
 stubs/cpus-get-virtual-clock.c |   8 ++
 util/qemu-timer.c              |   8 +-
 11 files changed, 231 insertions(+), 122 deletions(-)
 create mode 100644 stubs/cpu-synchronize-state.c
 create mode 100644 stubs/cpus-get-virtual-clock.c

diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 594441a150..b389a312df 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -33,6 +33,7 @@
 #include "hw/qdev-properties.h"
 #include "trace-root.h"
 #include "qemu/plugin.h"
+#include "sysemu/hw_accel.h"
 
 CPUInterruptHandler cpu_interrupt_handler;
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 58cf2229d5..00c35bad7e 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -264,7 +264,7 @@ static long get_file_size(FILE *f)
 /* TSC handling */
 uint64_t cpu_get_tsc(CPUX86State *env)
 {
-    return cpu_get_ticks();
+    return cpus_get_elapsed_ticks();
 }
 
 /* IRQ handling */
diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
index 07d724672f..cb83cc5584 100644
--- a/include/sysemu/cpu-timers.h
+++ b/include/sysemu/cpu-timers.h
@@ -64,9 +64,8 @@ void cpu_enable_ticks(void);
 void cpu_disable_ticks(void);
 
 /*
- * return the time elapsed in VM between vm_start and vm_stop.  Unless
- * icount is active, cpu_get_ticks() uses units of the host CPU cycle
- * counter.
+ * return the time elapsed in VM between vm_start and vm_stop.
+ * cpu_get_ticks() uses units of the host CPU cycle counter.
  */
 int64_t cpu_get_ticks(void);
 
@@ -78,4 +77,8 @@ int64_t cpu_get_clock(void);
 
 void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
 
+/* get the VIRTUAL clock and VM elapsed ticks via the cpus accel interface */
+int64_t cpus_get_virtual_clock(void);
+int64_t cpus_get_elapsed_ticks(void);
+
 #endif /* SYSEMU_CPU_TIMERS_H */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 149de000a0..db196dd96f 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -4,7 +4,43 @@
 #include "qemu/timer.h"
 
 /* cpus.c */
+
+/* CPU execution threads */
+
+typedef struct CpusAccel {
+    void (*create_vcpu_thread)(CPUState *cpu); /* MANDATORY */
+    void (*kick_vcpu_thread)(CPUState *cpu);
+
+    void (*synchronize_post_reset)(CPUState *cpu);
+    void (*synchronize_post_init)(CPUState *cpu);
+    void (*synchronize_state)(CPUState *cpu);
+    void (*synchronize_pre_loadvm)(CPUState *cpu);
+
+    int64_t (*get_virtual_clock)(void);
+    int64_t (*get_elapsed_ticks)(void);
+} CpusAccel;
+
+/* register accel-specific cpus interface implementation */
+void cpus_register_accel(CpusAccel *i);
+
+/* interface available for cpus accelerator threads */
+
+/* For temporary buffers for forming a name */
+#define VCPU_THREAD_NAME_SIZE 16
+
+void cpus_kick_thread(CPUState *cpu);
+bool cpu_work_list_empty(CPUState *cpu);
+bool cpu_thread_is_idle(CPUState *cpu);
 bool all_cpu_threads_idle(void);
+bool cpu_can_run(CPUState *cpu);
+void qemu_wait_io_event_common(CPUState *cpu);
+void qemu_wait_io_event(CPUState *cpu);
+void cpu_thread_signal_created(CPUState *cpu);
+void cpu_thread_signal_destroyed(CPUState *cpu);
+void cpu_handle_guest_debug(CPUState *cpu);
+
+/* end interface for cpus accelerator threads */
+
 bool qemu_in_vcpu_thread(void);
 void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index e128f8b06b..ffed6192a3 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -1,5 +1,5 @@
 /*
- * QEMU Hardware accelertors support
+ * QEMU Hardware accelerators support
  *
  * Copyright 2016 Google, Inc.
  *
@@ -17,68 +17,9 @@
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
 
-static inline void cpu_synchronize_state(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_state(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_state(cpu);
-    }
-    if (hvf_enabled()) {
-        hvf_cpu_synchronize_state(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_state(cpu);
-    }
-}
-
-static inline void cpu_synchronize_post_reset(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_reset(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_reset(cpu);
-    }
-    if (hvf_enabled()) {
-        hvf_cpu_synchronize_post_reset(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_reset(cpu);
-    }
-}
-
-static inline void cpu_synchronize_post_init(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_init(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_init(cpu);
-    }
-    if (hvf_enabled()) {
-        hvf_cpu_synchronize_post_init(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_init(cpu);
-    }
-}
-
-static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
-{
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_pre_loadvm(cpu);
-    }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_pre_loadvm(cpu);
-    }
-    if (hvf_enabled()) {
-        hvf_cpu_synchronize_pre_loadvm(cpu);
-    }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_pre_loadvm(cpu);
-    }
-}
+void cpu_synchronize_state(CPUState *cpu);
+void cpu_synchronize_post_reset(CPUState *cpu);
+void cpu_synchronize_post_init(CPUState *cpu);
+void cpu_synchronize_pre_loadvm(CPUState *cpu);
 
 #endif /* QEMU_HW_ACCEL_H */
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 64addb315d..3e1da79735 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -61,18 +61,13 @@ static int64_t cpu_get_ticks_locked(void)
 }
 
 /*
- * return the time elapsed in VM between vm_start and vm_stop.  Unless
- * icount is active, cpu_get_ticks() uses units of the host CPU cycle
- * counter.
+ * return the time elapsed in VM between vm_start and vm_stop.
+ * cpu_get_ticks() uses units of the host CPU cycle counter.
  */
 int64_t cpu_get_ticks(void)
 {
     int64_t ticks;
 
-    if (icount_enabled()) {
-        return icount_get();
-    }
-
     qemu_spin_lock(&timers_state.vm_clock_lock);
     ticks = cpu_get_ticks_locked();
     qemu_spin_unlock(&timers_state.vm_clock_lock);
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 54fdb2761c..bad6302ca3 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -87,7 +87,7 @@ bool cpu_is_stopped(CPUState *cpu)
     return cpu->stopped || !runstate_is_running();
 }
 
-static inline bool cpu_work_list_empty(CPUState *cpu)
+bool cpu_work_list_empty(CPUState *cpu)
 {
     bool ret;
 
@@ -97,7 +97,7 @@ static inline bool cpu_work_list_empty(CPUState *cpu)
     return ret;
 }
 
-static bool cpu_thread_is_idle(CPUState *cpu)
+bool cpu_thread_is_idle(CPUState *cpu)
 {
     if (cpu->stop || !cpu_work_list_empty(cpu)) {
         return false;
@@ -215,6 +215,11 @@ void hw_error(const char *fmt, ...)
     abort();
 }
 
+/*
+ * The chosen accelerator is supposed to register this.
+ */
+static CpusAccel *cpus_accel;
+
 void cpu_synchronize_all_states(void)
 {
     CPUState *cpu;
@@ -251,6 +256,102 @@ void cpu_synchronize_all_pre_loadvm(void)
     }
 }
 
+void cpu_synchronize_state(CPUState *cpu)
+{
+    if (cpus_accel && cpus_accel->synchronize_state) {
+        cpus_accel->synchronize_state(cpu);
+    }
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_state(cpu);
+    }
+    if (hax_enabled()) {
+        hax_cpu_synchronize_state(cpu);
+    }
+    if (whpx_enabled()) {
+        whpx_cpu_synchronize_state(cpu);
+    }
+}
+
+void cpu_synchronize_post_reset(CPUState *cpu)
+{
+    if (cpus_accel && cpus_accel->synchronize_post_reset) {
+        cpus_accel->synchronize_post_reset(cpu);
+    }
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_post_reset(cpu);
+    }
+    if (hax_enabled()) {
+        hax_cpu_synchronize_post_reset(cpu);
+    }
+    if (whpx_enabled()) {
+        whpx_cpu_synchronize_post_reset(cpu);
+    }
+}
+
+void cpu_synchronize_post_init(CPUState *cpu)
+{
+    if (cpus_accel && cpus_accel->synchronize_post_init) {
+        cpus_accel->synchronize_post_init(cpu);
+    }
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_post_init(cpu);
+    }
+    if (hax_enabled()) {
+        hax_cpu_synchronize_post_init(cpu);
+    }
+    if (whpx_enabled()) {
+        whpx_cpu_synchronize_post_init(cpu);
+    }
+}
+
+void cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    if (cpus_accel && cpus_accel->synchronize_pre_loadvm) {
+        cpus_accel->synchronize_pre_loadvm(cpu);
+    }
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_pre_loadvm(cpu);
+    }
+    if (hax_enabled()) {
+        hax_cpu_synchronize_pre_loadvm(cpu);
+    }
+    if (hvf_enabled()) {
+        hvf_cpu_synchronize_pre_loadvm(cpu);
+    }
+    if (whpx_enabled()) {
+        whpx_cpu_synchronize_pre_loadvm(cpu);
+    }
+}
+
+int64_t cpus_get_virtual_clock(void)
+{
+    if (cpus_accel && cpus_accel->get_virtual_clock) {
+        return cpus_accel->get_virtual_clock();
+    }
+    if (icount_enabled()) {
+        return icount_get();
+    } else if (qtest_enabled()) { /* for qtest_clock_warp */
+        return qtest_get_virtual_clock();
+    }
+    return cpu_get_clock();
+}
+
+/*
+ * return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpus_get_elapsed_ticks() uses units of the host CPU
+ * cycle counter.
+ */
+int64_t cpus_get_elapsed_ticks(void)
+{
+    if (cpus_accel && cpus_accel->get_elapsed_ticks) {
+        return cpus_accel->get_elapsed_ticks();
+    }
+    if (icount_enabled()) {
+        return icount_get();
+    }
+    return cpu_get_ticks();
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
     int ret = 0;
@@ -279,7 +380,7 @@ int vm_shutdown(void)
     return do_vm_stop(RUN_STATE_SHUTDOWN, false);
 }
 
-static bool cpu_can_run(CPUState *cpu)
+bool cpu_can_run(CPUState *cpu)
 {
     if (cpu->stop) {
         return false;
@@ -290,7 +391,7 @@ static bool cpu_can_run(CPUState *cpu)
     return true;
 }
 
-static void cpu_handle_guest_debug(CPUState *cpu)
+void cpu_handle_guest_debug(CPUState *cpu)
 {
     gdb_set_stop_cpu(cpu);
     qemu_system_debug_request();
@@ -396,7 +497,7 @@ static void qemu_cpu_stop(CPUState *cpu, bool exit)
     qemu_cond_broadcast(&qemu_pause_cond);
 }
 
-static void qemu_wait_io_event_common(CPUState *cpu)
+void qemu_wait_io_event_common(CPUState *cpu)
 {
     atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
@@ -421,7 +522,7 @@ static void qemu_tcg_rr_wait_io_event(void)
     }
 }
 
-static void qemu_wait_io_event(CPUState *cpu)
+void qemu_wait_io_event(CPUState *cpu)
 {
     bool slept = false;
 
@@ -437,7 +538,8 @@ static void qemu_wait_io_event(CPUState *cpu)
     }
 
 #ifdef _WIN32
-    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
+    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
+    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
     if (!tcg_enabled()) {
         SleepEx(0, TRUE);
     }
@@ -467,8 +569,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     kvm_init_cpu_signals(cpu);
 
     /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     do {
@@ -482,8 +583,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     } while (!cpu->unplug || cpu_can_run(cpu));
 
     qemu_kvm_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_destroyed(cpu);
     qemu_mutex_unlock_iothread();
     rcu_unregister_thread();
     return NULL;
@@ -511,8 +611,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     sigaddset(&waitset, SIG_IPI);
 
     /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     do {
@@ -660,8 +759,7 @@ static void deal_with_unplugged_cpus(void)
     CPU_FOREACH(cpu) {
         if (cpu->unplug && !cpu_can_run(cpu)) {
             qemu_tcg_destroy_vcpu(cpu);
-            cpu->created = false;
-            qemu_cond_signal(&qemu_cpu_cond);
+            cpu_thread_signal_destroyed(cpu);
             break;
         }
     }
@@ -688,9 +786,8 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
 
     cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
     cpu->can_do_io = 1;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     /* wait for initial kick-off after machine start */
@@ -800,11 +897,9 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
 
     cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
     current_cpu = cpu;
-
     hax_init_vcpu(cpu);
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     do {
@@ -843,8 +938,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
     hvf_init_vcpu(cpu);
 
     /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     do {
@@ -858,8 +952,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
     } while (!cpu->unplug || cpu_can_run(cpu));
 
     hvf_vcpu_destroy(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_destroyed(cpu);
     qemu_mutex_unlock_iothread();
     rcu_unregister_thread();
     return NULL;
@@ -884,8 +977,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     }
 
     /* signal CPU creation */
-    cpu->created = true;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     do {
@@ -902,8 +994,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     } while (!cpu->unplug || cpu_can_run(cpu));
 
     whpx_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_destroyed(cpu);
     qemu_mutex_unlock_iothread();
     rcu_unregister_thread();
     return NULL;
@@ -936,10 +1027,9 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
 
     cpu->thread_id = qemu_get_thread_id();
-    cpu->created = true;
     cpu->can_do_io = 1;
     current_cpu = cpu;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_created(cpu);
     qemu_guest_random_seed_thread_part2(cpu->random_seed);
 
     /* process any pending work */
@@ -980,14 +1070,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     } while (!cpu->unplug || cpu_can_run(cpu));
 
     qemu_tcg_destroy_vcpu(cpu);
-    cpu->created = false;
-    qemu_cond_signal(&qemu_cpu_cond);
+    cpu_thread_signal_destroyed(cpu);
     qemu_mutex_unlock_iothread();
     rcu_unregister_thread();
     return NULL;
 }
 
-static void qemu_cpu_kick_thread(CPUState *cpu)
+void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
     int err;
@@ -1017,7 +1106,10 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
-    if (tcg_enabled()) {
+
+    if (cpus_accel && cpus_accel->kick_vcpu_thread) {
+        cpus_accel->kick_vcpu_thread(cpu);
+    } else if (tcg_enabled()) {
         if (qemu_tcg_mttcg_enabled()) {
             cpu_exit(cpu);
         } else {
@@ -1031,14 +1123,14 @@ void qemu_cpu_kick(CPUState *cpu)
              */
             cpu->exit_request = 1;
         }
-        qemu_cpu_kick_thread(cpu);
+        cpus_kick_thread(cpu);
     }
 }
 
 void qemu_cpu_kick_self(void)
 {
     assert(current_cpu);
-    qemu_cpu_kick_thread(current_cpu);
+    cpus_kick_thread(current_cpu);
 }
 
 bool qemu_cpu_is_self(CPUState *cpu)
@@ -1088,6 +1180,21 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
     qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
 }
 
+/* signal CPU creation */
+void cpu_thread_signal_created(CPUState *cpu)
+{
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+}
+
+/* signal CPU destruction */
+void cpu_thread_signal_destroyed(CPUState *cpu)
+{
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+}
+
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
@@ -1163,9 +1270,6 @@ void cpu_remove_sync(CPUState *cpu)
     qemu_mutex_lock_iothread();
 }
 
-/* For temporary buffers for forming a name */
-#define VCPU_THREAD_NAME_SIZE 16
-
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -1286,6 +1390,13 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }
 
+void cpus_register_accel(CpusAccel *ca)
+{
+    assert(ca != NULL);
+    assert(ca->create_vcpu_thread != NULL); /* mandatory */
+    cpus_accel = ca;
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -1316,7 +1427,10 @@ void qemu_init_vcpu(CPUState *cpu)
         cpu_address_space_init(cpu, 0, "cpu-memory", cpu->memory);
     }
 
-    if (kvm_enabled()) {
+    if (cpus_accel) {
+        /* accelerator already implements the CpusAccel interface */
+        cpus_accel->create_vcpu_thread(cpu);
+    } else if (kvm_enabled()) {
         qemu_kvm_start_vcpu(cpu);
     } else if (hax_enabled()) {
         qemu_hax_start_vcpu(cpu);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index e97ad407fa..16345eec43 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,6 +1,7 @@
 stub-obj-y += blk-commit-all.o
 stub-obj-y += cmos.o
 stub-obj-y += cpu-get-clock.o
+stub-obj-y += cpus-get-virtual-clock.o
 stub-obj-y += qemu-timer-notify-cb.o
 stub-obj-y += icount.o
 stub-obj-y += dump.o
@@ -28,6 +29,7 @@ stub-obj-y += trace-control.o
 stub-obj-y += vmgenid.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_SOFTMMU) += win32-kbd-hook.o
+stub-obj-y += cpu-synchronize-state.o
 
 #######################################################################
 # code used by both qemu system emulation and qemu-img
diff --git a/stubs/cpu-synchronize-state.c b/stubs/cpu-synchronize-state.c
new file mode 100644
index 0000000000..3112fe439d
--- /dev/null
+++ b/stubs/cpu-synchronize-state.c
@@ -0,0 +1,15 @@
+#include "qemu/osdep.h"
+#include "sysemu/hw_accel.h"
+
+void cpu_synchronize_state(CPUState *cpu)
+{
+}
+void cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+void cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+void cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/stubs/cpus-get-virtual-clock.c b/stubs/cpus-get-virtual-clock.c
new file mode 100644
index 0000000000..fd447d53f3
--- /dev/null
+++ b/stubs/cpus-get-virtual-clock.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+#include "sysemu/cpu-timers.h"
+#include "qemu/main-loop.h"
+
+int64_t cpus_get_virtual_clock(void)
+{
+    return cpu_get_clock();
+}
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index db51e68f25..50b325c65b 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -635,13 +635,7 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
         return get_clock();
     default:
     case QEMU_CLOCK_VIRTUAL:
-        if (icount_enabled()) {
-            return icount_get();
-        } else if (qtest_enabled()) { /* for qtest_clock_warp */
-            return qtest_get_virtual_clock();
-        } else {
-            return cpu_get_clock();
-        }
+        return cpus_get_virtual_clock();
     case QEMU_CLOCK_HOST:
         return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime());
     case QEMU_CLOCK_VIRTUAL_RT:
-- 
2.16.4




* [RFC v3 3/8] cpus: extract out TCG-specific code to accel/tcg
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 1/8] cpu-timers, icount: new modules Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 4/8] cpus: extract out qtest-specific code to accel/qtest Claudio Fontana
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

TCG is the first accelerator to register a "CpusAccel" interface
on initialization, providing functions for starting a vcpu,
kicking a vcpu, synchronizing state and getting virtual clock
and ticks.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
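As a rough illustration of what "registering on initialization" means for the kick path, here is a standalone sketch under stated assumptions. It is not the tcg-cpus.c code: DemoVCPU, DemoAccelOps, demo_mttcg and the demo_* functions are hypothetical, and a per-vCPU counter stands in for cpu_exit(). The accelerator installs its own table from its init function; in multi-threaded mode a kick reaches only the target vCPU, while in round-robin mode it reaches every vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 3

/* Hypothetical vCPU model: we only count exit requests. */
typedef struct DemoVCPU {
    int exit_requests;
} DemoVCPU;

static DemoVCPU demo_cpus[DEMO_NR_CPUS];
static bool demo_mttcg; /* stand-in for a "multi-threaded TCG" flag */

typedef struct DemoAccelOps {
    void (*create_vcpu_thread)(DemoVCPU *cpu);
    void (*kick_vcpu_thread)(DemoVCPU *cpu);
} DemoAccelOps;

static const DemoAccelOps *demo_ops;

/* Stand-in for cpu_exit(): just record the request. */
static void demo_cpu_exit(DemoVCPU *cpu)
{
    cpu->exit_requests++;
}

static void demo_tcg_create(DemoVCPU *cpu)
{
    cpu->exit_requests = 0;
}

static void demo_tcg_kick(DemoVCPU *cpu)
{
    if (demo_mttcg) {
        /* one thread per vCPU: kick only the target */
        demo_cpu_exit(cpu);
    } else {
        /* single round-robin thread: kick all vCPUs */
        for (size_t i = 0; i < DEMO_NR_CPUS; i++) {
            demo_cpu_exit(&demo_cpus[i]);
        }
    }
}

static const DemoAccelOps demo_tcg_ops = {
    .create_vcpu_thread = demo_tcg_create,
    .kick_vcpu_thread = demo_tcg_kick,
};

/* The accelerator installs its table from its own init function. */
void demo_tcg_init(bool mttcg)
{
    demo_mttcg = mttcg;
    demo_ops = &demo_tcg_ops;
}

void demo_kick(size_t idx)
{
    demo_ops->kick_vcpu_thread(&demo_cpus[idx]);
}

int demo_exit_requests(size_t idx)
{
    return demo_cpus[idx].exit_requests;
}
```

Generic code only ever calls through the table, so it no longer needs to know which accelerator is active.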
 accel/tcg/Makefile.objs |   1 +
 accel/tcg/tcg-all.c     |  12 +-
 accel/tcg/tcg-cpus.c    | 541 ++++++++++++++++++++++++++++++++++++++++++++++++
 accel/tcg/tcg-cpus.h    |  17 ++
 softmmu/cpus.c          | 500 +-------------------------------------------
 5 files changed, 569 insertions(+), 502 deletions(-)
 create mode 100644 accel/tcg/tcg-cpus.c
 create mode 100644 accel/tcg/tcg-cpus.h

diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index a92f2c454b..ecf9aa582e 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
+obj-$(CONFIG_SOFTMMU) += tcg-cpus.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
 obj-y += translator.o
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index f1feea20c8..01957b130d 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -24,19 +24,17 @@
  */
 
 #include "qemu/osdep.h"
-#include "sysemu/accel.h"
+#include "qemu-common.h"
 #include "sysemu/tcg.h"
-#include "qom/object.h"
-#include "cpu.h"
-#include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
-#include "qemu/main-loop.h"
 #include "tcg/tcg.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "hw/boards.h"
 #include "qapi/qapi-builtin-visit.h"
 
+#include "tcg-cpus.h"
+
 typedef struct TCGState {
     AccelState parent_obj;
 
@@ -123,6 +121,8 @@ static void tcg_accel_instance_init(Object *obj)
     s->mttcg_enabled = default_mttcg_enabled();
 }
 
+bool mttcg_enabled;
+
 static int tcg_init(MachineState *ms)
 {
     TCGState *s = TCG_STATE(current_accel());
@@ -130,6 +130,8 @@ static int tcg_init(MachineState *ms)
     tcg_exec_init(s->tb_size * 1024 * 1024);
     cpu_interrupt_handler = tcg_handle_interrupt;
     mttcg_enabled = s->mttcg_enabled;
+    cpus_register_accel(&tcg_cpus);
+
     return 0;
 }
 
diff --git a/accel/tcg/tcg-cpus.c b/accel/tcg/tcg-cpus.c
new file mode 100644
index 0000000000..c82d142523
--- /dev/null
+++ b/accel/tcg/tcg-cpus.c
@@ -0,0 +1,541 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/tcg.h"
+#include "sysemu/replay.h"
+#include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
+#include "exec/exec-all.h"
+
+#include "tcg-cpus.h"
+
+/* Kick all RR vCPUs */
+static void qemu_cpu_kick_rr_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_exit(cpu);
+    };
+}
+
+static void tcg_kick_vcpu_thread(CPUState *cpu)
+{
+    if (qemu_tcg_mttcg_enabled()) {
+        cpu_exit(cpu);
+    } else {
+        qemu_cpu_kick_rr_cpus();
+    }
+}
+
+/*
+ * TCG vCPU kick timer
+ *
+ * The kick timer is responsible for moving single threaded vCPU
+ * emulation on to the next vCPU. If more than one vCPU is running, a
+ * timer event will force a cpu->exit so the next vCPU can get
+ * scheduled.
+ *
+ * The timer is removed if all vCPUs are idle and restarted again once
+ * idleness is complete.
+ */
+
+static QEMUTimer *tcg_kick_vcpu_timer;
+static CPUState *tcg_current_rr_cpu;
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+/* Kick the currently round-robin scheduled vCPU to next */
+static void qemu_cpu_kick_rr_next_cpu(void)
+{
+    CPUState *cpu;
+    do {
+        cpu = atomic_mb_read(&tcg_current_rr_cpu);
+        if (cpu) {
+            cpu_exit(cpu);
+        }
+    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+    timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    qemu_cpu_kick_rr_next_cpu();
+}
+
+static void start_tcg_kick_timer(void)
+{
+    assert(!mttcg_enabled);
+    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+        tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                           kick_tcg_thread, NULL);
+    }
+    if (tcg_kick_vcpu_timer && !timer_pending(tcg_kick_vcpu_timer)) {
+        timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    }
+}
+
+static void stop_tcg_kick_timer(void)
+{
+    assert(!mttcg_enabled);
+    if (tcg_kick_vcpu_timer && timer_pending(tcg_kick_vcpu_timer)) {
+        timer_del(tcg_kick_vcpu_timer);
+    }
+}
+
+static void qemu_tcg_destroy_vcpu(CPUState *cpu)
+{
+}
+
+static void qemu_tcg_rr_wait_io_event(void)
+{
+    CPUState *cpu;
+
+    while (all_cpu_threads_idle()) {
+        stop_tcg_kick_timer();
+        qemu_cond_wait_iothread(first_cpu->halt_cond);
+    }
+
+    start_tcg_kick_timer();
+
+    CPU_FOREACH(cpu) {
+        qemu_wait_io_event_common(cpu);
+    }
+}
+
+static int64_t tcg_get_icount_limit(void)
+{
+    int64_t deadline;
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        /*
+         * Include all the timers, because they may need attention.
+         * Too long CPU execution may create unnecessary delay in UI.
+         */
+        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                              QEMU_TIMER_ATTR_ALL);
+        /* Check realtime timers, because they help with input processing */
+        deadline = qemu_soonest_timeout(deadline,
+                qemu_clock_deadline_ns_all(QEMU_CLOCK_REALTIME,
+                                           QEMU_TIMER_ATTR_ALL));
+
+        /*
+         * Maintain prior (possibly buggy) behaviour where if no deadline
+         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+         * nanoseconds.
+         */
+        if ((deadline < 0) || (deadline > INT32_MAX)) {
+            deadline = INT32_MAX;
+        }
+
+        return icount_round(deadline);
+    } else {
+        return replay_get_instructions();
+    }
+}
+
+static void notify_aio_contexts(void)
+{
+    /* Wake up other AioContexts.  */
+    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+    qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+}
+
+static void handle_icount_deadline(void)
+{
+    assert(qemu_in_vcpu_thread());
+    if (icount_enabled()) {
+        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+                                                      QEMU_TIMER_ATTR_ALL);
+
+        if (deadline == 0) {
+            notify_aio_contexts();
+        }
+    }
+}
+
+static void prepare_icount_for_run(CPUState *cpu)
+{
+    if (icount_enabled()) {
+        int insns_left;
+
+        /*
+         * These should always be cleared by process_icount_data after
+         * each vCPU execution. However u16.high can be raised
+         * asynchronously by cpu_exit/cpu_interrupt/tcg_handle_interrupt
+         */
+        g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
+        g_assert(cpu->icount_extra == 0);
+
+        cpu->icount_budget = tcg_get_icount_limit();
+        insns_left = MIN(0xffff, cpu->icount_budget);
+        cpu_neg(cpu)->icount_decr.u16.low = insns_left;
+        cpu->icount_extra = cpu->icount_budget - insns_left;
+
+        replay_mutex_lock();
+
+        if (cpu->icount_budget == 0 && replay_has_checkpoint()) {
+            notify_aio_contexts();
+        }
+    }
+}
+
+static void process_icount_data(CPUState *cpu)
+{
+    if (icount_enabled()) {
+        /* Account for executed instructions */
+        icount_update(cpu);
+
+        /* Reset the counters */
+        cpu_neg(cpu)->icount_decr.u16.low = 0;
+        cpu->icount_extra = 0;
+        cpu->icount_budget = 0;
+
+        replay_account_executed_instructions();
+
+        replay_mutex_unlock();
+    }
+}
+
+static int tcg_cpu_exec(CPUState *cpu)
+{
+    int ret;
+#ifdef CONFIG_PROFILER
+    int64_t ti;
+#endif
+
+    assert(tcg_enabled());
+#ifdef CONFIG_PROFILER
+    ti = profile_getclock();
+#endif
+    cpu_exec_start(cpu);
+    ret = cpu_exec(cpu);
+    cpu_exec_end(cpu);
+#ifdef CONFIG_PROFILER
+    atomic_set(&tcg_ctx->prof.cpu_exec_time,
+               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
+#endif
+    return ret;
+}
+
+/*
+ * Destroy any remaining vCPUs which have been unplugged and have
+ * finished running
+ */
+static void deal_with_unplugged_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        if (cpu->unplug && !cpu_can_run(cpu)) {
+            qemu_tcg_destroy_vcpu(cpu);
+            cpu_thread_signal_destroyed(cpu);
+            break;
+        }
+    }
+}
+
+/*
+ * Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+
+static void *tcg_rr_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    assert(tcg_enabled());
+    rcu_register_thread();
+    tcg_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    /* wait for initial kick-off after machine start */
+    while (first_cpu->stopped) {
+        qemu_cond_wait_iothread(first_cpu->halt_cond);
+
+        /* process any pending work */
+        CPU_FOREACH(cpu) {
+            current_cpu = cpu;
+            qemu_wait_io_event_common(cpu);
+        }
+    }
+
+    start_tcg_kick_timer();
+
+    cpu = first_cpu;
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        qemu_mutex_unlock_iothread();
+        replay_mutex_lock();
+        qemu_mutex_lock_iothread();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        icount_account_warp_timer();
+
+        /*
+         * Run the timers here.  This is much more efficient than
+         * waking up the I/O thread and waiting for completion.
+         */
+        handle_icount_deadline();
+
+        replay_mutex_unlock();
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
+
+            atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r;
+
+                qemu_mutex_unlock_iothread();
+                prepare_icount_for_run(cpu);
+
+                r = tcg_cpu_exec(cpu);
+
+                process_icount_data(cpu);
+                qemu_mutex_lock_iothread();
+
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                } else if (r == EXCP_ATOMIC) {
+                    qemu_mutex_unlock_iothread();
+                    cpu_exec_step_atomic(cpu);
+                    qemu_mutex_lock_iothread();
+                    break;
+                }
+            } else if (cpu->stop) {
+                if (cpu->unplug) {
+                    cpu = CPU_NEXT(cpu);
+                }
+                break;
+            }
+
+            cpu = CPU_NEXT(cpu);
+        } /* while (cpu && !cpu->exit_request).. */
+
+        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+        atomic_set(&tcg_current_rr_cpu, NULL);
+
+        if (cpu && cpu->exit_request) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
+
+        if (icount_enabled() && all_cpu_threads_idle()) {
+            /*
+             * When all cpus are sleeping (e.g. in WFI), to avoid a deadlock
+             * in the main_loop, wake it up in order to start the warp timer.
+             */
+            qemu_notify_event();
+        }
+
+        qemu_tcg_rr_wait_io_event();
+        deal_with_unplugged_cpus();
+    }
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+/*
+ * Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    assert(tcg_enabled());
+    g_assert(!icount_enabled());
+
+    rcu_register_thread();
+    tcg_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    do {
+        if (cpu_can_run(cpu)) {
+            int r;
+            qemu_mutex_unlock_iothread();
+            r = tcg_cpu_exec(cpu);
+            qemu_mutex_lock_iothread();
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /*
+                 * during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            case EXCP_ATOMIC:
+                qemu_mutex_unlock_iothread();
+                cpu_exec_step_atomic(cpu);
+                qemu_mutex_lock_iothread();
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    qemu_tcg_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void tcg_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
+    static int tcg_region_inited;
+
+    assert(tcg_enabled());
+    /*
+     * Initialize TCG regions--once. Now is a good time, because:
+     * (1) TCG's init context, prologue and target globals have been set up.
+     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
+     *     -accel flag is processed, so the check doesn't work then).
+     */
+    if (!tcg_region_inited) {
+        tcg_region_inited = 1;
+        tcg_region_init();
+    }
+
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
+        cpu->thread = g_malloc0(sizeof(QemuThread));
+        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+        qemu_cond_init(cpu->halt_cond);
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            parallel_cpus = true;
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+                     cpu->cpu_index);
+
+            qemu_thread_create(cpu->thread, thread_name, tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
+#ifdef _WIN32
+        cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+    } else {
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
+        cpu->thread_id = first_cpu->thread_id;
+        cpu->can_do_io = 1;
+        cpu->created = true;
+    }
+}
+
+static int64_t tcg_get_virtual_clock(void)
+{
+    if (icount_enabled()) {
+        return icount_get();
+    }
+    return cpu_get_clock();
+}
+
+static int64_t tcg_get_elapsed_ticks(void)
+{
+    if (icount_enabled()) {
+        return icount_get();
+    }
+    return cpu_get_ticks();
+}
+
+CpusAccel tcg_cpus = {
+    .create_vcpu_thread = tcg_start_vcpu_thread,
+    .kick_vcpu_thread = tcg_kick_vcpu_thread,
+    .get_virtual_clock = tcg_get_virtual_clock,
+    .get_elapsed_ticks = tcg_get_elapsed_ticks,
+};
diff --git a/accel/tcg/tcg-cpus.h b/accel/tcg/tcg-cpus.h
new file mode 100644
index 0000000000..af4be6a151
--- /dev/null
+++ b/accel/tcg/tcg-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef TCG_CPUS_H
+#define TCG_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel tcg_cpus;
+
+#endif /* TCG_CPUS_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index bad6302ca3..eaed6749e3 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -24,27 +24,19 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "qemu/config-file.h"
-#include "qemu/cutils.h"
-#include "migration/vmstate.h"
 #include "monitor/monitor.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/qmp/qerror.h"
-#include "qemu/error-report.h"
-#include "qemu/qemu-print.h"
 #include "sysemu/tcg.h"
-#include "sysemu/block-backend.h"
 #include "exec/gdbstub.h"
-#include "sysemu/dma.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
 #include "exec/exec-all.h"
-
 #include "qemu/thread.h"
 #include "qemu/plugin.h"
 #include "sysemu/cpus.h"
@@ -124,79 +116,6 @@ bool all_cpu_threads_idle(void)
     return true;
 }
 
-bool mttcg_enabled;
-
-
-/***********************************************************/
-/* TCG vCPU kick timer
- *
- * The kick timer is responsible for moving single threaded vCPU
- * emulation on to the next vCPU. If more than one vCPU is running a
- * timer event with force a cpu->exit so the next vCPU can get
- * scheduled.
- *
- * The timer is removed if all vCPUs are idle and restarted again once
- * idleness is complete.
- */
-
-static QEMUTimer *tcg_kick_vcpu_timer;
-static CPUState *tcg_current_rr_cpu;
-
-#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
-
-static inline int64_t qemu_tcg_next_kick(void)
-{
-    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
-}
-
-/* Kick the currently round-robin scheduled vCPU to next */
-static void qemu_cpu_kick_rr_next_cpu(void)
-{
-    CPUState *cpu;
-    do {
-        cpu = atomic_mb_read(&tcg_current_rr_cpu);
-        if (cpu) {
-            cpu_exit(cpu);
-        }
-    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
-}
-
-/* Kick all RR vCPUs */
-static void qemu_cpu_kick_rr_cpus(void)
-{
-    CPUState *cpu;
-
-    CPU_FOREACH(cpu) {
-        cpu_exit(cpu);
-    };
-}
-
-static void kick_tcg_thread(void *opaque)
-{
-    timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    qemu_cpu_kick_rr_next_cpu();
-}
-
-static void start_tcg_kick_timer(void)
-{
-    assert(!mttcg_enabled);
-    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
-        tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                           kick_tcg_thread, NULL);
-    }
-    if (tcg_kick_vcpu_timer && !timer_pending(tcg_kick_vcpu_timer)) {
-        timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    }
-}
-
-static void stop_tcg_kick_timer(void)
-{
-    assert(!mttcg_enabled);
-    if (tcg_kick_vcpu_timer && timer_pending(tcg_kick_vcpu_timer)) {
-        timer_del(tcg_kick_vcpu_timer);
-    }
-}
-
 /***********************************************************/
 void hw_error(const char *fmt, ...)
 {
@@ -328,9 +247,7 @@ int64_t cpus_get_virtual_clock(void)
     if (cpus_accel && cpus_accel->get_virtual_clock) {
         return cpus_accel->get_virtual_clock();
     }
-    if (icount_enabled()) {
-        return icount_get();
-    } else if (qtest_enabled()) { /* for qtest_clock_warp */
+    if (qtest_enabled()) { /* for qtest_clock_warp */
         return qtest_get_virtual_clock();
     }
     return cpu_get_clock();
@@ -338,7 +255,7 @@ int64_t cpus_get_virtual_clock(void)
 
 /*
  * return the time elapsed in VM between vm_start and vm_stop.  Unless
- * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * icount is active, cpus_get_elapsed_ticks() uses units of the host CPU cycle
  * counter.
  */
 int64_t cpus_get_elapsed_ticks(void)
@@ -346,9 +263,6 @@ int64_t cpus_get_elapsed_ticks(void)
     if (cpus_accel && cpus_accel->get_elapsed_ticks) {
         return cpus_accel->get_elapsed_ticks();
     }
-    if (icount_enabled()) {
-        return icount_get();
-    }
     return cpu_get_ticks();
 }
 
@@ -482,10 +396,6 @@ static void qemu_kvm_destroy_vcpu(CPUState *cpu)
     }
 }
 
-static void qemu_tcg_destroy_vcpu(CPUState *cpu)
-{
-}
-
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
 {
     g_assert(qemu_cpu_is_self(cpu));
@@ -506,22 +416,6 @@ void qemu_wait_io_event_common(CPUState *cpu)
     process_queued_cpu_work(cpu);
 }
 
-static void qemu_tcg_rr_wait_io_event(void)
-{
-    CPUState *cpu;
-
-    while (all_cpu_threads_idle()) {
-        stop_tcg_kick_timer();
-        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
-    }
-
-    start_tcg_kick_timer();
-
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
-}
-
 void qemu_wait_io_event(CPUState *cpu)
 {
     bool slept = false;
@@ -538,7 +432,7 @@ void qemu_wait_io_event(CPUState *cpu)
     }
 
 #ifdef _WIN32
-    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
+    /* Eat dummy APC queued by cpus_kick_thread */
     /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
     if (!tcg_enabled()) {
         SleepEx(0, TRUE);
@@ -634,259 +528,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static int64_t tcg_get_icount_limit(void)
-{
-    int64_t deadline;
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        /*
-         * Include all the timers, because they may need an attention.
-         * Too long CPU execution may create unnecessary delay in UI.
-         */
-        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                              QEMU_TIMER_ATTR_ALL);
-        /* Check realtime timers, because they help with input processing */
-        deadline = qemu_soonest_timeout(deadline,
-                qemu_clock_deadline_ns_all(QEMU_CLOCK_REALTIME,
-                                           QEMU_TIMER_ATTR_ALL));
-
-        /* Maintain prior (possibly buggy) behaviour where if no deadline
-         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
-         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
-         * nanoseconds.
-         */
-        if ((deadline < 0) || (deadline > INT32_MAX)) {
-            deadline = INT32_MAX;
-        }
-
-        return icount_round(deadline);
-    } else {
-        return replay_get_instructions();
-    }
-}
-
-static void notify_aio_contexts(void)
-{
-    /* Wake up other AioContexts.  */
-    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-    qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
-}
-
-static void handle_icount_deadline(void)
-{
-    assert(qemu_in_vcpu_thread());
-    if (icount_enabled()) {
-        int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-                                                      QEMU_TIMER_ATTR_ALL);
-
-        if (deadline == 0) {
-            notify_aio_contexts();
-        }
-    }
-}
-
-static void prepare_icount_for_run(CPUState *cpu)
-{
-    if (icount_enabled()) {
-        int insns_left;
-
-        /* These should always be cleared by process_icount_data after
-         * each vCPU execution. However u16.high can be raised
-         * asynchronously by cpu_exit/cpu_interrupt/tcg_handle_interrupt
-         */
-        g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
-        g_assert(cpu->icount_extra == 0);
-
-        cpu->icount_budget = tcg_get_icount_limit();
-        insns_left = MIN(0xffff, cpu->icount_budget);
-        cpu_neg(cpu)->icount_decr.u16.low = insns_left;
-        cpu->icount_extra = cpu->icount_budget - insns_left;
-
-        replay_mutex_lock();
-
-        if (cpu->icount_budget == 0 && replay_has_checkpoint()) {
-            notify_aio_contexts();
-        }
-    }
-}
-
-static void process_icount_data(CPUState *cpu)
-{
-    if (icount_enabled()) {
-        /* Account for executed instructions */
-        icount_update(cpu);
-
-        /* Reset the counters */
-        cpu_neg(cpu)->icount_decr.u16.low = 0;
-        cpu->icount_extra = 0;
-        cpu->icount_budget = 0;
-
-        replay_account_executed_instructions();
-
-        replay_mutex_unlock();
-    }
-}
-
-
-static int tcg_cpu_exec(CPUState *cpu)
-{
-    int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
-
-    assert(tcg_enabled());
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
-    cpu_exec_start(cpu);
-    ret = cpu_exec(cpu);
-    cpu_exec_end(cpu);
-#ifdef CONFIG_PROFILER
-    atomic_set(&tcg_ctx->prof.cpu_exec_time,
-               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
-#endif
-    return ret;
-}
-
-/* Destroy any remaining vCPUs which have been unplugged and have
- * finished running
- */
-static void deal_with_unplugged_cpus(void)
-{
-    CPUState *cpu;
-
-    CPU_FOREACH(cpu) {
-        if (cpu->unplug && !cpu_can_run(cpu)) {
-            qemu_tcg_destroy_vcpu(cpu);
-            cpu_thread_signal_destroyed(cpu);
-            break;
-        }
-    }
-}
-
-/* Single-threaded TCG
- *
- * In the single-threaded case each vCPU is simulated in turn. If
- * there is more than a single vCPU we create a simple timer to kick
- * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
- * This is done explicitly rather than relying on side-effects
- * elsewhere.
- */
-
-static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    assert(tcg_enabled());
-    rcu_register_thread();
-    tcg_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    /* wait for initial kick-off after machine start */
-    while (first_cpu->stopped) {
-        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
-
-        /* process any pending work */
-        CPU_FOREACH(cpu) {
-            current_cpu = cpu;
-            qemu_wait_io_event_common(cpu);
-        }
-    }
-
-    start_tcg_kick_timer();
-
-    cpu = first_cpu;
-
-    /* process any pending work */
-    cpu->exit_request = 1;
-
-    while (1) {
-        qemu_mutex_unlock_iothread();
-        replay_mutex_lock();
-        qemu_mutex_lock_iothread();
-        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-        icount_account_warp_timer();
-
-        /* Run the timers here.  This is much more efficient than
-         * waking up the I/O thread and waiting for completion.
-         */
-        handle_icount_deadline();
-
-        replay_mutex_unlock();
-
-        if (!cpu) {
-            cpu = first_cpu;
-        }
-
-        while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
-
-            atomic_mb_set(&tcg_current_rr_cpu, cpu);
-            current_cpu = cpu;
-
-            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-            if (cpu_can_run(cpu)) {
-                int r;
-
-                qemu_mutex_unlock_iothread();
-                prepare_icount_for_run(cpu);
-
-                r = tcg_cpu_exec(cpu);
-
-                process_icount_data(cpu);
-                qemu_mutex_lock_iothread();
-
-                if (r == EXCP_DEBUG) {
-                    cpu_handle_guest_debug(cpu);
-                    break;
-                } else if (r == EXCP_ATOMIC) {
-                    qemu_mutex_unlock_iothread();
-                    cpu_exec_step_atomic(cpu);
-                    qemu_mutex_lock_iothread();
-                    break;
-                }
-            } else if (cpu->stop) {
-                if (cpu->unplug) {
-                    cpu = CPU_NEXT(cpu);
-                }
-                break;
-            }
-
-            cpu = CPU_NEXT(cpu);
-        } /* while (cpu && !cpu->exit_request).. */
-
-        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-        atomic_set(&tcg_current_rr_cpu, NULL);
-
-        if (cpu && cpu->exit_request) {
-            atomic_mb_set(&cpu->exit_request, 0);
-        }
-
-        if (icount_enabled() && all_cpu_threads_idle()) {
-            /*
-             * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
-             * in the main_loop, wake it up in order to start the warp timer.
-             */
-            qemu_notify_event();
-        }
-
-        qemu_tcg_rr_wait_io_event();
-        deal_with_unplugged_cpus();
-    }
-
-    rcu_unregister_thread();
-    return NULL;
-}
-
 static void *qemu_hax_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
@@ -1006,76 +647,6 @@ static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 }
 #endif
 
-/* Multi-threaded TCG
- *
- * In the multi-threaded case each vCPU has its own thread. The TLS
- * variable current_cpu can be used deep in the code to find the
- * current CPUState for a given thread.
- */
-
-static void *qemu_tcg_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    assert(tcg_enabled());
-    g_assert(!icount_enabled());
-
-    rcu_register_thread();
-    tcg_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    /* process any pending work */
-    cpu->exit_request = 1;
-
-    do {
-        if (cpu_can_run(cpu)) {
-            int r;
-            qemu_mutex_unlock_iothread();
-            r = tcg_cpu_exec(cpu);
-            qemu_mutex_lock_iothread();
-            switch (r) {
-            case EXCP_DEBUG:
-                cpu_handle_guest_debug(cpu);
-                break;
-            case EXCP_HALTED:
-                /* during start-up the vCPU is reset and the thread is
-                 * kicked several times. If we don't ensure we go back
-                 * to sleep in the halted state we won't cleanly
-                 * start-up when the vCPU is enabled.
-                 *
-                 * cpu->halted should ensure we sleep in wait_io_event
-                 */
-                g_assert(cpu->halted);
-                break;
-            case EXCP_ATOMIC:
-                qemu_mutex_unlock_iothread();
-                cpu_exec_step_atomic(cpu);
-                qemu_mutex_lock_iothread();
-            default:
-                /* Ignore everything else? */
-                break;
-            }
-        }
-
-        atomic_mb_set(&cpu->exit_request, 0);
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    qemu_tcg_destroy_vcpu(cpu);
-    cpu_thread_signal_destroyed(cpu);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
 void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1106,15 +677,8 @@ void cpus_kick_thread(CPUState *cpu)
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
-
     if (cpus_accel && cpus_accel->kick_vcpu_thread) {
         cpus_accel->kick_vcpu_thread(cpu);
-    } else if (tcg_enabled()) {
-        if (qemu_tcg_mttcg_enabled()) {
-            cpu_exit(cpu);
-        } else {
-            qemu_cpu_kick_rr_cpus();
-        }
     } else {
         if (hax_enabled()) {
             /*
@@ -1270,62 +834,6 @@ void cpu_remove_sync(CPUState *cpu)
     qemu_mutex_lock_iothread();
 }
 
-static void qemu_tcg_init_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *single_tcg_halt_cond;
-    static QemuThread *single_tcg_cpu_thread;
-    static int tcg_region_inited;
-
-    assert(tcg_enabled());
-    /*
-     * Initialize TCG regions--once. Now is a good time, because:
-     * (1) TCG's init context, prologue and target globals have been set up.
-     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
-     *     -accel flag is processed, so the check doesn't work then).
-     */
-    if (!tcg_region_inited) {
-        tcg_region_inited = 1;
-        tcg_region_init();
-    }
-
-    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
-        cpu->thread = g_malloc0(sizeof(QemuThread));
-        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-        qemu_cond_init(cpu->halt_cond);
-
-        if (qemu_tcg_mttcg_enabled()) {
-            /* create a thread per vCPU with TCG (MTTCG) */
-            parallel_cpus = true;
-            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
-                 cpu->cpu_index);
-
-            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
-
-        } else {
-            /* share a single thread for all cpus with TCG */
-            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
-            qemu_thread_create(cpu->thread, thread_name,
-                               qemu_tcg_rr_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
-
-            single_tcg_halt_cond = cpu->halt_cond;
-            single_tcg_cpu_thread = cpu->thread;
-        }
-#ifdef _WIN32
-        cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-    } else {
-        /* For non-MTTCG cases we share the thread */
-        cpu->thread = single_tcg_cpu_thread;
-        cpu->halt_cond = single_tcg_halt_cond;
-        cpu->thread_id = first_cpu->thread_id;
-        cpu->can_do_io = 1;
-        cpu->created = true;
-    }
-}
-
 static void qemu_hax_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -1436,8 +944,6 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_hax_start_vcpu(cpu);
     } else if (hvf_enabled()) {
         qemu_hvf_start_vcpu(cpu);
-    } else if (tcg_enabled()) {
-        qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
     } else {
-- 
2.16.4
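
A note on the icount budget handling that this patch moves into
accel/tcg/tcg-cpus.c: prepare_icount_for_run() splits the budget into a
part that fits the 16-bit decrementer (capped at 0xffff) and a remainder
kept in icount_extra. The following is only a standalone sketch of that
arithmetic, not QEMU code. IcountSlice and icount_slice_prepare() are
hypothetical names made up for illustration, and the struct is not
CPUState.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU icount fields (not CPUState). */
typedef struct {
    uint16_t decr_low; /* part of the budget loaded into the 16-bit counter */
    int64_t extra;     /* rest of the budget, not yet loaded into decr_low */
    int64_t budget;    /* total instruction budget for this run */
} IcountSlice;

/* Split a budget the way the MIN(0xffff, budget) logic above does. */
static IcountSlice icount_slice_prepare(int64_t budget)
{
    IcountSlice s;
    int64_t insns_left;

    assert(budget >= 0);
    insns_left = budget < 0xffff ? budget : 0xffff;

    s.budget = budget;
    s.decr_low = (uint16_t)insns_left;
    s.extra = budget - insns_left;
    return s;
}
```

In this sketch decr_low + extra always equals budget, which is the
invariant the two g_assert() checks in prepare_icount_for_run() rely on
once process_icount_data() has reset the fields to zero.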




* [RFC v3 4/8] cpus: extract out qtest-specific code to accel/qtest
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (2 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 3/8] cpus: extract out TCG-specific code to accel/tcg Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 5/8] cpus: extract out kvm-specific code to accel/kvm Claudio Fontana
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

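Move the qtest ("dummy") vCPU thread code from softmmu/cpus.c to
accel/qtest/qtest-cpus.c, and register a "CpusAccel" interface for
qtest as well. With qtest_get_virtual_clock now reached through the
interface, the qtest_enabled() special case in
cpus_get_virtual_clock() goes away.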

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
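Editorial sketch, not part of the patch: the dispatch pattern this
series relies on can be illustrated roughly as below. It assumes a
reduced interface with one mandatory hook (create_vcpu_thread) and one
optional hook (get_virtual_clock). All demo_* and Demo* names are
hypothetical stand-ins, the fixed clock values are made up, and no real
threads are created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the CpusAccel interface. */
typedef struct DemoCpusAccel {
    void (*create_vcpu_thread)(int cpu_index);  /* mandatory */
    int64_t (*get_virtual_clock)(void);         /* optional, may be NULL */
} DemoCpusAccel;

static const DemoCpusAccel *demo_accel;  /* currently registered accelerator */
static int demo_threads_started;         /* counts create_vcpu_thread calls */

/* Registration rejects an interface that lacks the mandatory hook. */
static void demo_register_accel(const DemoCpusAccel *ca)
{
    assert(ca != NULL);
    assert(ca->create_vcpu_thread != NULL);
    demo_accel = ca;
}

/* Made-up fallback clock, standing in for the generic CPU clock. */
static int64_t demo_default_clock(void)
{
    return 1000;
}

/* Optional hook: use the accelerator's clock if set, else the default. */
static int64_t demo_get_virtual_clock(void)
{
    if (demo_accel && demo_accel->get_virtual_clock) {
        return demo_accel->get_virtual_clock();
    }
    return demo_default_clock();
}

/* Mandatory hook: called unconditionally once an accelerator is set. */
static void demo_init_vcpu(int cpu_index)
{
    assert(demo_accel != NULL);
    demo_accel->create_vcpu_thread(cpu_index);
}

/* A qtest-like accelerator: provides both hooks. */
static int64_t demo_qtest_clock(void)
{
    return 42;
}

static void demo_start_vcpu(int cpu_index)
{
    (void)cpu_index;
    demo_threads_started++;
}

static const DemoCpusAccel demo_qtest_cpus = {
    .create_vcpu_thread = demo_start_vcpu,
    .get_virtual_clock = demo_qtest_clock,
};

/* An accelerator that leaves the optional clock hook unset. */
static const DemoCpusAccel demo_plain_cpus = {
    .create_vcpu_thread = demo_start_vcpu,
};
```

Registering demo_qtest_cpus makes demo_get_virtual_clock() return the
accelerator's value; registering demo_plain_cpus falls back to the
default clock, which is the behaviour the optional-field checks in
cpus.c are meant to give.
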
 MAINTAINERS               |  2 +-
 accel/Makefile.objs       |  2 +-
 accel/qtest/Makefile.objs |  2 ++
 accel/qtest/qtest-cpus.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++
 accel/qtest/qtest-cpus.h  | 17 +++++++++
 accel/{ => qtest}/qtest.c |  7 ++++
 softmmu/cpus.c            | 64 +--------------------------------
 7 files changed, 120 insertions(+), 65 deletions(-)
 create mode 100644 accel/qtest/Makefile.objs
 create mode 100644 accel/qtest/qtest-cpus.c
 create mode 100644 accel/qtest/qtest-cpus.h
 rename accel/{ => qtest}/qtest.c (86%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7dcc3ef4c8..f8bac8cb64 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2442,7 +2442,7 @@ M: Laurent Vivier <lvivier@redhat.com>
 R: Paolo Bonzini <pbonzini@redhat.com>
 S: Maintained
 F: softmmu/qtest.c
-F: accel/qtest.c
+F: accel/qtest/
 F: tests/qtest/
 X: tests/qtest/bios-tables-test-allowed-diff.h
 
diff --git a/accel/Makefile.objs b/accel/Makefile.objs
index ff72f0d030..c5e58eb53d 100644
--- a/accel/Makefile.objs
+++ b/accel/Makefile.objs
@@ -1,5 +1,5 @@
 common-obj-$(CONFIG_SOFTMMU) += accel.o
-obj-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_POSIX)) += qtest.o
+obj-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_POSIX)) += qtest/
 obj-$(CONFIG_KVM) += kvm/
 obj-$(CONFIG_TCG) += tcg/
 obj-$(CONFIG_XEN) += xen/
diff --git a/accel/qtest/Makefile.objs b/accel/qtest/Makefile.objs
new file mode 100644
index 0000000000..627014200e
--- /dev/null
+++ b/accel/qtest/Makefile.objs
@@ -0,0 +1,2 @@
+obj-y += qtest.o
+obj-y += qtest-cpus.o
diff --git a/accel/qtest/qtest-cpus.c b/accel/qtest/qtest-cpus.c
new file mode 100644
index 0000000000..ac10976ac6
--- /dev/null
+++ b/accel/qtest/qtest-cpus.c
@@ -0,0 +1,91 @@
+/*
+ * QTest accelerator code
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/rcu.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/config-file.h"
+#include "sysemu/accel.h"
+#include "sysemu/qtest.h"
+#include "sysemu/cpus.h"
+#include "sysemu/cpu-timers.h"
+#include "qemu/guest-random.h"
+#include "qemu/main-loop.h"
+#include "hw/core/cpu.h"
+
+#include "qtest-cpus.h"
+
+static void *qtest_cpu_thread_fn(void *arg)
+{
+#ifdef _WIN32
+    error_report("qtest is not supported under Windows");
+    exit(1);
+#else
+    CPUState *cpu = arg;
+    sigset_t waitset;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    sigemptyset(&waitset);
+    sigaddset(&waitset, SIG_IPI);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        qemu_mutex_unlock_iothread();
+        do {
+            int sig;
+            r = sigwait(&waitset, &sig);
+        } while (r == -1 && (errno == EAGAIN || errno == EINTR));
+        if (r == -1) {
+            perror("sigwait");
+            exit(1);
+        }
+        qemu_mutex_lock_iothread();
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug);
+
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+#endif
+}
+
+static void qtest_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qtest_cpu_thread_fn, cpu,
+                       QEMU_THREAD_JOINABLE);
+}
+
+CpusAccel qtest_cpus = {
+    .create_vcpu_thread = qtest_start_vcpu_thread,
+    .get_virtual_clock = qtest_get_virtual_clock,
+};
diff --git a/accel/qtest/qtest-cpus.h b/accel/qtest/qtest-cpus.h
new file mode 100644
index 0000000000..c1fab96b9e
--- /dev/null
+++ b/accel/qtest/qtest-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QTEST_CPUS_H
+#define QTEST_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel qtest_cpus;
+
+#endif /* QTEST_CPUS_H */
diff --git a/accel/qtest.c b/accel/qtest/qtest.c
similarity index 86%
rename from accel/qtest.c
rename to accel/qtest/qtest.c
index 119d0f16a4..537e8b449c 100644
--- a/accel/qtest.c
+++ b/accel/qtest/qtest.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/rcu.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -20,9 +21,15 @@
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
+#include "qemu/guest-random.h"
+#include "qemu/main-loop.h"
+#include "hw/core/cpu.h"
+
+#include "qtest-cpus.h"
 
 static int qtest_init_accel(MachineState *ms)
 {
+    cpus_register_accel(&qtest_cpus);
     return 0;
 }
 
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index eaed6749e3..9b4c59f6f5 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -40,7 +40,6 @@
 #include "qemu/thread.h"
 #include "qemu/plugin.h"
 #include "sysemu/cpus.h"
-#include "sysemu/qtest.h"
 #include "qemu/main-loop.h"
 #include "qemu/option.h"
 #include "qemu/bitmap.h"
@@ -247,9 +246,6 @@ int64_t cpus_get_virtual_clock(void)
     if (cpus_accel && cpus_accel->get_virtual_clock) {
         return cpus_accel->get_virtual_clock();
     }
-    if (qtest_enabled()) { /* for qtest_clock_warp */
-        return qtest_get_virtual_clock();
-    }
     return cpu_get_clock();
 }
 
@@ -483,51 +479,6 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     return NULL;
 }
 
-static void *qemu_dummy_cpu_thread_fn(void *arg)
-{
-#ifdef _WIN32
-    error_report("qtest is not supported under Windows");
-    exit(1);
-#else
-    CPUState *cpu = arg;
-    sigset_t waitset;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    sigemptyset(&waitset);
-    sigaddset(&waitset, SIG_IPI);
-
-    /* signal CPU creation */
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        qemu_mutex_unlock_iothread();
-        do {
-            int sig;
-            r = sigwait(&waitset, &sig);
-        } while (r == -1 && (errno == EAGAIN || errno == EINTR));
-        if (r == -1) {
-            perror("sigwait");
-            exit(1);
-        }
-        qemu_mutex_lock_iothread();
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug);
-
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-#endif
-}
-
 static void *qemu_hax_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
@@ -905,19 +856,6 @@ void cpus_register_accel(CpusAccel *ca)
     cpus_accel = ca;
 }
 
-static void qemu_dummy_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
-                       QEMU_THREAD_JOINABLE);
-}
-
 void qemu_init_vcpu(CPUState *cpu)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -947,7 +885,7 @@ void qemu_init_vcpu(CPUState *cpu)
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
     } else {
-        qemu_dummy_start_vcpu(cpu);
+        assert(0);
     }
 
     while (!cpu->created) {
-- 
2.16.4




* [RFC v3 5/8] cpus: extract out kvm-specific code to accel/kvm
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (3 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 4/8] cpus: extract out qtest-specific code to accel/qtest Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 6/8] cpus: extract out hax-specific code to target/i386/ Claudio Fontana
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

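Move the KVM vCPU thread code from softmmu/cpus.c to
accel/kvm/kvm-cpus.c, and register a "CpusAccel" interface for KVM
as well. kvm_destroy_vcpu() now returns void and exits on failure,
which replaces the qemu_kvm_destroy_vcpu() wrapper in cpus.c.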

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 accel/kvm/Makefile.objs |  2 ++
 accel/kvm/kvm-all.c     | 14 +++++++-
 accel/kvm/kvm-cpus.c    | 88 +++++++++++++++++++++++++++++++++++++++++++++++++
 accel/kvm/kvm-cpus.h    | 17 ++++++++++
 accel/stubs/kvm-stub.c  |  3 +-
 include/sysemu/kvm.h    |  2 +-
 softmmu/cpus.c          | 77 -------------------------------------------
 7 files changed, 122 insertions(+), 81 deletions(-)
 create mode 100644 accel/kvm/kvm-cpus.c
 create mode 100644 accel/kvm/kvm-cpus.h

diff --git a/accel/kvm/Makefile.objs b/accel/kvm/Makefile.objs
index fdfa481578..ce0f492b8d 100644
--- a/accel/kvm/Makefile.objs
+++ b/accel/kvm/Makefile.objs
@@ -1,2 +1,4 @@
 obj-y += kvm-all.o
+obj-y += kvm-cpus.o
+
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 63ef6af9a1..fbd82cb444 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -44,6 +44,9 @@
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
 #include "sysemu/reset.h"
+#include "qemu/guest-random.h"
+#include "sysemu/hw_accel.h"
+#include "kvm-cpus.h"
 
 #include "hw/boards.h"
 
@@ -378,7 +381,7 @@ err:
     return ret;
 }
 
-int kvm_destroy_vcpu(CPUState *cpu)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
     long mmap_size;
@@ -412,6 +415,14 @@ err:
     return ret;
 }
 
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+    if (do_kvm_destroy_vcpu(cpu) < 0) {
+        error_report("kvm_destroy_vcpu failed");
+        exit(EXIT_FAILURE);
+    }
+}
+
 static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 {
     struct KVMParkedVcpu *cpu;
@@ -2232,6 +2243,7 @@ static int kvm_init(MachineState *ms)
         assert(!ret);
     }
 
+    cpus_register_accel(&kvm_cpus);
     return 0;
 
 err:
diff --git a/accel/kvm/kvm-cpus.c b/accel/kvm/kvm-cpus.c
new file mode 100644
index 0000000000..7866a2e9c3
--- /dev/null
+++ b/accel/kvm/kvm-cpus.c
@@ -0,0 +1,88 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright IBM, Corp. 2008
+ *           Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Glauber Costa     <gcosta@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/kvm_int.h"
+#include "sysemu/runstate.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "kvm-cpus.h"
+
+static void *kvm_vcpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    r = kvm_init_vcpu(cpu);
+    if (r < 0) {
+        error_report("kvm_init_vcpu failed: %s", strerror(-r));
+        exit(1);
+    }
+
+    kvm_init_cpu_signals(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = kvm_cpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    kvm_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void kvm_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, kvm_vcpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
+CpusAccel kvm_cpus = {
+    .create_vcpu_thread = kvm_start_vcpu_thread,
+
+    .synchronize_post_reset = kvm_cpu_synchronize_post_reset,
+    .synchronize_post_init = kvm_cpu_synchronize_post_init,
+    .synchronize_state = kvm_cpu_synchronize_state,
+    .synchronize_pre_loadvm = kvm_cpu_synchronize_pre_loadvm,
+};
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
new file mode 100644
index 0000000000..62fbc911d9
--- /dev/null
+++ b/accel/kvm/kvm-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef KVM_CPUS_H
+#define KVM_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel kvm_cpus;
+
+#endif /* KVM_CPUS_H */
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 82f118d2df..69f8a842da 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -32,9 +32,8 @@ bool kvm_readonly_mem_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 
-int kvm_destroy_vcpu(CPUState *cpu)
+void kvm_destroy_vcpu(CPUState *cpu)
 {
-    return -ENOSYS;
 }
 
 int kvm_init_vcpu(CPUState *cpu)
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b4174d941c..7a5f973b6f 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -218,7 +218,7 @@ int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
 int kvm_cpu_exec(CPUState *cpu);
-int kvm_destroy_vcpu(CPUState *cpu);
+void kvm_destroy_vcpu(CPUState *cpu);
 
 /**
  * kvm_arm_supports_user_irq
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 9b4c59f6f5..f4cc05128b 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -179,9 +179,6 @@ void cpu_synchronize_state(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_state) {
         cpus_accel->synchronize_state(cpu);
     }
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_state(cpu);
-    }
     if (hax_enabled()) {
         hax_cpu_synchronize_state(cpu);
     }
@@ -195,9 +192,6 @@ void cpu_synchronize_post_reset(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_reset) {
         cpus_accel->synchronize_post_reset(cpu);
     }
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_reset(cpu);
-    }
     if (hax_enabled()) {
         hax_cpu_synchronize_post_reset(cpu);
     }
@@ -211,9 +205,6 @@ void cpu_synchronize_post_init(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_init) {
         cpus_accel->synchronize_post_init(cpu);
     }
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_post_init(cpu);
-    }
     if (hax_enabled()) {
         hax_cpu_synchronize_post_init(cpu);
     }
@@ -227,9 +218,6 @@ void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_pre_loadvm) {
         cpus_accel->synchronize_pre_loadvm(cpu);
     }
-    if (kvm_enabled()) {
-        kvm_cpu_synchronize_pre_loadvm(cpu);
-    }
     if (hax_enabled()) {
         hax_cpu_synchronize_pre_loadvm(cpu);
     }
@@ -384,14 +372,6 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
     do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
 }
 
-static void qemu_kvm_destroy_vcpu(CPUState *cpu)
-{
-    if (kvm_destroy_vcpu(cpu) < 0) {
-        error_report("kvm_destroy_vcpu failed");
-        exit(EXIT_FAILURE);
-    }
-}
-
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
 {
     g_assert(qemu_cpu_is_self(cpu));
@@ -437,48 +417,6 @@ void qemu_wait_io_event(CPUState *cpu)
     qemu_wait_io_event_common(cpu);
 }
 
-static void *qemu_kvm_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    r = kvm_init_vcpu(cpu);
-    if (r < 0) {
-        error_report("kvm_init_vcpu failed: %s", strerror(-r));
-        exit(1);
-    }
-
-    kvm_init_cpu_signals(cpu);
-
-    /* signal CPU creation */
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = kvm_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    qemu_kvm_destroy_vcpu(cpu);
-    cpu_thread_signal_destroyed(cpu);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
 static void *qemu_hax_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
@@ -802,19 +740,6 @@ static void qemu_hax_start_vcpu(CPUState *cpu)
 #endif
 }
 
-static void qemu_kvm_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-}
-
 static void qemu_hvf_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -876,8 +801,6 @@ void qemu_init_vcpu(CPUState *cpu)
     if (cpus_accel) {
         /* accelerator already implements the CpusAccel interface */
         cpus_accel->create_vcpu_thread(cpu);
-    } else if (kvm_enabled()) {
-        qemu_kvm_start_vcpu(cpu);
     } else if (hax_enabled()) {
         qemu_hax_start_vcpu(cpu);
     } else if (hvf_enabled()) {
-- 
2.16.4




* [RFC v3 6/8] cpus: extract out hax-specific code to target/i386/
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (4 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 5/8] cpus: extract out kvm-specific code to accel/kvm Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 7/8] cpus: extract out whpx-specific " Claudio Fontana
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

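Move the HAX vCPU thread code from softmmu/cpus.c to
target/i386/hax-cpus.c, and register a "CpusAccel" interface for HAX
as well. The HAX-specific kick moves to hax_kick_vcpu_thread(), with
separate implementations in hax-posix.c and hax-windows.c; the
Windows one takes over the QueueUserAPC() call from
cpus_kick_thread().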

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
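Editorial sketch, not part of the patch: the kick override used here
can be illustrated roughly as below. It assumes a reduced CPU state and
a single optional kick hook. The demo_* and Demo* names are
hypothetical, and the halt_cond broadcast that the real qemu_cpu_kick()
performs first is left out. The override mirrors the POSIX flavour,
which sets exit_request and then chains to the generic kick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for CPUState. */
typedef struct DemoCPU {
    bool exit_request;  /* asks the vCPU loop to leave guest execution */
    int kicks;          /* counts deliveries of the generic kick */
} DemoCPU;

/* Hypothetical, reduced stand-in for the CpusAccel interface. */
typedef struct DemoCpusAccel {
    void (*kick_vcpu_thread)(DemoCPU *cpu);  /* optional override */
} DemoCpusAccel;

static const DemoCpusAccel *demo_accel;  /* NULL until one is registered */

/* Stand-in for the generic kick (a signal to the vCPU thread on POSIX). */
static void demo_default_kick(DemoCPU *cpu)
{
    cpu->kicks++;
}

/* Same shape as the dispatch in qemu_cpu_kick(): override, else default. */
static void demo_cpu_kick(DemoCPU *cpu)
{
    if (demo_accel && demo_accel->kick_vcpu_thread) {
        demo_accel->kick_vcpu_thread(cpu);
    } else {
        demo_default_kick(cpu);
    }
}

/* HAX-like override: flag the exit request, then reuse the generic kick. */
static void demo_hax_kick(DemoCPU *cpu)
{
    cpu->exit_request = true;
    demo_default_kick(cpu);
}

static const DemoCpusAccel demo_hax_cpus = {
    .kick_vcpu_thread = demo_hax_kick,
};
```

Keeping the accelerator-specific step inside the override means the
generic kick path no longer needs a hax_enabled() branch.
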
 softmmu/cpus.c            | 80 +-------------------------------------------
 target/i386/Makefile.objs |  5 +--
 target/i386/hax-all.c     |  6 +++-
 target/i386/hax-cpus.c    | 85 +++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/hax-cpus.h    | 17 ++++++++++
 target/i386/hax-i386.h    |  2 ++
 target/i386/hax-posix.c   | 12 +++++++
 target/i386/hax-windows.c | 20 +++++++++++
 8 files changed, 145 insertions(+), 82 deletions(-)
 create mode 100644 target/i386/hax-cpus.c
 create mode 100644 target/i386/hax-cpus.h

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index f4cc05128b..784593adec 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -33,7 +33,6 @@
 #include "exec/gdbstub.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
-#include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
 #include "exec/exec-all.h"
@@ -179,9 +178,6 @@ void cpu_synchronize_state(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_state) {
         cpus_accel->synchronize_state(cpu);
     }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_state(cpu);
-    }
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
@@ -192,9 +188,6 @@ void cpu_synchronize_post_reset(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_reset) {
         cpus_accel->synchronize_post_reset(cpu);
     }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_reset(cpu);
-    }
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
@@ -205,9 +198,6 @@ void cpu_synchronize_post_init(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_init) {
         cpus_accel->synchronize_post_init(cpu);
     }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_post_init(cpu);
-    }
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
@@ -218,9 +208,6 @@ void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_pre_loadvm) {
         cpus_accel->synchronize_pre_loadvm(cpu);
     }
-    if (hax_enabled()) {
-        hax_cpu_synchronize_pre_loadvm(cpu);
-    }
     if (hvf_enabled()) {
         hvf_cpu_synchronize_pre_loadvm(cpu);
     }
@@ -417,35 +404,6 @@ void qemu_wait_io_event(CPUState *cpu)
     qemu_wait_io_event_common(cpu);
 }
 
-static void *qemu_hax_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    current_cpu = cpu;
-    hax_init_vcpu(cpu);
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = hax_smp_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-    rcu_unregister_thread();
-    return NULL;
-}
-
 /* The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature. */
 static void *qemu_hvf_cpu_thread_fn(void *arg)
@@ -530,12 +488,6 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }
 
-#ifdef _WIN32
-static void CALLBACK dummy_apc_func(ULONG_PTR unused)
-{
-}
-#endif
-
 void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -554,10 +506,6 @@ void cpus_kick_thread(CPUState *cpu)
     if (!qemu_cpu_is_self(cpu)) {
         if (whpx_enabled()) {
             whpx_vcpu_kick(cpu);
-        } else if (!QueueUserAPC(dummy_apc_func, cpu->hThread, 0)) {
-            fprintf(stderr, "%s: QueueUserAPC failed with error %lu\n",
-                    __func__, GetLastError());
-            exit(1);
         }
     }
 #endif
@@ -568,14 +516,7 @@ void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (cpus_accel && cpus_accel->kick_vcpu_thread) {
         cpus_accel->kick_vcpu_thread(cpu);
-    } else {
-        if (hax_enabled()) {
-            /*
-             * FIXME: race condition with the exit_request check in
-             * hax_vcpu_hax_exec
-             */
-            cpu->exit_request = 1;
-        }
+    } else { /* default */
         cpus_kick_thread(cpu);
     }
 }
@@ -723,23 +664,6 @@ void cpu_remove_sync(CPUState *cpu)
     qemu_mutex_lock_iothread();
 }
 
-static void qemu_hax_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-#ifdef _WIN32
-    cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-}
-
 static void qemu_hvf_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -801,8 +725,6 @@ void qemu_init_vcpu(CPUState *cpu)
     if (cpus_accel) {
         /* accelerator already implements the CpusAccel interface */
         cpus_accel->create_vcpu_thread(cpu);
-    } else if (hax_enabled()) {
-        qemu_hax_start_vcpu(cpu);
     } else if (hvf_enabled()) {
         qemu_hvf_start_vcpu(cpu);
     } else if (whpx_enabled()) {
diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 0b93143e27..ee5a8fd4b4 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -10,11 +10,12 @@ obj-y += machine.o arch_memory_mapping.o arch_dump.o monitor.o
 obj-$(CONFIG_KVM) += kvm.o
 obj-$(CONFIG_HYPERV) += hyperv.o
 obj-$(call lnot,$(CONFIG_HYPERV)) += hyperv-stub.o
+obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-cpus.o
 ifeq ($(CONFIG_WIN32),y)
-obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-windows.o
+obj-$(CONFIG_HAX) += hax-windows.o
 endif
 ifeq ($(CONFIG_POSIX),y)
-obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
+obj-$(CONFIG_HAX) += hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
index c93bb23a44..b66ddeb8bf 100644
--- a/target/i386/hax-all.c
+++ b/target/i386/hax-all.c
@@ -32,9 +32,10 @@
 #include "sysemu/accel.h"
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
-#include "qemu/main-loop.h"
 #include "hw/boards.h"
 
+#include "hax-cpus.h"
+
 #define DEBUG_HAX 0
 
 #define DPRINTF(fmt, ...) \
@@ -374,6 +375,9 @@ static int hax_accel_init(MachineState *ms)
                 !ret ? "working" : "not working",
                 !ret ? "fast virt" : "emulation");
     }
+    if (ret == 0) {
+        cpus_register_accel(&hax_cpus);
+    }
     return ret;
 }
 
diff --git a/target/i386/hax-cpus.c b/target/i386/hax-cpus.c
new file mode 100644
index 0000000000..69a4162939
--- /dev/null
+++ b/target/i386/hax-cpus.c
@@ -0,0 +1,85 @@
+/*
+ * QEMU HAX support
+ *
+ * Copyright IBM, Corp. 2008
+ *           Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Glauber Costa     <gcosta@redhat.com>
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong<yunhong.jiang@intel.com>
+ *  Xin Xiaohui<xiaohui.xin@intel.com>
+ *  Zhang Xiantao<xiantao.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "hax-i386.h"
+#include "sysemu/runstate.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "hax-cpus.h"
+
+static void *hax_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    hax_init_vcpu(cpu);
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = hax_smp_cpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void hax_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, hax_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+#ifdef _WIN32
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+}
+
+CpusAccel hax_cpus = {
+    .create_vcpu_thread = hax_start_vcpu_thread,
+    .kick_vcpu_thread = hax_kick_vcpu_thread,
+
+    .synchronize_post_reset = hax_cpu_synchronize_post_reset,
+    .synchronize_post_init = hax_cpu_synchronize_post_init,
+    .synchronize_state = hax_cpu_synchronize_state,
+    .synchronize_pre_loadvm = hax_cpu_synchronize_pre_loadvm,
+};
diff --git a/target/i386/hax-cpus.h b/target/i386/hax-cpus.h
new file mode 100644
index 0000000000..ac3cf1f8ae
--- /dev/null
+++ b/target/i386/hax-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HAX_CPUS_H
+#define HAX_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel hax_cpus;
+
+#endif /* HAX_CPUS_H */
diff --git a/target/i386/hax-i386.h b/target/i386/hax-i386.h
index ec28708185..48c4abe14e 100644
--- a/target/i386/hax-i386.h
+++ b/target/i386/hax-i386.h
@@ -60,6 +60,8 @@ int hax_inject_interrupt(CPUArchState *env, int vector);
 struct hax_vm *hax_vm_create(struct hax_state *hax, int max_cpus);
 int hax_vcpu_run(struct hax_vcpu_state *vcpu);
 int hax_vcpu_create(int id);
+void hax_kick_vcpu_thread(CPUState *cpu);
+
 int hax_sync_vcpu_state(CPUArchState *env, struct vcpu_state_t *state,
                         int set);
 int hax_sync_msr(CPUArchState *env, struct hax_msr_data *msrs, int set);
diff --git a/target/i386/hax-posix.c b/target/i386/hax-posix.c
index 5f9d1b803d..6fb7867d11 100644
--- a/target/i386/hax-posix.c
+++ b/target/i386/hax-posix.c
@@ -16,6 +16,8 @@
 
 #include "target/i386/hax-i386.h"
 
+#include "sysemu/cpus.h"
+
 hax_fd hax_mod_open(void)
 {
     int fd = open("/dev/HAX", O_RDWR);
@@ -292,3 +294,13 @@ int hax_inject_interrupt(CPUArchState *env, int vector)
 
     return ioctl(fd, HAX_VCPU_IOCTL_INTERRUPT, &vector);
 }
+
+void hax_kick_vcpu_thread(CPUState *cpu)
+{
+    /*
+     * FIXME: race condition with the exit_request check in
+     * hax_vcpu_hax_exec
+     */
+    cpu->exit_request = 1;
+    cpus_kick_thread(cpu);
+}
diff --git a/target/i386/hax-windows.c b/target/i386/hax-windows.c
index 863c2bcc19..469b48e608 100644
--- a/target/i386/hax-windows.c
+++ b/target/i386/hax-windows.c
@@ -463,3 +463,23 @@ int hax_inject_interrupt(CPUArchState *env, int vector)
         return 0;
     }
 }
+
+static void CALLBACK dummy_apc_func(ULONG_PTR unused)
+{
+}
+
+void hax_kick_vcpu_thread(CPUState *cpu)
+{
+    /*
+     * FIXME: race condition with the exit_request check in
+     * hax_vcpu_hax_exec
+     */
+    cpu->exit_request = 1;
+    if (!qemu_cpu_is_self(cpu)) {
+        if (!QueueUserAPC(dummy_apc_func, cpu->hThread, 0)) {
+            fprintf(stderr, "%s: QueueUserAPC failed with error %lu\n",
+                    __func__, GetLastError());
+            exit(1);
+        }
+    }
+}
-- 
2.16.4




* [RFC v3 7/8] cpus: extract out whpx-specific code to target/i386/
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (5 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 6/8] cpus: extract out hax-specific code to target/i386/ Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-03  9:05 ` [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/ Claudio Fontana
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

register a "CpusAccel" interface for WHPX as well.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 MAINTAINERS               |  1 +
 softmmu/cpus.c            | 79 --------------------------------------
 target/i386/Makefile.objs |  2 +-
 target/i386/whpx-all.c    |  3 ++
 target/i386/whpx-cpus.c   | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/whpx-cpus.h   | 17 +++++++++
 6 files changed, 118 insertions(+), 80 deletions(-)
 create mode 100644 target/i386/whpx-cpus.c
 create mode 100644 target/i386/whpx-cpus.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f8bac8cb64..e38097a265 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -450,6 +450,7 @@ WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
 S: Supported
 F: target/i386/whpx-all.c
+F: target/i386/whpx-cpus.c
 F: target/i386/whp-dispatch.h
 F: accel/stubs/whpx-stub.c
 F: include/sysemu/whpx.h
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 784593adec..586b4acaab 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -34,7 +34,6 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
-#include "sysemu/whpx.h"
 #include "exec/exec-all.h"
 #include "qemu/thread.h"
 #include "qemu/plugin.h"
@@ -178,9 +177,6 @@ void cpu_synchronize_state(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_state) {
         cpus_accel->synchronize_state(cpu);
     }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_state(cpu);
-    }
 }
 
 void cpu_synchronize_post_reset(CPUState *cpu)
@@ -188,9 +184,6 @@ void cpu_synchronize_post_reset(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_reset) {
         cpus_accel->synchronize_post_reset(cpu);
     }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_reset(cpu);
-    }
 }
 
 void cpu_synchronize_post_init(CPUState *cpu)
@@ -198,9 +191,6 @@ void cpu_synchronize_post_init(CPUState *cpu)
     if (cpus_accel && cpus_accel->synchronize_post_init) {
         cpus_accel->synchronize_post_init(cpu);
     }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_post_init(cpu);
-    }
 }
 
 void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -211,9 +201,6 @@ void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (hvf_enabled()) {
         hvf_cpu_synchronize_pre_loadvm(cpu);
     }
-    if (whpx_enabled()) {
-        whpx_cpu_synchronize_pre_loadvm(cpu);
-    }
 }
 
 int64_t cpus_get_virtual_clock(void)
@@ -446,48 +433,6 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
     return NULL;
 }
 
-static void *qemu_whpx_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-    int r;
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-    cpu->thread_id = qemu_get_thread_id();
-    current_cpu = cpu;
-
-    r = whpx_init_vcpu(cpu);
-    if (r < 0) {
-        fprintf(stderr, "whpx_init_vcpu failed: %s\n", strerror(-r));
-        exit(1);
-    }
-
-    /* signal CPU creation */
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = whpx_vcpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        while (cpu_thread_is_idle(cpu)) {
-            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
-        }
-        qemu_wait_io_event_common(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    whpx_destroy_vcpu(cpu);
-    cpu_thread_signal_destroyed(cpu);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
 void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -502,12 +447,6 @@ void cpus_kick_thread(CPUState *cpu)
         fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
         exit(1);
     }
-#else /* _WIN32 */
-    if (!qemu_cpu_is_self(cpu)) {
-        if (whpx_enabled()) {
-            whpx_vcpu_kick(cpu);
-        }
-    }
 #endif
 }
 
@@ -682,22 +621,6 @@ static void qemu_hvf_start_vcpu(CPUState *cpu)
                        cpu, QEMU_THREAD_JOINABLE);
 }
 
-static void qemu_whpx_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-#ifdef _WIN32
-    cpu->hThread = qemu_thread_get_handle(cpu->thread);
-#endif
-}
-
 void cpus_register_accel(CpusAccel *ca)
 {
     assert(ca != NULL);
@@ -727,8 +650,6 @@ void qemu_init_vcpu(CPUState *cpu)
         cpus_accel->create_vcpu_thread(cpu);
     } else if (hvf_enabled()) {
         qemu_hvf_start_vcpu(cpu);
-    } else if (whpx_enabled()) {
-        qemu_whpx_start_vcpu(cpu);
     } else {
         assert(0);
     }
diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index ee5a8fd4b4..606dec67d1 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -18,7 +18,7 @@ ifeq ($(CONFIG_POSIX),y)
 obj-$(CONFIG_HAX) += hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
-obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_WHPX) += whpx-all.o whpx-cpus.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
index c78baac6df..8b6986c864 100644
--- a/target/i386/whpx-all.c
+++ b/target/i386/whpx-all.c
@@ -24,6 +24,8 @@
 #include "migration/blocker.h"
 #include "whp-dispatch.h"
 
+#include "whpx-cpus.h"
+
 #include <WinHvPlatform.h>
 #include <WinHvEmulation.h>
 
@@ -1575,6 +1577,7 @@ static int whpx_accel_init(MachineState *ms)
     whpx_memory_init();
 
     cpu_interrupt_handler = whpx_handle_interrupt;
+    cpus_register_accel(&whpx_cpus);
 
     printf("Windows Hypervisor Platform accelerator is operational\n");
     return 0;
diff --git a/target/i386/whpx-cpus.c b/target/i386/whpx-cpus.c
new file mode 100644
index 0000000000..3a0b69f771
--- /dev/null
+++ b/target/i386/whpx-cpus.c
@@ -0,0 +1,96 @@
+/*
+ * QEMU Windows Hypervisor Platform accelerator (WHPX)
+ *
+ * Copyright Microsoft Corp. 2017
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/kvm_int.h"
+#include "qemu/main-loop.h"
+#include "sysemu/cpus.h"
+#include "qemu/guest-random.h"
+
+#include "sysemu/whpx.h"
+#include "whpx-cpus.h"
+
+#include <WinHvPlatform.h>
+#include <WinHvEmulation.h>
+
+static void *whpx_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = whpx_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "whpx_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = whpx_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        while (cpu_thread_is_idle(cpu)) {
+            qemu_cond_wait_iothread(cpu->halt_cond);
+        }
+        qemu_wait_io_event_common(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    whpx_destroy_vcpu(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void whpx_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, whpx_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+#ifdef _WIN32
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+}
+
+static void whpx_kick_vcpu_thread(CPUState *cpu)
+{
+    if (!qemu_cpu_is_self(cpu)) {
+        whpx_vcpu_kick(cpu);
+    }
+}
+
+CpusAccel whpx_cpus = {
+    .create_vcpu_thread = whpx_start_vcpu_thread,
+    .kick_vcpu_thread = whpx_kick_vcpu_thread,
+
+    .synchronize_post_reset = whpx_cpu_synchronize_post_reset,
+    .synchronize_post_init = whpx_cpu_synchronize_post_init,
+    .synchronize_state = whpx_cpu_synchronize_state,
+    .synchronize_pre_loadvm = whpx_cpu_synchronize_pre_loadvm,
+};
diff --git a/target/i386/whpx-cpus.h b/target/i386/whpx-cpus.h
new file mode 100644
index 0000000000..60b7be3735
--- /dev/null
+++ b/target/i386/whpx-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef WHPX_CPUS_H
+#define WHPX_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel whpx_cpus;
+
+#endif /* WHPX_CPUS_H */
-- 
2.16.4




* [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (6 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 7/8] cpus: extract out whpx-specific " Claudio Fontana
@ 2020-08-03  9:05 ` Claudio Fontana
  2020-08-11  9:00   ` Roman Bolshakov
  2020-08-03  9:40 ` [RFC v3 0/8] QEMU cpus.c refactoring part2 Paolo Bonzini
  2020-08-03 11:48 ` Alex Bennée
  9 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-03  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Wenchao Wang, Colin Xu, Claudio Fontana, haxm-team,
	Sunil Muthuswamy, Richard Henderson

register a "CpusAccel" interface for HVF as well.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 softmmu/cpus.c                |  63 --------------------
 target/i386/hvf/Makefile.objs |   2 +-
 target/i386/hvf/hvf-cpus.c    | 131 ++++++++++++++++++++++++++++++++++++++++++
 target/i386/hvf/hvf-cpus.h    |  17 ++++++
 target/i386/hvf/hvf.c         |   3 +
 5 files changed, 152 insertions(+), 64 deletions(-)
 create mode 100644 target/i386/hvf/hvf-cpus.c
 create mode 100644 target/i386/hvf/hvf-cpus.h

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 586b4acaab..d327b2685c 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -33,7 +33,6 @@
 #include "exec/gdbstub.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
-#include "sysemu/hvf.h"
 #include "exec/exec-all.h"
 #include "qemu/thread.h"
 #include "qemu/plugin.h"
@@ -391,48 +390,6 @@ void qemu_wait_io_event(CPUState *cpu)
     qemu_wait_io_event_common(cpu);
 }
 
-/* The HVF-specific vCPU thread function. This one should only run when the host
- * CPU supports the VMX "unrestricted guest" feature. */
-static void *qemu_hvf_cpu_thread_fn(void *arg)
-{
-    CPUState *cpu = arg;
-
-    int r;
-
-    assert(hvf_enabled());
-
-    rcu_register_thread();
-
-    qemu_mutex_lock_iothread();
-    qemu_thread_get_self(cpu->thread);
-
-    cpu->thread_id = qemu_get_thread_id();
-    cpu->can_do_io = 1;
-    current_cpu = cpu;
-
-    hvf_init_vcpu(cpu);
-
-    /* signal CPU creation */
-    cpu_thread_signal_created(cpu);
-    qemu_guest_random_seed_thread_part2(cpu->random_seed);
-
-    do {
-        if (cpu_can_run(cpu)) {
-            r = hvf_vcpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-            }
-        }
-        qemu_wait_io_event(cpu);
-    } while (!cpu->unplug || cpu_can_run(cpu));
-
-    hvf_vcpu_destroy(cpu);
-    cpu_thread_signal_destroyed(cpu);
-    qemu_mutex_unlock_iothread();
-    rcu_unregister_thread();
-    return NULL;
-}
-
 void cpus_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -603,24 +560,6 @@ void cpu_remove_sync(CPUState *cpu)
     qemu_mutex_lock_iothread();
 }
 
-static void qemu_hvf_start_vcpu(CPUState *cpu)
-{
-    char thread_name[VCPU_THREAD_NAME_SIZE];
-
-    /* HVF currently does not support TCG, and only runs in
-     * unrestricted-guest mode. */
-    assert(hvf_enabled());
-
-    cpu->thread = g_malloc0(sizeof(QemuThread));
-    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-    qemu_cond_init(cpu->halt_cond);
-
-    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
-             cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
-}
-
 void cpus_register_accel(CpusAccel *ca)
 {
     assert(ca != NULL);
@@ -648,8 +587,6 @@ void qemu_init_vcpu(CPUState *cpu)
     if (cpus_accel) {
         /* accelerator already implements the CpusAccel interface */
         cpus_accel->create_vcpu_thread(cpu);
-    } else if (hvf_enabled()) {
-        qemu_hvf_start_vcpu(cpu);
     } else {
         assert(0);
     }
diff --git a/target/i386/hvf/Makefile.objs b/target/i386/hvf/Makefile.objs
index 927b86bc67..af9f7dcfc1 100644
--- a/target/i386/hvf/Makefile.objs
+++ b/target/i386/hvf/Makefile.objs
@@ -1,2 +1,2 @@
-obj-y += hvf.o
+obj-y += hvf.o hvf-cpus.o
 obj-y += x86.o x86_cpuid.o x86_decode.o x86_descr.o x86_emu.o x86_flags.o x86_mmu.o x86hvf.o x86_task.o
diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
new file mode 100644
index 0000000000..9540157f1e
--- /dev/null
+++ b/target/i386/hvf/hvf-cpus.c
@@ -0,0 +1,131 @@
+/*
+ * Copyright 2008 IBM Corporation
+ *           2008 Red Hat, Inc.
+ * Copyright 2011 Intel Corporation
+ * Copyright 2016 Veertu, Inc.
+ * Copyright 2017 The Android Open Source Project
+ *
+ * QEMU Hypervisor.framework support
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ * This file contain code under public domain from the hvdos project:
+ * https://github.com/mist64/hvdos
+ *
+ * Parts Copyright (c) 2011 NetApp, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/hvf.h"
+#include "sysemu/runstate.h"
+#include "target/i386/cpu.h"
+#include "qemu/guest-random.h"
+
+#include "hvf-cpus.h"
+
+/*
+ * The HVF-specific vCPU thread function. This one should only run when the host
+ * CPU supports the VMX "unrestricted guest" feature.
+ */
+static void *hvf_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    int r;
+
+    assert(hvf_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    hvf_init_vcpu(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = hvf_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    hvf_vcpu_destroy(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void hvf_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    /*
+     * HVF currently does not support TCG, and only runs in
+     * unrestricted-guest mode.
+     */
+    assert(hvf_enabled());
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
+CpusAccel hvf_cpus = {
+    .create_vcpu_thread = hvf_start_vcpu_thread,
+
+    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
+    .synchronize_post_init = hvf_cpu_synchronize_post_init,
+    .synchronize_state = hvf_cpu_synchronize_state,
+    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
+};
diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
new file mode 100644
index 0000000000..b66f4889b0
--- /dev/null
+++ b/target/i386/hvf/hvf-cpus.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator CPUS Interface
+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HVF_CPUS_H
+#define HVF_CPUS_H
+
+#include "sysemu/cpus.h"
+
+extern CpusAccel hvf_cpus;
+
+#endif /* HVF_CPUS_H */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index d81f569aed..7ac6987c1b 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -72,6 +72,8 @@
 #include "sysemu/accel.h"
 #include "target/i386/cpu.h"
 
+#include "hvf-cpus.h"
+
 HVFState *hvf_state;
 
 static void assert_hvf_ok(hv_return_t ret)
@@ -894,6 +896,7 @@ static int hvf_accel_init(MachineState *ms)
     hvf_state = s;
     cpu_interrupt_handler = hvf_handle_interrupt;
     memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    cpus_register_accel(&hvf_cpus);
     return 0;
 }
 
-- 
2.16.4




* Re: [RFC v3 0/8] QEMU cpus.c refactoring part2
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (7 preceding siblings ...)
  2020-08-03  9:05 ` [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/ Claudio Fontana
@ 2020-08-03  9:40 ` Paolo Bonzini
  2020-08-03 11:48 ` Alex Bennée
  9 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-08-03  9:40 UTC (permalink / raw)
  To: Claudio Fontana, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Pavel Dovgalyuk,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Colin Xu, Wenchao Wang, haxm-team, Sunil Muthuswamy,
	Richard Henderson

On 03/08/20 11:05, Claudio Fontana wrote:
> 1) make icount TCG-only (building the icount module only under
> CONFIG_TCG), as this series suggests, and provide a separate virtual
> counter for qtest,
> 
> 
> or
> 
> 
> 2) continue to keep icount functions and fields, including vmstate,
> in all softmmu builds because of qtest current use of field
> qemu_icount_bias to implement its virtual counter for qtest_clock_warp?
> 
> 
> If I understand correctly Paolo might be for 2) (?)

I am for (1), but using function pointers and not extra "if"s.  I
quickly skimmed this patchset and it seems to DTRT; we could get into
huge discussions on how to organize the series, but let's just not do that. :)
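
To make the "function pointers and not extra ifs" point concrete, here is a minimal sketch of how generic code might obtain the virtual clock through the registered accelerator. It is not the code from the series: CpusAccel is reduced to a single hook, and default_virtual_clock, demo_qtest_clock_warp and demo_qtest_cpus are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the CpusAccel interface; only one hook is shown. */
typedef struct CpusAccel {
    int64_t (*get_virtual_clock)(void);   /* optional, may be NULL */
} CpusAccel;

static const CpusAccel *cpus_accel;

/* Hypothetical fallback used when the accelerator provides no hook. */
static int64_t default_virtual_clock(void)
{
    return 1000;
}

void cpus_register_accel(const CpusAccel *ca)
{
    assert(ca != NULL);
    cpus_accel = ca;
}

/* Generic code: one indirect call, no per-accelerator "if" chain. */
int64_t cpus_get_virtual_clock(void)
{
    if (cpus_accel && cpus_accel->get_virtual_clock) {
        return cpus_accel->get_virtual_clock();
    }
    return default_virtual_clock();
}

/* Hypothetical qtest-style accelerator with its own warpable counter. */
static int64_t demo_qtest_clock;

static int64_t demo_qtest_get_virtual_clock(void)
{
    return demo_qtest_clock;
}

void demo_qtest_clock_warp(int64_t dest)
{
    demo_qtest_clock = dest;
}

const CpusAccel demo_qtest_cpus = {
    .get_virtual_clock = demo_qtest_get_virtual_clock,
};
```

With this shape, an accelerator that needs its own counter only fills in the hook, and the generic code never has to ask which accelerator is enabled.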

Paolo




* Re: [RFC v3 0/8] QEMU cpus.c refactoring part2
  2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
                   ` (8 preceding siblings ...)
  2020-08-03  9:40 ` [RFC v3 0/8] QEMU cpus.c refactoring part2 Paolo Bonzini
@ 2020-08-03 11:48 ` Alex Bennée
  2020-08-05 17:03   ` Claudio Fontana
  9 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2020-08-03 11:48 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, haxm-team, Marcelo Tosatti, qemu-devel,
	Markus Armbruster, Roman Bolshakov, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson


Claudio Fontana <cfontana@suse.de> writes:

> Motivation and higher level steps:
>
> https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html
>
> The biggest open item for me is, does it makes sense to:
>
>
> 1) make icount TCG-only (building the icount module only under
> CONFIG_TCG), as this series suggests, and provide a separate virtual
> counter for qtest,

Well icount certainly never has any use except with TCG - the fields are
all wasted in the KVM case.

> or
>
>
> 2) continue to keep icount functions and fields, including vmstate,
> in all softmmu builds because of qtest current use of field
> qemu_icount_bias to implement its virtual counter for
> qtest_clock_warp?

Is this just a case of maintaining compatibility for saved VM images? We
could certainly keep the fields in VM state and stub out (or warn?) if an
icount-related field turned up when reloading a VM into a KVM-only build
or a build with !tcg_enabled().

I would defer to the vmstate experts on the best way to do this. Is the
field currently unconditional? Certainly the rr bits are only registered
when RR is enabled.
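
As a rough illustration of the "keep the field, but stub out or warn" idea, the sketch below models a loader that accepts an icount field from a saved image and ignores it with a warning when TCG is not available. This is not QEMU's vmstate API: SavedField, DemoTimersState and demo_load_field are hypothetical, and only the field name qemu_icount_bias is taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one field read from a saved image. */
typedef struct SavedField {
    const char *name;
    int64_t value;
} SavedField;

/* Hypothetical destination state; not QEMU's TimersState. */
typedef struct DemoTimersState {
    int64_t cpu_ticks_offset;
    int64_t icount_bias;      /* only meaningful when TCG is available */
    int skipped_icount;       /* number of icount fields ignored on load */
} DemoTimersState;

/* Returns 0 if the field was handled (loaded or skipped), -1 if unknown. */
int demo_load_field(DemoTimersState *s, const SavedField *f, bool tcg_enabled)
{
    if (strcmp(f->name, "cpu_ticks_offset") == 0) {
        s->cpu_ticks_offset = f->value;
        return 0;
    }
    if (strcmp(f->name, "qemu_icount_bias") == 0) {
        if (tcg_enabled) {
            s->icount_bias = f->value;
        } else {
            /* Keep stream compatibility: accept the field, but drop it. */
            fprintf(stderr, "warning: ignoring icount field '%s'\n", f->name);
            s->skipped_icount++;
        }
        return 0;
    }
    return -1;
}
```

Whether silently skipping, warning, or refusing the migration is the right policy is exactly the open question here; the sketch only shows the warn-and-skip variant.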

> If I understand correctly Paolo might be for 2) (?)
> would also welcome additional input from the community in any direction
> (Alex, Peter, Philippe?)
>
> ----
>
> RFC v2 -> v3:
>
> * provided defaults for all methods.
>   Only create_vcpu_thread is now a mandatory field. (Paolo)
>
> * separated new CpusAccel patch from its first user, new patch nr. 2:
>   "cpus: prepare new CpusAccel cpu accelerator interface"
>
> * new CpusAccel methods: get_virtual_clock and get_elapsed_ticks.
>   (Paolo)
>
>   In this series, get_virtual_clock has a separate implementation
>   between TCG/icount and qtest,
>   while get_elapsed_ticks only returns a virtual counter for icount.
>
>   Looking for more comments in this area.
>
> ----
>
> RFC v1 -> v2:
>
> * split the cpus.c accelerator refactoring into 6 patches.
>
> * other minor changes to be able to proceed step by step.
>
> ----
>
> * Rebased on commit 255ae6e2158c743717bed76c9a2365ee4bcd326e,
> "replay: notify the main loop when there are no instructions"
>
> [SPLIT into part1 and part2]
>
> ----
>
> v6 -> v7:
>
> * rebased changes on top of Pavel Dovgalyuk changes to dma-helpers.c
>   "icount: make dma reads deterministic"
>
> ----
>
> v5 -> v6:
>
> * rebased changes on top of Emilio G. Cota changes to cpus.c
>   "cpu: convert queued work to a QSIMPLEQ"
>
> * keep a pointer in cpus.c instead of a copy of CpusAccel
>   (Alex)
>
> ----
>
>
> v4 -> v5: rebase on latest master
>
> * rebased changes on top of roman series to remove one of the extra states for hvf.
>   (Is the result now functional for HVF?)
>
> * rebased changes on top of icount changes and fixes to icount_configure and
>   the new shift vmstate. (Markus)
>
> v3 -> v4:
>
> * overall: added copyright headers to all files that were missing them
>   (used copyright and license of the module the stuff was extracted from).
>   For the new interface files, added SUSE LLC.
>
> * 1/4 (move softmmu only files from root):
>
>   MAINTAINERS: moved softmmu/cpus.c to its final location (from patch 2)
>
> * 2/4 (cpu-throttle):
>
>   MAINTAINERS (to patch 1),
>   copyright Fabrice Bellard and license from cpus.c
>
> * 3/4 (cpu-timers, icount):
>
>   - MAINTAINERS: add cpu-timers.c and icount.c to Paolo
>
>   - break very long lines (patchew)
>
>   - add copyright SUSE LLC, GPLv2 to cpu-timers.h
>
>   - add copyright Fabrice Bellard and license from cpus.c to timers-state.h
>     as it is lifted from cpus.c
>
>   - vl.c: in configure_accelerators bail out if icount_enabled()
>     and !tcg_enabled() as qtest does not enable icount anymore.
>
> * 4/4 (accel stuff to accel):
>
>   - add copyright SUSE LLC to files that mostly only consist of the
>     new interface. Add whatever copyright was in the accelerator code
>     if instead they mostly consist of accelerator code.
>
>   - change a comment to mention the result of the AccelClass experiment
>
>   - moved qtest accelerator into accel/qtest/ , make it like the others.
>
>   - rename xxx-cpus-interface to xxx-cpus (remove "interface" from names)
>
>   - rename accel_int to cpus_accel
>
>   - rename CpusAccel functions from cpu_synchronize_* to synchronize_*
>
>
> --------
>
> v2 -> v3:
>
> * turned into a 4 patch series, adding a first patch moving
>   softmmu code currently in top_srcdir to softmmu/
>
> * cpu-throttle: moved to softmmu/
>
> * cpu-timers, icount:
>
>   - moved to softmmu/
>
>   - fixed assumption of qtest_enabled() => icount_enabled()
>   causing the failure of check-qtest-arm goal, in test-arm-mptimer.c
>
>   Fix is in hw/core/ptimer.c,
>
>   where the artificial timeout rate limit should not be applied
>   under qtest_enabled(), in a similar way to how it is not applied
>   for icount_enabled().
>
> * CpuAccelInterface: no change.
>
>
> --------
>
>
> v1 -> v2:
>
> * 1/3 (cpu-throttle): provide a description in the commit message
>
> * 2/3 (cpu-timers, icount): in this v2 separate icount from cpu-timers,
>   as icount is actually TCG-specific. Only build it under CONFIG_TCG.
>
>   To do this, qtest had to be detached from icount. To this end, a
>   trivial global counter for qtest has been introduced.
>
> * 3/3 (CpuAccelInterface): provided a description.
>
> This is point 8) in that plan. The idea is to extract the unrelated parts
> in cpus, and register interfaces from each single accelerator to the main
> cpus module (cpus.c).
>
> While doing this RFC, I noticed some assumptions about Windows being
> either TCG or HAX (not considering WHPX) that might need to be revisited.
> I added a comment there.
>
> The thing builds successfully based on Linux cross-compilations for
> windows/hax, windows/whpx, and I got a good build on Darwin/hvf.
>
> Tests run successfully for tcg and kvm configurations, but did not test on
> windows or darwin.
>
> Welcome your feedback and help on this,
>
> Claudio
>
> Claudio Fontana (8):
>   cpu-timers, icount: new modules
>   cpus: prepare new CpusAccel cpu accelerator interface
>   cpus: extract out TCG-specific code to accel/tcg
>   cpus: extract out qtest-specific code to accel/qtest
>   cpus: extract out kvm-specific code to accel/kvm
>   cpus: extract out hax-specific code to target/i386/
>   cpus: extract out whpx-specific code to target/i386/
>   cpus: extract out hvf-specific code to target/i386/hvf/
>
>  MAINTAINERS                    |    5 +-
>  accel/Makefile.objs            |    2 +-
>  accel/kvm/Makefile.objs        |    2 +
>  accel/kvm/kvm-all.c            |   14 +-
>  accel/kvm/kvm-cpus.c           |   88 +++
>  accel/kvm/kvm-cpus.h           |   17 +
>  accel/qtest/Makefile.objs      |    2 +
>  accel/qtest/qtest-cpus.c       |   91 +++
>  accel/qtest/qtest-cpus.h       |   17 +
>  accel/{ => qtest}/qtest.c      |   13 +-
>  accel/stubs/kvm-stub.c         |    3 +-
>  accel/tcg/Makefile.objs        |    1 +
>  accel/tcg/cpu-exec.c           |   43 +-
>  accel/tcg/tcg-all.c            |   19 +-
>  accel/tcg/tcg-cpus.c           |  541 +++++++++++++
>  accel/tcg/tcg-cpus.h           |   17 +
>  accel/tcg/translate-all.c      |    3 +-
>  dma-helpers.c                  |    4 +-
>  docs/replay.txt                |    6 +-
>  exec.c                         |    4 -
>  hw/core/cpu.c                  |    1 +
>  hw/core/ptimer.c               |    8 +-
>  hw/i386/x86.c                  |    3 +-
>  include/exec/cpu-all.h         |    4 +
>  include/exec/exec-all.h        |    4 +-
>  include/qemu/timer.h           |   24 +-
>  include/sysemu/cpu-timers.h    |   84 ++
>  include/sysemu/cpus.h          |   48 +-
>  include/sysemu/hw_accel.h      |   69 +-
>  include/sysemu/kvm.h           |    2 +-
>  include/sysemu/qtest.h         |    2 +
>  include/sysemu/replay.h        |    4 +-
>  replay/replay.c                |    6 +-
>  softmmu/Makefile.objs          |    2 +
>  softmmu/cpu-timers.c           |  279 +++++++
>  softmmu/cpus.c                 | 1661 +++-------------------------------------
>  softmmu/icount.c               |  497 ++++++++++++
>  softmmu/qtest.c                |   34 +-
>  softmmu/timers-state.h         |   69 ++
>  softmmu/vl.c                   |   11 +-
>  stubs/Makefile.objs            |    6 +-
>  stubs/clock-warp.c             |    7 -
>  stubs/cpu-get-clock.c          |    3 +-
>  stubs/cpu-get-icount.c         |   21 -
>  stubs/cpu-synchronize-state.c  |   15 +
>  stubs/cpus-get-virtual-clock.c |    8 +
>  stubs/icount.c                 |   52 ++
>  stubs/qemu-timer-notify-cb.c   |    8 +
>  stubs/qtest.c                  |    5 +
>  target/alpha/translate.c       |    3 +-
>  target/arm/helper.c            |    7 +-
>  target/i386/Makefile.objs      |    7 +-
>  target/i386/hax-all.c          |    6 +-
>  target/i386/hax-cpus.c         |   85 ++
>  target/i386/hax-cpus.h         |   17 +
>  target/i386/hax-i386.h         |    2 +
>  target/i386/hax-posix.c        |   12 +
>  target/i386/hax-windows.c      |   20 +
>  target/i386/hvf/Makefile.objs  |    2 +-
>  target/i386/hvf/hvf-cpus.c     |  131 ++++
>  target/i386/hvf/hvf-cpus.h     |   17 +
>  target/i386/hvf/hvf.c          |    3 +
>  target/i386/whpx-all.c         |    3 +
>  target/i386/whpx-cpus.c        |   96 +++
>  target/i386/whpx-cpus.h        |   17 +
>  target/riscv/csr.c             |    8 +-
>  tests/ptimer-test-stubs.c      |    7 +-
>  tests/test-timed-average.c     |    2 +-
>  util/main-loop.c               |   12 +-
>  util/qemu-timer.c              |   14 +-
>  70 files changed, 2528 insertions(+), 1772 deletions(-)
>  create mode 100644 accel/kvm/kvm-cpus.c
>  create mode 100644 accel/kvm/kvm-cpus.h
>  create mode 100644 accel/qtest/Makefile.objs
>  create mode 100644 accel/qtest/qtest-cpus.c
>  create mode 100644 accel/qtest/qtest-cpus.h
>  rename accel/{ => qtest}/qtest.c (81%)
>  create mode 100644 accel/tcg/tcg-cpus.c
>  create mode 100644 accel/tcg/tcg-cpus.h
>  create mode 100644 include/sysemu/cpu-timers.h
>  create mode 100644 softmmu/cpu-timers.c
>  create mode 100644 softmmu/icount.c
>  create mode 100644 softmmu/timers-state.h
>  delete mode 100644 stubs/clock-warp.c
>  delete mode 100644 stubs/cpu-get-icount.c
>  create mode 100644 stubs/cpu-synchronize-state.c
>  create mode 100644 stubs/cpus-get-virtual-clock.c
>  create mode 100644 stubs/icount.c
>  create mode 100644 stubs/qemu-timer-notify-cb.c
>  create mode 100644 target/i386/hax-cpus.c
>  create mode 100644 target/i386/hax-cpus.h
>  create mode 100644 target/i386/hvf/hvf-cpus.c
>  create mode 100644 target/i386/hvf/hvf-cpus.h
>  create mode 100644 target/i386/whpx-cpus.c
>  create mode 100644 target/i386/whpx-cpus.h


-- 
Alex Bennée



* Re: [RFC v3 1/8] cpu-timers, icount: new modules
  2020-08-03  9:05 ` [RFC v3 1/8] cpu-timers, icount: new modules Claudio Fontana
@ 2020-08-04  8:13   ` Claudio Fontana
  2020-08-04  8:23     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-04  8:13 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé,
	Pavel Dovgalyuk
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	Markus Armbruster, qemu-devel, Roman Bolshakov, haxm-team,
	Wenchao Wang, Sunil Muthuswamy, Richard Henderson, Colin Xu

Hi Alex, Paolo and all,

thank you for your feedback, could you help me answer the question below?

On 8/3/20 11:05 AM, Claudio Fontana wrote:
> ...

> diff --git a/dma-helpers.c b/dma-helpers.c
> index 2a77b5a9cb..240ef4d5b8 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -13,7 +13,7 @@
>  #include "trace-root.h"
>  #include "qemu/thread.h"
>  #include "qemu/main-loop.h"
> -#include "sysemu/cpus.h"
> +#include "sysemu/cpu-timers.h"
>  #include "qemu/range.h"
>  
>  /* #define DEBUG_IOMMU */
> @@ -151,7 +151,7 @@ static void dma_blk_cb(void *opaque, int ret)
>           * from several sectors. This code splits all SGs into several
>           * groups. SGs in every group do not overlap.
>           */
> -        if (mem && use_icount && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
> +        if (mem && icount_enabled() && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {




In this specific case, where dma_blk_cb() changes its behaviour to be more deterministic
if icount_enabled(),

do you think that if qtest_enabled() we should also follow the more deterministic path,
or should we go through the "normal" path instead, as this patch does?

Tests pass in any case, but I wonder what would be the best behavior for qtest accel in this case.
(Maybe Pavel?)



>              int i;
>              for (i = 0 ; i < dbs->iov.niov ; ++i) {
>                  if (ranges_overlap((intptr_t)dbs->iov.iov[i].iov_base,
> diff --git a/docs/replay.txt b/docs/replay.txt
> index 70c27edb36..8952e6d852 100644
> --- a/docs/replay.txt
> +++ b/docs/replay.txt
> @@ -184,11 +184,11 @@ is then incremented (which is called "warping" the virtual clock) as
>  soon as the timer fires or the CPUs need to go out of the idle state.
>  Two functions are used for this purpose; because these actions change
>  virtual machine state and must be deterministic, each of them creates a
> -checkpoint.  qemu_start_warp_timer checks if the CPUs are idle and if so
> -starts accounting real time to virtual clock.  qemu_account_warp_timer
> +checkpoint.  icount_start_warp_timer checks if the CPUs are idle and if so
> +starts accounting real time to virtual clock.  icount_account_warp_timer
>  is called when the CPUs get an interrupt or when the warp timer fires,
>  and it warps the virtual clock by the amount of real time that has passed
> -since qemu_start_warp_timer.
> +since icount_start_warp_timer.
>  
>  Bottom halves
>  -------------
> diff --git a/exec.c b/exec.c
> index 6f381f98e2..a89ffa93c1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -102,10 +102,6 @@ uintptr_t qemu_host_page_size;
>  intptr_t qemu_host_page_mask;
>  
>  #if !defined(CONFIG_USER_ONLY)
> -/* 0 = Do not count executed instructions.
> -   1 = Precise instruction counting.
> -   2 = Adaptive rate instruction counting.  */
> -int use_icount;
>  
>  typedef struct PhysPageEntry PhysPageEntry;
>  
> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
> index b5a54e2536..c6d2beb1da 100644
> --- a/hw/core/ptimer.c
> +++ b/hw/core/ptimer.c
> @@ -7,11 +7,11 @@
>   */
>  
>  #include "qemu/osdep.h"
> -#include "qemu/timer.h"
>  #include "hw/ptimer.h"
>  #include "migration/vmstate.h"
>  #include "qemu/host-utils.h"
>  #include "sysemu/replay.h"
> +#include "sysemu/cpu-timers.h"
>  #include "sysemu/qtest.h"
>  #include "block/aio.h"
>  #include "sysemu/cpus.h"
> @@ -134,7 +134,8 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
>       * on the current generation of host machines.
>       */
>  
> -    if (s->enabled == 1 && (delta * period < 10000) && !use_icount) {
> +    if (s->enabled == 1 && (delta * period < 10000) &&
> +        !icount_enabled() && !qtest_enabled()) {


In this case, it is necessary to also make qtest more deterministic in order to make existing tests pass,
as the results of the timer are affecting the ptimer test results (IIRC tests/ptimer-test.c)


>          period = 10000 / delta;
>          period_frac = 0;
>      }
> @@ -217,7 +218,8 @@ uint64_t ptimer_get_count(ptimer_state *s)
>              uint32_t period_frac = s->period_frac;
>              uint64_t period = s->period;
>  
> -            if (!oneshot && (s->delta * period < 10000) && !use_icount) {
> +            if (!oneshot && (s->delta * period < 10000) &&
> +                !icount_enabled() && !qtest_enabled()) {

...same here.

>                  period = 10000 / s->delta;
>                  period_frac = 0;
>              }
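
As a rough illustration of the rate limit discussed in both hunks above, here is a minimal sketch. The helper name effective_period() and the "deterministic" flag (standing in for icount_enabled() || qtest_enabled()) are assumptions made for this example, not QEMU code, and the delta != 0 guard is added only so the sketch cannot divide by zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: returns the period a
 * ptimer-like reload would end up using.  "deterministic" stands in
 * for icount_enabled() || qtest_enabled().
 */
static uint64_t effective_period(uint64_t delta, uint64_t period,
                                 bool enabled, bool deterministic)
{
    /*
     * Stretch very short timeouts so that delta * period is about
     * 10000, but only when exact timing is not required.
     */
    if (enabled && delta != 0 && delta * period < 10000 && !deterministic) {
        period = 10000 / delta;
    }
    return period;
}
```

With delta = 10 and period = 100 the product is 1000, so the non-deterministic path stretches the period to 1000, while the deterministic path keeps 100. That difference is why the timer tests would see different results if qtest took the rate-limited path.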


Thanks for your feedback,

Claudio



* Re: [RFC v3 1/8] cpu-timers, icount: new modules
  2020-08-04  8:13   ` Claudio Fontana
@ 2020-08-04  8:23     ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-08-04  8:23 UTC (permalink / raw)
  To: Claudio Fontana, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé,
	Pavel Dovgalyuk
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	Markus Armbruster, qemu-devel, Roman Bolshakov, haxm-team,
	Wenchao Wang, Sunil Muthuswamy, Richard Henderson, Colin Xu

On 04/08/20 10:13, Claudio Fontana wrote:
>> -        if (mem && use_icount && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
>> +        if (mem && icount_enabled() && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
> 
> 
> 
> In this specific case, where dma_blk_cb() changes its behaviour to be more deterministic
> if icount_enabled(),
> 
> do you think that if qtest_enabled() we should also follow the more deterministic path,
> or should we go through the "normal" path instead, as this patch does?
> 
> Tests pass in any case, but I wonder what would be the best behavior for qtest accel in this case.
> (Maybe Pavel?)

No, qtests simply should not use SG lists that cause the problematic
nondeterminism.  We don't have that luxury for guests.
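
One way a test author might check this up front is sketched below. It assumes the problematic lists are those whose entries overlap, as the dma_blk_cb comment suggests. The struct and both helpers are hypothetical names for this example, not QEMU API; the interval test follows the usual "neither range ends before the other starts" form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct sg_entry {
    uint64_t base;
    uint64_t len;   /* assumed > 0, with base + len not wrapping */
};

/* True if the two byte ranges share at least one address. */
static bool sg_overlap(struct sg_entry a, struct sg_entry b)
{
    uint64_t a_last = a.base + a.len - 1;
    uint64_t b_last = b.base + b.len - 1;
    return !(b_last < a.base || a_last < b.base);
}

/* True if any two entries of the list overlap (O(n^2), fine for tests). */
static bool sg_list_has_overlap(const struct sg_entry *sg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (sg_overlap(sg[i], sg[j])) {
                return true;
            }
        }
    }
    return false;
}
```

Two adjacent entries such as {0x1000, 0x200} and {0x1200, 0x200} pass this check; moving the second base to 0x11ff makes them overlap.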

Paolo




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
@ 2020-08-05  8:40   ` Claudio Fontana
  2020-08-05  8:47     ` Paolo Bonzini
  2020-08-11  8:59   ` Roman Bolshakov
  2020-08-20  8:17   ` Claudio Fontana
  2 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-05  8:40 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Roman Bolshakov, Pavel Dovgalyuk,
	Wenchao Wang, haxm-team, Sunil Muthuswamy, Richard Henderson,
	Colin Xu

Hi all,

could you check this detail, marked as a comment here?

While doing the refactoring and looking at the history,
I _think_ I noticed something that could be wrong related to whpx and hax,

and I marked this as a comment. Maybe Paolo?


On 8/3/20 11:05 AM, Claudio Fontana wrote:
[...]

  
> -static void qemu_wait_io_event(CPUState *cpu)
> +void qemu_wait_io_event(CPUState *cpu)
>  {
>      bool slept = false;
>  
> @@ -437,7 +538,8 @@ static void qemu_wait_io_event(CPUState *cpu)
>      }
>  
>  #ifdef _WIN32
> -    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
> +    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
> +    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
>      if (!tcg_enabled()) {
>          SleepEx(0, TRUE);
>      }


Looking at the history here, I think this should be if (hax_enabled());
this check was added at a time when whpx did not exist, so I _think_ there might have been an assumption here
that !tcg_enabled() on windows means actually hax_enabled() for eating this dummy APC.

Probably it does not cause problems, because whpx does not end up calling qemu_wait_io_event,
instead it calls qemu_wait_io_event_common. But it would be more expressive to use if (hax_enabled()) I think.
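
To make the pairing concrete, here is a small portable model. It is not QEMU or Win32 code: the enum, the struct and all function names are made up for this sketch, and it only assumes that an alertable sleep consumes whatever dummy APCs are pending. The point is that the kick side and the wait side want the same predicate.

```c
#include <stdbool.h>

/* Hypothetical model, not QEMU code: which accelerator is active. */
enum accel { ACCEL_TCG, ACCEL_HAX, ACCEL_WHPX };

struct vcpu_model {
    enum accel accel;
    int pending_apcs;   /* dummy APCs queued but not yet consumed */
};

/* Single predicate shared by both sides; "HAX only" as suggested above. */
static bool uses_dummy_apc(const struct vcpu_model *v)
{
    return v->accel == ACCEL_HAX;
}

/* Kick side: stands in for queueing the dummy APC. */
static void model_kick(struct vcpu_model *v)
{
    if (uses_dummy_apc(v)) {
        v->pending_apcs++;
    }
}

/* Wait side: stands in for the alertable SleepEx(0, TRUE) call. */
static void model_wait_io_event(struct vcpu_model *v)
{
    if (uses_dummy_apc(v)) {
        v->pending_apcs = 0;   /* assumed: the sleep drains queued APCs */
    }
}
```

If the two sides used different predicates, an accelerator could either queue APCs that are never consumed or pay for a sleep that has nothing to consume. Which predicate the real kick path uses is not shown in this message.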

This could be patched separately; the relevant commits in the history follow.

Thanks,

Claudio


commit db08b687cdd5319286665aabd34f82665630416f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Jan 11 13:53:12 2018 +0100

    cpus: unify qemu_*_wait_io_event
    
    Except for round-robin TCG, every other accelerator is using more or
    less the same code around qemu_wait_io_event_common.  The exception
    is HAX, which also has to eat the dummy APC that is queued by
    qemu_cpu_kick_thread.
    
    We can add the SleepEx call to qemu_wait_io_event under "if
    (!tcg_enabled())", since that is the condition that is used in
    qemu_cpu_kick_thread, and unify the function for KVM, HAX, HVF and
    multi-threaded TCG.  Single-threaded TCG code can also be simplified
    since it is only used in the round-robin, sleep-if-all-CPUs-idle case.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


commit 19306806ae30b7fb5fe61a9130c6995402acad00
Author: Justin Terry (VM) <juterry@microsoft.com>
Date:   Mon Jan 22 13:07:49 2018 -0800

    Add the WHPX acceleration enlightenments
    
    Implements the WHPX accelerator cpu enlightenments to actually use the whpx-all
    accelerator on Windows platforms.
    
    Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
    Message-Id: <1516655269-1785-5-git-send-email-juterry@microsoft.com>
    [Register/unregister VCPU thread with RCU. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit b0cb0a66d6d535112aa513568ef21dcb1ad283ed
Author: Vincent Palatin <vpalatin@chromium.org>
Date:   Tue Jan 10 11:59:57 2017 +0100

    Plumb the HAXM-based hardware acceleration support
    
    Use the Intel HAX is kernel-based hardware acceleration module for
    Windows (similar to KVM on Linux).
    
    Based on the "target/i386: Add Intel HAX to android emulator" patch
    from David Chou <david.j.chou@intel.com>
    
    Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
    Message-Id: <7b9cae28a0c379ab459c7a8545c9a39762bd394f.1484045952.git.vpalatin@chromium.org>
    [Drop hax_populate_ram stub. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>










* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-05  8:40   ` Claudio Fontana
@ 2020-08-05  8:47     ` Paolo Bonzini
  2020-08-05  8:50       ` Claudio Fontana
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2020-08-05  8:47 UTC (permalink / raw)
  To: Claudio Fontana, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Roman Bolshakov, Pavel Dovgalyuk,
	Wenchao Wang, haxm-team, Sunil Muthuswamy, Richard Henderson,
	Colin Xu

On 05/08/20 10:40, Claudio Fontana wrote:
>>  #ifdef _WIN32
>> -    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
>> +    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
>> +    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
>>      if (!tcg_enabled()) {
>>          SleepEx(0, TRUE);
>>      }
> 
> Looking at the history here, I think this should be if (hax_enabled());
> this check was added at a time when whpx did not exist, so I _think_ there might have been an assumption here
> that !tcg_enabled() on windows means actually hax_enabled() for eating this dummy APC.

Yes, that matches the condition under which QueueUserAPC is called in
qemu_cpu_kick_thread.

Paolo

> Probably it does not cause problems, because whpx does not end up calling qemu_wait_io_event,
> instead it calls qemu_wait_io_event_common. But it would be more expressive to use if (hax_enabled()) I think.




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-05  8:47     ` Paolo Bonzini
@ 2020-08-05  8:50       ` Claudio Fontana
  0 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-05  8:50 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Roman Bolshakov, Pavel Dovgalyuk,
	Wenchao Wang, haxm-team, Sunil Muthuswamy, Richard Henderson,
	Colin Xu

On 8/5/20 10:47 AM, Paolo Bonzini wrote:
> On 05/08/20 10:40, Claudio Fontana wrote:
>>>  #ifdef _WIN32
>>> -    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
>>> +    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
>>> +    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
>>>      if (!tcg_enabled()) {
>>>          SleepEx(0, TRUE);
>>>      }
>>
>> Looking at the history here, I think this should be if (hax_enabled());
>> this check was added at a time when whpx did not exist, so I _think_ there might have been an assumption here
>> that !tcg_enabled() on windows means actually hax_enabled() for eating this dummy APC.
> 
> Yes, that matches the condition under which QueueUserAPC is called in
> qemu_cpu_kick_thread.
> 
> Paolo
> 
>> Probably it does not cause problems, because whpx does not end up calling qemu_wait_io_event,
>> instead it calls qemu_wait_io_event_common. But it would be more expressive to use if (hax_enabled()) I think.
> 

Thanks for the clarification, indeed,
I'll convert it to hax_enabled() in the series then, because this allows removing an extra include in cpus.c

(no need to check for tcg_enabled() in cpus.c anymore)...

thanks,

Claudio



* Re: [RFC v3 0/8] QEMU cpus.c refactoring part2
  2020-08-03 11:48 ` Alex Bennée
@ 2020-08-05 17:03   ` Claudio Fontana
  0 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-05 17:03 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Pavel Dovgalyuk, Colin Xu, Paolo Bonzini, haxm-team,
	Sunil Muthuswamy, Richard Henderson, Philippe Mathieu-Daudé,
	Wenchao Wang

On 8/3/20 1:48 PM, Alex Bennée wrote:
> 
> Claudio Fontana <cfontana@suse.de> writes:
> 
>> Motivation and higher level steps:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html
>>
>> The biggest open item for me is, does it makes sense to:
>>
>>
>> 1) make icount TCG-only (building the icount module only under
>> CONFIG_TCG), as this series suggests, and provide a separate virtual
>> counter for qtest,
> 
> Well icount certainly never has any use except with TCG - the fields are
> all wasted in the KVM case.
> 
>> or
>>
>>
>> 2) continue to keep icount functions and fields, including vmstate,
>> in all softmmu builds because of qtest current use of field
>> qemu_icount_bias to implement its virtual counter for
>> qtest_clock_warp?
> 
> Is this just a case of maintaining compatibility for saved VM images? We
> could certainly keep the fields in VM state and stub out (or warn?) if a
> icount related field turned up when reloading a VM into a KVM only build
> or a build with !tcg_enabled().
> 
> I would defer to the vmstate experts on the best way to do this? Is the
> field currently unconditional? Certainly the rr bits are only registered
> when RR is enabled.


Hi Alex and all,

do we have a compatibility issue to worry about?

I.e., looking at the "needed" functions in vmstate, I assumed that
if a VM contains a subfield that is unneeded when loaded, it would just be ignored.

But maybe this assumption was too optimistic?

Thank you,

Claudio


> 
>> If I understand correctly Paolo might be for 2) (?)
>> would also welcome additional input from the community in any direction
>> (Alex, Peter, Philippe?)
>>
>> ----
>>
>> RFC v2 -> v3:
>>
>> * provided defaults for all methods.
>>   Only create_vcpu_thread is now a mandatory field. (Paolo)
>>
>> * separated new CpusAccel patch from its first user, new patch nr. 2:
>>   "cpus: prepare new CpusAccel cpu accelerator interface"
>>
>> * new CpusAccel methods: get_virtual_clock and get_elapsed_ticks.
>>   (Paolo)
>>
>>   In this series, get_virtual_clock has a separate implementation
>>   between TCG/icount and qtest,
>>   while get_elapsed_ticks only returns a virtual counter for icount.
>>
>>   Looking for more comments in this area.
>>
>> ----
>>
>> RFC v1 -> v2:
>>
>> * split the cpus.c accelerator refactoring into 6 patches.
>>
>> * other minor changes to be able to proceed step by step.
>>
>> ----
>>
>> * Rebased on commit 255ae6e2158c743717bed76c9a2365ee4bcd326e,
>> "replay: notify the main loop when there are no instructions"
>>
>> [SPLIT into part1 and part2]
>>
>> ----
>>
>> v6 -> v7:
>>
>> * rebased changes on top of Pavel Dovgalyuk changes to dma-helpers.c
>>   "icount: make dma reads deterministic"
>>
>> ----
>>
>> v5 -> v6:
>>
>> * rebased changes on top of Emilio G. Cota changes to cpus.c
>>   "cpu: convert queued work to a QSIMPLEQ"
>>
>> * keep a pointer in cpus.c instead of a copy of CpusAccel
>>   (Alex)
>>
>> ----
>>
>>
>> v4 -> v5: rebase on latest master
>>
>> * rebased changes on top of roman series to remove one of the extra states for hvf.
>>   (Is the result now functional for HVF?)
>>
>> * rebased changes on top of icount changes and fixes to icount_configure and
>>   the new shift vmstate. (Markus)
>>
>> v3 -> v4:
>>
>> * overall: added copyright headers to all files that were missing them
>>   (used copyright and license of the module the stuff was extracted from).
>>   For the new interface files, added SUSE LLC.
>>
>> * 1/4 (move softmmu only files from root):
>>
>>   MAINTAINERS: moved softmmu/cpus.c to its final location (from patch 2)
>>
>> * 2/4 (cpu-throttle):
>>
>>   MAINTAINERS (to patch 1),
>>   copyright Fabrice Bellard and license from cpus.c
>>
>> * 3/4 (cpu-timers, icount):
>>
>>   - MAINTAINERS: add cpu-timers.c and icount.c to Paolo
>>
>>   - break very long lines (patchew)
>>
>>   - add copyright SUSE LLC, GPLv2 to cpu-timers.h
>>
>>   - add copyright Fabrice Bellard and license from cpus.c to timers-state.h
>>     as it is lifted from cpus.c
>>
>>   - vl.c: in configure_accelerators bail out if icount_enabled()
>>     and !tcg_enabled() as qtest does not enable icount anymore.
>>
>> * 4/4 (accel stuff to accel):
>>
>>   - add copyright SUSE LLC to files that mostly only consist of the
>>     new interface. Add whatever copyright was in the accelerator code
>>     if instead they mostly consist of accelerator code.
>>
>>   - change a comment to mention the result of the AccelClass experiment
>>
>>   - moved qtest accelerator into accel/qtest/ , make it like the others.
>>
>>   - rename xxx-cpus-interface to xxx-cpus (remove "interface" from names)
>>
>>   - rename accel_int to cpus_accel
>>
>>   - rename CpusAccel functions from cpu_synchronize_* to synchronize_*
>>
>>
>> --------
>>
>> v2 -> v3:
>>
>> * turned into a 4 patch series, adding a first patch moving
>>   softmmu code currently in top_srcdir to softmmu/
>>
>> * cpu-throttle: moved to softmmu/
>>
>> * cpu-timers, icount:
>>
>>   - moved to softmmu/
>>
>>   - fixed assumption of qtest_enabled() => icount_enabled()
>>   causing the failure of check-qtest-arm goal, in test-arm-mptimer.c
>>
>>   Fix is in hw/core/ptimer.c,
>>
>>   where the artificial timeout rate limit should not be applied
>>   under qtest_enabled(), in a similar way to how it is not applied
>>   for icount_enabled().
>>
>> * CpuAccelInterface: no change.
>>
>>
>> --------
>>
>>
>> v1 -> v2:
>>
>> * 1/3 (cpu-throttle): provide a description in the commit message
>>
>> * 2/3 (cpu-timers, icount): in this v2 separate icount from cpu-timers,
>>   as icount is actually TCG-specific. Only build it under CONFIG_TCG.
>>
>>   To do this, qtest had to be detached from icount. To this end, a
>>   trivial global counter for qtest has been introduced.
>>
>> * 3/3 (CpuAccelInterface): provided a description.
>>
>> This is point 8) in that plan. The idea is to extract the unrelated parts
>> in cpus, and register interfaces from each single accelerator to the main
>> cpus module (cpus.c).
>>
>> While doing this RFC, I noticed some assumptions about Windows being
>> either TCG or HAX (not considering WHPX) that might need to be revisited.
>> I added a comment there.
>>
>> The thing builds successfully based on Linux cross-compilations for
>> windows/hax, windows/whpx, and I got a good build on Darwin/hvf.
>>
>> Tests run successully for tcg and kvm configurations, but did not test on
>> windows or darwin.
>>
>> Welcome your feedback and help on this,
>>
>> Claudio
>>
>> Claudio Fontana (8):
>>   cpu-timers, icount: new modules
>>   cpus: prepare new CpusAccel cpu accelerator interface
>>   cpus: extract out TCG-specific code to accel/tcg
>>   cpus: extract out qtest-specific code to accel/qtest
>>   cpus: extract out kvm-specific code to accel/kvm
>>   cpus: extract out hax-specific code to target/i386/
>>   cpus: extract out whpx-specific code to target/i386/
>>   cpus: extract out hvf-specific code to target/i386/hvf/
>>
>>  MAINTAINERS                    |    5 +-
>>  accel/Makefile.objs            |    2 +-
>>  accel/kvm/Makefile.objs        |    2 +
>>  accel/kvm/kvm-all.c            |   14 +-
>>  accel/kvm/kvm-cpus.c           |   88 +++
>>  accel/kvm/kvm-cpus.h           |   17 +
>>  accel/qtest/Makefile.objs      |    2 +
>>  accel/qtest/qtest-cpus.c       |   91 +++
>>  accel/qtest/qtest-cpus.h       |   17 +
>>  accel/{ => qtest}/qtest.c      |   13 +-
>>  accel/stubs/kvm-stub.c         |    3 +-
>>  accel/tcg/Makefile.objs        |    1 +
>>  accel/tcg/cpu-exec.c           |   43 +-
>>  accel/tcg/tcg-all.c            |   19 +-
>>  accel/tcg/tcg-cpus.c           |  541 +++++++++++++
>>  accel/tcg/tcg-cpus.h           |   17 +
>>  accel/tcg/translate-all.c      |    3 +-
>>  dma-helpers.c                  |    4 +-
>>  docs/replay.txt                |    6 +-
>>  exec.c                         |    4 -
>>  hw/core/cpu.c                  |    1 +
>>  hw/core/ptimer.c               |    8 +-
>>  hw/i386/x86.c                  |    3 +-
>>  include/exec/cpu-all.h         |    4 +
>>  include/exec/exec-all.h        |    4 +-
>>  include/qemu/timer.h           |   24 +-
>>  include/sysemu/cpu-timers.h    |   84 ++
>>  include/sysemu/cpus.h          |   48 +-
>>  include/sysemu/hw_accel.h      |   69 +-
>>  include/sysemu/kvm.h           |    2 +-
>>  include/sysemu/qtest.h         |    2 +
>>  include/sysemu/replay.h        |    4 +-
>>  replay/replay.c                |    6 +-
>>  softmmu/Makefile.objs          |    2 +
>>  softmmu/cpu-timers.c           |  279 +++++++
>>  softmmu/cpus.c                 | 1661 +++-------------------------------------
>>  softmmu/icount.c               |  497 ++++++++++++
>>  softmmu/qtest.c                |   34 +-
>>  softmmu/timers-state.h         |   69 ++
>>  softmmu/vl.c                   |   11 +-
>>  stubs/Makefile.objs            |    6 +-
>>  stubs/clock-warp.c             |    7 -
>>  stubs/cpu-get-clock.c          |    3 +-
>>  stubs/cpu-get-icount.c         |   21 -
>>  stubs/cpu-synchronize-state.c  |   15 +
>>  stubs/cpus-get-virtual-clock.c |    8 +
>>  stubs/icount.c                 |   52 ++
>>  stubs/qemu-timer-notify-cb.c   |    8 +
>>  stubs/qtest.c                  |    5 +
>>  target/alpha/translate.c       |    3 +-
>>  target/arm/helper.c            |    7 +-
>>  target/i386/Makefile.objs      |    7 +-
>>  target/i386/hax-all.c          |    6 +-
>>  target/i386/hax-cpus.c         |   85 ++
>>  target/i386/hax-cpus.h         |   17 +
>>  target/i386/hax-i386.h         |    2 +
>>  target/i386/hax-posix.c        |   12 +
>>  target/i386/hax-windows.c      |   20 +
>>  target/i386/hvf/Makefile.objs  |    2 +-
>>  target/i386/hvf/hvf-cpus.c     |  131 ++++
>>  target/i386/hvf/hvf-cpus.h     |   17 +
>>  target/i386/hvf/hvf.c          |    3 +
>>  target/i386/whpx-all.c         |    3 +
>>  target/i386/whpx-cpus.c        |   96 +++
>>  target/i386/whpx-cpus.h        |   17 +
>>  target/riscv/csr.c             |    8 +-
>>  tests/ptimer-test-stubs.c      |    7 +-
>>  tests/test-timed-average.c     |    2 +-
>>  util/main-loop.c               |   12 +-
>>  util/qemu-timer.c              |   14 +-
>>  70 files changed, 2528 insertions(+), 1772 deletions(-)
>>  create mode 100644 accel/kvm/kvm-cpus.c
>>  create mode 100644 accel/kvm/kvm-cpus.h
>>  create mode 100644 accel/qtest/Makefile.objs
>>  create mode 100644 accel/qtest/qtest-cpus.c
>>  create mode 100644 accel/qtest/qtest-cpus.h
>>  rename accel/{ => qtest}/qtest.c (81%)
>>  create mode 100644 accel/tcg/tcg-cpus.c
>>  create mode 100644 accel/tcg/tcg-cpus.h
>>  create mode 100644 include/sysemu/cpu-timers.h
>>  create mode 100644 softmmu/cpu-timers.c
>>  create mode 100644 softmmu/icount.c
>>  create mode 100644 softmmu/timers-state.h
>>  delete mode 100644 stubs/clock-warp.c
>>  delete mode 100644 stubs/cpu-get-icount.c
>>  create mode 100644 stubs/cpu-synchronize-state.c
>>  create mode 100644 stubs/cpus-get-virtual-clock.c
>>  create mode 100644 stubs/icount.c
>>  create mode 100644 stubs/qemu-timer-notify-cb.c
>>  create mode 100644 target/i386/hax-cpus.c
>>  create mode 100644 target/i386/hax-cpus.h
>>  create mode 100644 target/i386/hvf/hvf-cpus.c
>>  create mode 100644 target/i386/hvf/hvf-cpus.h
>>  create mode 100644 target/i386/whpx-cpus.c
>>  create mode 100644 target/i386/whpx-cpus.h
> 
> 




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
  2020-08-05  8:40   ` Claudio Fontana
@ 2020-08-11  8:59   ` Roman Bolshakov
  2020-08-11 10:57     ` Claudio Fontana
  2020-08-20  8:17   ` Claudio Fontana
  2 siblings, 1 reply; 25+ messages in thread
From: Roman Bolshakov @ 2020-08-11  8:59 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, Alex Bennée, haxm-team, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson

On Mon, Aug 03, 2020 at 11:05:27AM +0200, Claudio Fontana wrote:
> The new interface starts unused, will start being used by the
> next patches.
> 
> It provides methods for each accelerator to start a vcpu, kick a vcpu,
> synchronize state, get cpu virtual clock and elapsed ticks.
> 
> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  hw/core/cpu.c                  |   1 +
>  hw/i386/x86.c                  |   2 +-
>  include/sysemu/cpu-timers.h    |   9 +-
>  include/sysemu/cpus.h          |  36 ++++++++
>  include/sysemu/hw_accel.h      |  69 ++-------------
>  softmmu/cpu-timers.c           |   9 +-
>  softmmu/cpus.c                 | 194 ++++++++++++++++++++++++++++++++---------
>  stubs/Makefile.objs            |   2 +
>  stubs/cpu-synchronize-state.c  |  15 ++++
>  stubs/cpus-get-virtual-clock.c |   8 ++
>  util/qemu-timer.c              |   8 +-
>  11 files changed, 231 insertions(+), 122 deletions(-)
>  create mode 100644 stubs/cpu-synchronize-state.c
>  create mode 100644 stubs/cpus-get-virtual-clock.c
> 
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 594441a150..b389a312df 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -33,6 +33,7 @@
>  #include "hw/qdev-properties.h"
>  #include "trace-root.h"
>  #include "qemu/plugin.h"
> +#include "sysemu/hw_accel.h"
>  
>  CPUInterruptHandler cpu_interrupt_handler;
>  
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 58cf2229d5..00c35bad7e 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -264,7 +264,7 @@ static long get_file_size(FILE *f)
>  /* TSC handling */
>  uint64_t cpu_get_tsc(CPUX86State *env)
>  {
> -    return cpu_get_ticks();
> +    return cpus_get_elapsed_ticks();

Hi Claudio,

I still don't understand why the plural form "cpus" is used in the file
names, the CpusAccel interface name and the cpus_ prefix of the
functions/variables.

The original cpus.c had functions to create CPU threads for multiple
accelerators, which justified the name cpus.c. It had TCG, KVM and other
kinds of vCPUs. After you factor cpus.c into separate implementations of
the CPU interface, it should take the singular form.

I’m not a native English speaker but the naming looks confusing to me.

>  }
>  
>  /* IRQ handling */
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index 54fdb2761c..bad6302ca3 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -87,7 +87,7 @@ bool cpu_is_stopped(CPUState *cpu)
>      return cpu->stopped || !runstate_is_running();
>  }
>  
> -static inline bool cpu_work_list_empty(CPUState *cpu)
> +bool cpu_work_list_empty(CPUState *cpu)
>  {
>      bool ret;
>  
> @@ -97,7 +97,7 @@ static inline bool cpu_work_list_empty(CPUState *cpu)
>      return ret;
>  }
>  
> -static bool cpu_thread_is_idle(CPUState *cpu)
> +bool cpu_thread_is_idle(CPUState *cpu)
>  {
>      if (cpu->stop || !cpu_work_list_empty(cpu)) {
>          return false;
> @@ -215,6 +215,11 @@ void hw_error(const char *fmt, ...)
>      abort();
>  }
>  
> +/*
> + * The chosen accelerator is supposed to register this.
> + */
> +static CpusAccel *cpus_accel;
> +
>  void cpu_synchronize_all_states(void)
>  {
>      CPUState *cpu;
> @@ -251,6 +256,102 @@ void cpu_synchronize_all_pre_loadvm(void)
>      }
>  }
>  
> +void cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (cpus_accel && cpus_accel->synchronize_state) {
> +        cpus_accel->synchronize_state(cpu);

I think the condition can be removed altogether if you move it to the
bottom, inside the else body. cpu_interrupt_handler and cpu_interrupt() in
hw/core/cpu.c are an example of that. Likely cpu_interrupt_handler should
be part of the accel interface. You might also avoid the extra indirection
by using a standalone function pointer. Like that:


void cpu_synchronize_state(CPUState *cpu)
{
    if (cpus_accel && cpus_accel->synchronize_state) {
        cpus_accel->synchronize_state(cpu);
    }
    if (kvm_enabled()) {
        kvm_cpu_synchronize_state(cpu);
    }
    else if (hax_enabled()) {
        hax_cpu_synchronize_state(cpu);
    }
    else if (whpx_enabled()) {
        whpx_cpu_synchronize_state(cpu);
    } else {
        cpu_synchronize_state_handler(cpu);
    }
}

After you finish factoring, it becomes:


void cpu_synchronize_state(CPUState *cpu)
{
    cpu_synchronize_state_handler(cpu);
}

cpu_register_accel would just assign a non-NULL function pointer
from a CPUAccel field over generic_cpu_synchronize_state_handler.
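
A minimal, self-contained sketch of that pattern follows. It is an
illustration only, not QEMU code: the CPUState stand-in, its synced_by
field and demo_accel_synchronize_state() are hypothetical, and CPUAccel is
reduced to the single field needed here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's CPUState; only what this sketch needs. */
typedef struct CPUState {
    int synced_by; /* hypothetical: records which handler ran */
} CPUState;

typedef void (*CPUSyncHandler)(CPUState *cpu);

/* Hypothetical accel interface with one optional method. */
typedef struct CPUAccel {
    CPUSyncHandler synchronize_state; /* may be NULL */
} CPUAccel;

/* Generic default: used when the accelerator provides nothing. */
static void generic_cpu_synchronize_state_handler(CPUState *cpu)
{
    cpu->synced_by = 0;
}

/* Standalone pointer, never NULL, so callers need no condition. */
static CPUSyncHandler cpu_synchronize_state_handler =
    generic_cpu_synchronize_state_handler;

/* Registration overrides the default only for non-NULL fields. */
static void cpu_register_accel(const CPUAccel *ca)
{
    assert(ca != NULL);
    if (ca->synchronize_state) {
        cpu_synchronize_state_handler = ca->synchronize_state;
    }
}

/* The caller-facing function is a single call through the pointer. */
static void cpu_synchronize_state(CPUState *cpu)
{
    cpu_synchronize_state_handler(cpu);
}

/* Hypothetical accelerator-specific handler. */
static void demo_accel_synchronize_state(CPUState *cpu)
{
    cpu->synced_by = 1;
}
```

Before registration, cpu_synchronize_state() runs the generic handler.
After cpu_register_accel() is called with a CPUAccel whose
synchronize_state is set, it runs the accelerator's handler instead, with
no NULL check on the call path.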

Regards,
Roman

> +    }
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_state(cpu);
> +    }
> +    if (hax_enabled()) {
> +        hax_cpu_synchronize_state(cpu);
> +    }
> +    if (whpx_enabled()) {
> +        whpx_cpu_synchronize_state(cpu);
> +    }
> +}
> +



* Re: [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/
  2020-08-03  9:05 ` [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/ Claudio Fontana
@ 2020-08-11  9:00   ` Roman Bolshakov
  2020-08-11 13:42     ` Claudio Fontana
  0 siblings, 1 reply; 25+ messages in thread
From: Roman Bolshakov @ 2020-08-11  9:00 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, Alex Bennée, haxm-team, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson

On Mon, Aug 03, 2020 at 11:05:33AM +0200, Claudio Fontana wrote:
> register a "CpusAccel" interface for HVF as well.
> 
> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  softmmu/cpus.c                |  63 --------------------
>  target/i386/hvf/Makefile.objs |   2 +-
>  target/i386/hvf/hvf-cpus.c    | 131 ++++++++++++++++++++++++++++++++++++++++++
>  target/i386/hvf/hvf-cpus.h    |  17 ++++++
>  target/i386/hvf/hvf.c         |   3 +
>  5 files changed, 152 insertions(+), 64 deletions(-)
>  create mode 100644 target/i386/hvf/hvf-cpus.c
>  create mode 100644 target/i386/hvf/hvf-cpus.h
> 
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index 586b4acaab..d327b2685c 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -33,7 +33,6 @@
>  #include "exec/gdbstub.h"
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/kvm.h"
> -#include "sysemu/hvf.h"

I wonder if the declarations should be moved from sysemu/hvf.h to
someplace inside target/i386/hvf/:

int hvf_init_vcpu(CPUState *);
int hvf_vcpu_exec(CPUState *);
void hvf_cpu_synchronize_state(CPUState *);
void hvf_cpu_synchronize_post_reset(CPUState *);
void hvf_cpu_synchronize_post_init(CPUState *);
void hvf_cpu_synchronize_pre_loadvm(CPUState *);
void hvf_vcpu_destroy(CPUState *);

They're not used outside of target/i386/hvf/

I also wonder if we need stubs at all?

>  #include "exec/exec-all.h"
>  #include "qemu/thread.h"
>  #include "qemu/plugin.h"
> @@ -391,48 +390,6 @@ void qemu_wait_io_event(CPUState *cpu)
>      qemu_wait_io_event_common(cpu);
>  }
>  
> -/* The HVF-specific vCPU thread function. This one should only run when the host
> - * CPU supports the VMX "unrestricted guest" feature. */
> -static void *qemu_hvf_cpu_thread_fn(void *arg)
> -{
> -    CPUState *cpu = arg;
> -
> -    int r;
> -
> -    assert(hvf_enabled());
> -
> -    rcu_register_thread();
> -
> -    qemu_mutex_lock_iothread();
> -    qemu_thread_get_self(cpu->thread);
> -
> -    cpu->thread_id = qemu_get_thread_id();
> -    cpu->can_do_io = 1;
> -    current_cpu = cpu;
> -
> -    hvf_init_vcpu(cpu);
> -
> -    /* signal CPU creation */
> -    cpu_thread_signal_created(cpu);
> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
> -
> -    do {
> -        if (cpu_can_run(cpu)) {
> -            r = hvf_vcpu_exec(cpu);
> -            if (r == EXCP_DEBUG) {
> -                cpu_handle_guest_debug(cpu);
> -            }
> -        }
> -        qemu_wait_io_event(cpu);
> -    } while (!cpu->unplug || cpu_can_run(cpu));
> -
> -    hvf_vcpu_destroy(cpu);
> -    cpu_thread_signal_destroyed(cpu);
> -    qemu_mutex_unlock_iothread();
> -    rcu_unregister_thread();
> -    return NULL;
> -}
> -
>  void cpus_kick_thread(CPUState *cpu)
>  {
>  #ifndef _WIN32
> @@ -603,24 +560,6 @@ void cpu_remove_sync(CPUState *cpu)
>      qemu_mutex_lock_iothread();
>  }
>  
> -static void qemu_hvf_start_vcpu(CPUState *cpu)
> -{
> -    char thread_name[VCPU_THREAD_NAME_SIZE];
> -
> -    /* HVF currently does not support TCG, and only runs in
> -     * unrestricted-guest mode. */
> -    assert(hvf_enabled());
> -
> -    cpu->thread = g_malloc0(sizeof(QemuThread));
> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> -    qemu_cond_init(cpu->halt_cond);
> -
> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
> -             cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> -}
> -
>  void cpus_register_accel(CpusAccel *ca)
>  {
>      assert(ca != NULL);
> @@ -648,8 +587,6 @@ void qemu_init_vcpu(CPUState *cpu)
>      if (cpus_accel) {
>          /* accelerator already implements the CpusAccel interface */
>          cpus_accel->create_vcpu_thread(cpu);
> -    } else if (hvf_enabled()) {
> -        qemu_hvf_start_vcpu(cpu);
>      } else {
>          assert(0);
>      }
> diff --git a/target/i386/hvf/Makefile.objs b/target/i386/hvf/Makefile.objs
> index 927b86bc67..af9f7dcfc1 100644
> --- a/target/i386/hvf/Makefile.objs
> +++ b/target/i386/hvf/Makefile.objs
> @@ -1,2 +1,2 @@
> -obj-y += hvf.o
> +obj-y += hvf.o hvf-cpus.o
>  obj-y += x86.o x86_cpuid.o x86_decode.o x86_descr.o x86_emu.o x86_flags.o x86_mmu.o x86hvf.o x86_task.o
> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
> new file mode 100644
> index 0000000000..9540157f1e
> --- /dev/null
> +++ b/target/i386/hvf/hvf-cpus.c

I'd prefer singular form in variables and file names. More on that in
the comment to patch 2.

Besides that it works fine,

Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>

Regards,
Roman



* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-11  8:59   ` Roman Bolshakov
@ 2020-08-11 10:57     ` Claudio Fontana
  0 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-11 10:57 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, Alex Bennée, haxm-team, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson

On 8/11/20 10:59 AM, Roman Bolshakov wrote:
> On Mon, Aug 03, 2020 at 11:05:27AM +0200, Claudio Fontana wrote:
>> The new interface starts unused, will start being used by the
>> next patches.
>>
>> It provides methods for each accelerator to start a vcpu, kick a vcpu,
>> synchronize state, get cpu virtual clock and elapsed ticks.
>>
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  hw/core/cpu.c                  |   1 +
>>  hw/i386/x86.c                  |   2 +-
>>  include/sysemu/cpu-timers.h    |   9 +-
>>  include/sysemu/cpus.h          |  36 ++++++++
>>  include/sysemu/hw_accel.h      |  69 ++-------------
>>  softmmu/cpu-timers.c           |   9 +-
>>  softmmu/cpus.c                 | 194 ++++++++++++++++++++++++++++++++---------
>>  stubs/Makefile.objs            |   2 +
>>  stubs/cpu-synchronize-state.c  |  15 ++++
>>  stubs/cpus-get-virtual-clock.c |   8 ++
>>  util/qemu-timer.c              |   8 +-
>>  11 files changed, 231 insertions(+), 122 deletions(-)
>>  create mode 100644 stubs/cpu-synchronize-state.c
>>  create mode 100644 stubs/cpus-get-virtual-clock.c
>>
>> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
>> index 594441a150..b389a312df 100644
>> --- a/hw/core/cpu.c
>> +++ b/hw/core/cpu.c
>> @@ -33,6 +33,7 @@
>>  #include "hw/qdev-properties.h"
>>  #include "trace-root.h"
>>  #include "qemu/plugin.h"
>> +#include "sysemu/hw_accel.h"
>>  
>>  CPUInterruptHandler cpu_interrupt_handler;
>>  
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index 58cf2229d5..00c35bad7e 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -264,7 +264,7 @@ static long get_file_size(FILE *f)
>>  /* TSC handling */
>>  uint64_t cpu_get_tsc(CPUX86State *env)
>>  {
>> -    return cpu_get_ticks();
>> +    return cpus_get_elapsed_ticks();
> 
> Hi Claudio,
> 
> I still don't understand why plural form of "cpus" is used in files,
> CpusAccel interface name and cpus_ prefix of the functions/variables.

cpus.c is the module, and the functions do sometimes affect more than one single cpu,
or get properties that are not specific to a single cpu.

For example the existing functions:

all_cpu_threads_idle
cpu_synchronize_all_states
cpu_synchronize_all_post_reset
cpu_synchronize_all_post_init
cpu_synchronize_all_pre_loadvm
qemu_init_cpu_loop
qemu_init_sigbus
qemu_in_vcpu_thread
qemu_mutex_iothread_locked
qemu_mutex_lock_iothread_impl
qemu_mutex_unlock_iothread
pause_all_vcpus
resume_all_vcpus
vm_shutdown
vm_stop
vm_prepare_start
vm_start
vm_stop_force_state
list_cpus

and the new identifiers:

cpus_accel
cpus_register_accel
cpus_get_virtual_clock
cpus_get_elapsed_ticks

are all affecting _all_ the cpus in the VM, not just one.

Of course the module also contains functions that affect a single cpu,
but given the huge number of functions in the QEMU code called cpu_something,
scattered all around the directories, a cpus_ prefix would immediately point to softmmu/cpus.c, making them
easier to find and understand.

So I would be in favor of eventually giving all the functions in the cpus.c module the cpus_ prefix,
as this module is about the _set_ of cpus running in the VM.


> 
> Original cpus.c had functions to create CPU threads for multiple
> accelerators, that justified naming of cpus.c. It had TCG, KVM and other
> kinds of vCPUs. After you factor cpus.c into separate implementations of
> CPU interface it should get singular form.
> 
> I’m not a native English speaker but the naming looks confusing to me.

See above for the reason I think cpus as a name is still warranted for this module.
It is about the set of all cpus, not a single cpu.

> 
>>  }
>>  
>>  /* IRQ handling */
>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
>> index 54fdb2761c..bad6302ca3 100644
>> --- a/softmmu/cpus.c
>> +++ b/softmmu/cpus.c
>> @@ -87,7 +87,7 @@ bool cpu_is_stopped(CPUState *cpu)
>>      return cpu->stopped || !runstate_is_running();
>>  }
>>  
>> -static inline bool cpu_work_list_empty(CPUState *cpu)
>> +bool cpu_work_list_empty(CPUState *cpu)
>>  {
>>      bool ret;
>>  
>> @@ -97,7 +97,7 @@ static inline bool cpu_work_list_empty(CPUState *cpu)
>>      return ret;
>>  }
>>  
>> -static bool cpu_thread_is_idle(CPUState *cpu)
>> +bool cpu_thread_is_idle(CPUState *cpu)
>>  {
>>      if (cpu->stop || !cpu_work_list_empty(cpu)) {
>>          return false;
>> @@ -215,6 +215,11 @@ void hw_error(const char *fmt, ...)
>>      abort();
>>  }
>>  
>> +/*
>> + * The chosen accelerator is supposed to register this.
>> + */
>> +static CpusAccel *cpus_accel;
>> +
>>  void cpu_synchronize_all_states(void)
>>  {
>>      CPUState *cpu;
>> @@ -251,6 +256,102 @@ void cpu_synchronize_all_pre_loadvm(void)
>>      }
>>  }
>>  
>> +void cpu_synchronize_state(CPUState *cpu)
>> +{
>> +    if (cpus_accel && cpus_accel->synchronize_state) {
>> +        cpus_accel->synchronize_state(cpu);
> 
> I think the condition can be removed altogether if you move it to the
> bottom inside else body. cpu_interrupt_handler and cpu_interrupt() in
> hw/core/cpu.c is an example of that. Likely cpu_interrupt_handler should
> be part of the accel interface. You might also avoid indirected function
> call by using standalone function pointer. Like that:
> 
> 
> void cpu_synchronize_state(CPUState *cpu)
> {
>     if (cpus_accel && cpus_accel->synchronize_state) {
>         cpus_accel->synchronize_state(cpu);
>     }
>     if (kvm_enabled()) {
>         kvm_cpu_synchronize_state(cpu);
>     }
>     else if (hax_enabled()) {
>         hax_cpu_synchronize_state(cpu);
>     }
>     else if (whpx_enabled()) {
>         whpx_cpu_synchronize_state(cpu);
>     } else {
>         cpu_synchronize_state_handler(cpu);
>     }
> }
> 
> After you finish factoring, it becomes:
> 
> 
> void cpu_synchronize_state(CPUState *cpu)
> {
>     cpu_synchronize_state_handler(cpu);
> }
> 
> cpu_register_accel would just assign non-NULL function pointer
> from a CPUAccel field over generic_cpu_synchronize_state_handler.
> 
> Regards,
> Roman

I'll take a look at how things look after adding static inlines to the .h file to speed this up.
I wonder what the real hot paths are here, though. I'd like to find the best balance between
readability and performance, as we could go overboard with this when a simpler-to-read solution would suffice.

Thanks!

Claudio

> 
>> +    }
>> +    if (kvm_enabled()) {
>> +        kvm_cpu_synchronize_state(cpu);
>> +    }
>> +    if (hax_enabled()) {
>> +        hax_cpu_synchronize_state(cpu);
>> +    }
>> +    if (whpx_enabled()) {
>> +        whpx_cpu_synchronize_state(cpu);
>> +    }
>> +}
>> +




* Re: [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/
  2020-08-11  9:00   ` Roman Bolshakov
@ 2020-08-11 13:42     ` Claudio Fontana
  2020-08-11 14:28       ` Claudio Fontana
  0 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-11 13:42 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, Alex Bennée, haxm-team, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson

On 8/11/20 11:00 AM, Roman Bolshakov wrote:
> On Mon, Aug 03, 2020 at 11:05:33AM +0200, Claudio Fontana wrote:
>> register a "CpusAccel" interface for HVF as well.
>>
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  softmmu/cpus.c                |  63 --------------------
>>  target/i386/hvf/Makefile.objs |   2 +-
>>  target/i386/hvf/hvf-cpus.c    | 131 ++++++++++++++++++++++++++++++++++++++++++
>>  target/i386/hvf/hvf-cpus.h    |  17 ++++++
>>  target/i386/hvf/hvf.c         |   3 +
>>  5 files changed, 152 insertions(+), 64 deletions(-)
>>  create mode 100644 target/i386/hvf/hvf-cpus.c
>>  create mode 100644 target/i386/hvf/hvf-cpus.h
>>
>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
>> index 586b4acaab..d327b2685c 100644
>> --- a/softmmu/cpus.c
>> +++ b/softmmu/cpus.c
>> @@ -33,7 +33,6 @@
>>  #include "exec/gdbstub.h"
>>  #include "sysemu/hw_accel.h"
>>  #include "sysemu/kvm.h"
>> -#include "sysemu/hvf.h"
> 
> I wonder if the declarations should be moved from sysemu/hvf.h to
> someplace inside target/i386/hvf/:
> 
> int hvf_init_vcpu(CPUState *);
> int hvf_vcpu_exec(CPUState *);
> void hvf_cpu_synchronize_state(CPUState *);
> void hvf_cpu_synchronize_post_reset(CPUState *);
> void hvf_cpu_synchronize_post_init(CPUState *);
> void hvf_cpu_synchronize_pre_loadvm(CPUState *);
> void hvf_vcpu_destroy(CPUState *);
> 
> They're not used outside of target/i386/hvf/
> 
> I also wonder if we need stubs at all?
> 
>>  #include "exec/exec-all.h"
>>  #include "qemu/thread.h"
>>  #include "qemu/plugin.h"
>> @@ -391,48 +390,6 @@ void qemu_wait_io_event(CPUState *cpu)
>>      qemu_wait_io_event_common(cpu);
>>  }
>>  
>> -/* The HVF-specific vCPU thread function. This one should only run when the host
>> - * CPU supports the VMX "unrestricted guest" feature. */
>> -static void *qemu_hvf_cpu_thread_fn(void *arg)
>> -{
>> -    CPUState *cpu = arg;
>> -
>> -    int r;
>> -
>> -    assert(hvf_enabled());
>> -
>> -    rcu_register_thread();
>> -
>> -    qemu_mutex_lock_iothread();
>> -    qemu_thread_get_self(cpu->thread);
>> -
>> -    cpu->thread_id = qemu_get_thread_id();
>> -    cpu->can_do_io = 1;
>> -    current_cpu = cpu;
>> -
>> -    hvf_init_vcpu(cpu);
>> -
>> -    /* signal CPU creation */
>> -    cpu_thread_signal_created(cpu);
>> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>> -
>> -    do {
>> -        if (cpu_can_run(cpu)) {
>> -            r = hvf_vcpu_exec(cpu);
>> -            if (r == EXCP_DEBUG) {
>> -                cpu_handle_guest_debug(cpu);
>> -            }
>> -        }
>> -        qemu_wait_io_event(cpu);
>> -    } while (!cpu->unplug || cpu_can_run(cpu));
>> -
>> -    hvf_vcpu_destroy(cpu);
>> -    cpu_thread_signal_destroyed(cpu);
>> -    qemu_mutex_unlock_iothread();
>> -    rcu_unregister_thread();
>> -    return NULL;
>> -}
>> -
>>  void cpus_kick_thread(CPUState *cpu)
>>  {
>>  #ifndef _WIN32
>> @@ -603,24 +560,6 @@ void cpu_remove_sync(CPUState *cpu)
>>      qemu_mutex_lock_iothread();
>>  }
>>  
>> -static void qemu_hvf_start_vcpu(CPUState *cpu)
>> -{
>> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>> -
>> -    /* HVF currently does not support TCG, and only runs in
>> -     * unrestricted-guest mode. */
>> -    assert(hvf_enabled());
>> -
>> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> -    qemu_cond_init(cpu->halt_cond);
>> -
>> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>> -             cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> -}
>> -
>>  void cpus_register_accel(CpusAccel *ca)
>>  {
>>      assert(ca != NULL);
>> @@ -648,8 +587,6 @@ void qemu_init_vcpu(CPUState *cpu)
>>      if (cpus_accel) {
>>          /* accelerator already implements the CpusAccel interface */
>>          cpus_accel->create_vcpu_thread(cpu);
>> -    } else if (hvf_enabled()) {
>> -        qemu_hvf_start_vcpu(cpu);
>>      } else {
>>          assert(0);
>>      }
>> diff --git a/target/i386/hvf/Makefile.objs b/target/i386/hvf/Makefile.objs
>> index 927b86bc67..af9f7dcfc1 100644
>> --- a/target/i386/hvf/Makefile.objs
>> +++ b/target/i386/hvf/Makefile.objs
>> @@ -1,2 +1,2 @@
>> -obj-y += hvf.o
>> +obj-y += hvf.o hvf-cpus.o
>>  obj-y += x86.o x86_cpuid.o x86_decode.o x86_descr.o x86_emu.o x86_flags.o x86_mmu.o x86hvf.o x86_task.o
>> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>> new file mode 100644
>> index 0000000000..9540157f1e
>> --- /dev/null
>> +++ b/target/i386/hvf/hvf-cpus.c
> 
> I'd prefer singular form in variables and file names. More on that in
> the comment to patch 2.
> 
> Besides that it works fine,
> 
> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
> Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
> 
> Regards,
> Roman
> 

Hi Roman,

Thanks, sure, let's discuss the naming further on patch 2.

I noticed a missing chunk in this patch, i.e. it leaves a lingering

} else if (hvf_enabled()) {

in cpu_synchronize_pre_loadvm().

That needs to be removed. It should not change the behavior, but who knows. I will respin this one in the next version.

Thank you!

Claudio






* Re: [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/
  2020-08-11 13:42     ` Claudio Fontana
@ 2020-08-11 14:28       ` Claudio Fontana
  0 siblings, 0 replies; 25+ messages in thread
From: Claudio Fontana @ 2020-08-11 14:28 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Pavel Dovgalyuk, Alex Bennée, haxm-team, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Colin Xu, Wenchao Wang,
	Paolo Bonzini, Sunil Muthuswamy, Philippe Mathieu-Daudé,
	Richard Henderson

On 8/11/20 3:42 PM, Claudio Fontana wrote:
> On 8/11/20 11:00 AM, Roman Bolshakov wrote:
>> On Mon, Aug 03, 2020 at 11:05:33AM +0200, Claudio Fontana wrote:
>>> register a "CpusAccel" interface for HVF as well.
>>>
>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>> ---
>>>  softmmu/cpus.c                |  63 --------------------
>>>  target/i386/hvf/Makefile.objs |   2 +-
>>>  target/i386/hvf/hvf-cpus.c    | 131 ++++++++++++++++++++++++++++++++++++++++++
>>>  target/i386/hvf/hvf-cpus.h    |  17 ++++++
>>>  target/i386/hvf/hvf.c         |   3 +
>>>  5 files changed, 152 insertions(+), 64 deletions(-)
>>>  create mode 100644 target/i386/hvf/hvf-cpus.c
>>>  create mode 100644 target/i386/hvf/hvf-cpus.h
>>>
>>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
>>> index 586b4acaab..d327b2685c 100644
>>> --- a/softmmu/cpus.c
>>> +++ b/softmmu/cpus.c
>>> @@ -33,7 +33,6 @@
>>>  #include "exec/gdbstub.h"
>>>  #include "sysemu/hw_accel.h"
>>>  #include "sysemu/kvm.h"
>>> -#include "sysemu/hvf.h"
>>
>> I wonder if the declarations should be moved from sysemu/hvf.h to
>> someplace inside target/i386/hvf/:
>>
>> int hvf_init_vcpu(CPUState *);
>> int hvf_vcpu_exec(CPUState *);
>> void hvf_cpu_synchronize_state(CPUState *);
>> void hvf_cpu_synchronize_post_reset(CPUState *);
>> void hvf_cpu_synchronize_post_init(CPUState *);
>> void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>> void hvf_vcpu_destroy(CPUState *);
>>
>> They're not used outside of target/i386/hvf/
>>
>> I also wonder if we need stubs at all?

Ah, missed this,

yes good catch! I think we can remove quite a few stubs and not only for HVF!

Thanks a lot,

Claudio


>>
>>>  #include "exec/exec-all.h"
>>>  #include "qemu/thread.h"
>>>  #include "qemu/plugin.h"
>>> @@ -391,48 +390,6 @@ void qemu_wait_io_event(CPUState *cpu)
>>>      qemu_wait_io_event_common(cpu);
>>>  }
>>>  
>>> -/* The HVF-specific vCPU thread function. This one should only run when the host
>>> - * CPU supports the VMX "unrestricted guest" feature. */
>>> -static void *qemu_hvf_cpu_thread_fn(void *arg)
>>> -{
>>> -    CPUState *cpu = arg;
>>> -
>>> -    int r;
>>> -
>>> -    assert(hvf_enabled());
>>> -
>>> -    rcu_register_thread();
>>> -
>>> -    qemu_mutex_lock_iothread();
>>> -    qemu_thread_get_self(cpu->thread);
>>> -
>>> -    cpu->thread_id = qemu_get_thread_id();
>>> -    cpu->can_do_io = 1;
>>> -    current_cpu = cpu;
>>> -
>>> -    hvf_init_vcpu(cpu);
>>> -
>>> -    /* signal CPU creation */
>>> -    cpu_thread_signal_created(cpu);
>>> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>> -
>>> -    do {
>>> -        if (cpu_can_run(cpu)) {
>>> -            r = hvf_vcpu_exec(cpu);
>>> -            if (r == EXCP_DEBUG) {
>>> -                cpu_handle_guest_debug(cpu);
>>> -            }
>>> -        }
>>> -        qemu_wait_io_event(cpu);
>>> -    } while (!cpu->unplug || cpu_can_run(cpu));
>>> -
>>> -    hvf_vcpu_destroy(cpu);
>>> -    cpu_thread_signal_destroyed(cpu);
>>> -    qemu_mutex_unlock_iothread();
>>> -    rcu_unregister_thread();
>>> -    return NULL;
>>> -}
>>> -
>>>  void cpus_kick_thread(CPUState *cpu)
>>>  {
>>>  #ifndef _WIN32
>>> @@ -603,24 +560,6 @@ void cpu_remove_sync(CPUState *cpu)
>>>      qemu_mutex_lock_iothread();
>>>  }
>>>  
>>> -static void qemu_hvf_start_vcpu(CPUState *cpu)
>>> -{
>>> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>>> -
>>> -    /* HVF currently does not support TCG, and only runs in
>>> -     * unrestricted-guest mode. */
>>> -    assert(hvf_enabled());
>>> -
>>> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>>> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>> -    qemu_cond_init(cpu->halt_cond);
>>> -
>>> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>> -             cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>> -}
>>> -
>>>  void cpus_register_accel(CpusAccel *ca)
>>>  {
>>>      assert(ca != NULL);
>>> @@ -648,8 +587,6 @@ void qemu_init_vcpu(CPUState *cpu)
>>>      if (cpus_accel) {
>>>          /* accelerator already implements the CpusAccel interface */
>>>          cpus_accel->create_vcpu_thread(cpu);
>>> -    } else if (hvf_enabled()) {
>>> -        qemu_hvf_start_vcpu(cpu);
>>>      } else {
>>>          assert(0);
>>>      }
>>> diff --git a/target/i386/hvf/Makefile.objs b/target/i386/hvf/Makefile.objs
>>> index 927b86bc67..af9f7dcfc1 100644
>>> --- a/target/i386/hvf/Makefile.objs
>>> +++ b/target/i386/hvf/Makefile.objs
>>> @@ -1,2 +1,2 @@
>>> -obj-y += hvf.o
>>> +obj-y += hvf.o hvf-cpus.o
>>>  obj-y += x86.o x86_cpuid.o x86_decode.o x86_descr.o x86_emu.o x86_flags.o x86_mmu.o x86hvf.o x86_task.o
>>> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>>> new file mode 100644
>>> index 0000000000..9540157f1e
>>> --- /dev/null
>>> +++ b/target/i386/hvf/hvf-cpus.c
>>
>> I'd prefer singular form in variables and file names. More on that in
>> the comment to patch 2.
>>
>> Besides that it works fine,
>>
>> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
>> Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
>>
>> Regards,
>> Roman
>>
> 
> Hi Roman,
> 
> thanks, sure lets discuss more the naming stuff on patch 2.
> 
> I noticed a missing chunk in this patch, ie, it leaves a lingering
> 
> } else if (hvf_enabled()) {
> 
> in cpu_synchronize_pre_loadvm().
> 
> that needs to be elided, should not change the behavior, but who knows. I will respin this one in the next version.
> 
> Thank you!
> 
> Claudio
> 
> 
> 




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
  2020-08-05  8:40   ` Claudio Fontana
  2020-08-11  8:59   ` Roman Bolshakov
@ 2020-08-20  8:17   ` Claudio Fontana
  2020-08-30 13:34     ` Claudio Fontana
  2 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-20  8:17 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, Peter Maydell,
	Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Thomas Huth, Eduardo Habkost, Marcelo Tosatti,
	qemu-devel, Markus Armbruster, Roman Bolshakov, Pavel Dovgalyuk,
	Wenchao Wang, haxm-team, Sunil Muthuswamy, Richard Henderson,
	Colin Xu

Hi Paolo and all,

Back in RFC v3 I introduced cpus_get_virtual_clock in this patch.

I observed an issue when adding get_virtual_clock to the CpusAccel interface: it seems that
qemu_clock_get_ns() is called in some io-tests before the accelerator is initialized,
which collides with the idea of making it part of the CpusAccel interface:

(gdb) bt
#0  0x00005555558e6af0 in cpus_get_virtual_clock () at /home/claudio/git/qemu-pristine/qemu/softmmu/cpus.c:219
#1  0x0000555555c5099c in qemu_clock_get_ns (type=type@entry=QEMU_CLOCK_VIRTUAL)
    at /home/claudio/git/qemu-pristine/qemu/util/qemu-timer.c:638
#2  0x0000555555b6077a in qemu_clock_get_ms (type=QEMU_CLOCK_VIRTUAL) at /home/claudio/git/qemu-pristine/qemu/include/qemu/timer.h:118
#3  0x0000555555b6077a in cache_clean_timer_init (bs=bs@entry=0x5555568381a0, context=0x555556821930)
    at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:846
#4  0x0000555555b63012 in qcow2_update_options_commit (bs=bs@entry=0x5555568381a0, r=r@entry=0x7fffd6a45e10)
    at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1221
#5  0x0000555555b657ea in qcow2_update_options
    (bs=bs@entry=0x5555568381a0, options=options@entry=0x55555683d600, flags=flags@entry=139266, errp=errp@entry=0x7fffffffd580)
    at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1248
#6  0x0000555555b671a2 in qcow2_do_open (bs=0x5555568381a0, options=0x55555683d600, flags=139266, errp=0x7fffffffd580)
    at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1579
#7  0x0000555555b67e62 in qcow2_open_entry (opaque=0x7fffffffd520) at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1867
#8  0x0000555555c4854c in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at /home/claudio/git/qemu-pristine/qemu/util/coroutine-ucontext.c:173
#9  0x00007fffed3779c0 in __start_context () at /lib64/libc.so.6
#10 0x00007fffffffcd90 in  ()
#11 0x0000000000000000 in  ()

(gdb) p *current_machine
$3 = {parent_obj = {class = 0x5555567a2090, free = 0x7ffff72d9840 <g_free>, Python Exception <class 'gdb.error'> There is no member named keys.: 
properties = 0x55555681c580, ref = 2, 
    parent = 0x55555682aa90}, sysbus_notifier = {notify = 0x555555990130 <machine_init_notify>, node = {le_next = 
    0x5555564e1130 <chardev_machine_done_notify>, le_prev = 0x5555565079f0 <machine_init_done_notifiers>}}, dtb = 0x0, dumpdtb = 0x0, 
  phandle_start = 0, dt_compatible = 0x0, dump_guest_core = true, mem_merge = true, usb = false, usb_disabled = false, firmware = 0x0, 
  iommu = false, suppress_vmdesc = false, enforce_config_section = false, enable_graphics = true, memory_encryption = 0x0, 
  ram_memdev_id = 0x0, ram = 0x0, device_memory = 0x0, ram_size = 0, maxram_size = 0, ram_slots = 0, boot_order = 0x0, 
  kernel_filename = 0x0, kernel_cmdline = 0x0, initrd_filename = 0x0, cpu_type = 0x0, accelerator = 0x0, possible_cpus = 0x0, smp = {
    cpus = 1, cores = 1, threads = 1, sockets = 1, max_cpus = 1}, nvdimms_state = 0x555556822850, numa_state = 0x555556822be0}


The affected tests are:

Failures: 030 040 041 060 099 120 127 140 156 161 172 181 191 192 195 203 229 249 256 267

Are the tests wrong here, to trigger this call stack before the accel is set,
or should the get virtual clock functionality be taken out of the interface, or ...?
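
One possible option, sketched here purely as an assumption and not as
something decided in this thread, would be to keep get_virtual_clock in the
interface but fall back to a generic clock source while no accelerator is
registered. In this sketch default_virtual_clock() and
demo_accel_virtual_clock() are hypothetical stand-ins (the former for
something like cpu_get_clock()), and CpusAccel is reduced to the one field
needed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for the CpusAccel interface of this series. */
typedef struct CpusAccel {
    int64_t (*get_virtual_clock)(void); /* optional */
} CpusAccel;

/* NULL until the chosen accelerator registers itself. */
static CpusAccel *cpus_accel;

/* Hypothetical generic clock source, used before any accel exists. */
static int64_t default_virtual_clock(void)
{
    return 1000;
}

static void cpus_register_accel(CpusAccel *ca)
{
    assert(ca != NULL);
    cpus_accel = ca;
}

/*
 * Safe to call at any time: early callers (such as block layer timers
 * set up before accelerator init) get the generic clock.
 */
static int64_t cpus_get_virtual_clock(void)
{
    if (cpus_accel && cpus_accel->get_virtual_clock) {
        return cpus_accel->get_virtual_clock();
    }
    return default_virtual_clock();
}

/* Hypothetical accelerator-specific clock. */
static int64_t demo_accel_virtual_clock(void)
{
    return 42;
}
```

With this shape a call arriving before cpus_register_accel() takes the
fallback path instead of dereferencing a NULL cpus_accel. Whether an early
caller seeing a different clock source than a later one is acceptable is
exactly the open question.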

Thanks for any advice,

Ciao,

Claudio  


On 8/3/20 11:05 AM, Claudio Fontana wrote:
> The new interface starts unused, will start being used by the
> next patches.
> 
> It provides methods for each accelerator to start a vcpu, kick a vcpu,
> synchronize state, get cpu virtual clock and elapsed ticks.
> 
> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  hw/core/cpu.c                  |   1 +
>  hw/i386/x86.c                  |   2 +-
>  include/sysemu/cpu-timers.h    |   9 +-
>  include/sysemu/cpus.h          |  36 ++++++++
>  include/sysemu/hw_accel.h      |  69 ++-------------
>  softmmu/cpu-timers.c           |   9 +-
>  softmmu/cpus.c                 | 194 ++++++++++++++++++++++++++++++++---------
>  stubs/Makefile.objs            |   2 +
>  stubs/cpu-synchronize-state.c  |  15 ++++
>  stubs/cpus-get-virtual-clock.c |   8 ++
>  util/qemu-timer.c              |   8 +-
>  11 files changed, 231 insertions(+), 122 deletions(-)
>  create mode 100644 stubs/cpu-synchronize-state.c
>  create mode 100644 stubs/cpus-get-virtual-clock.c
> 
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 594441a150..b389a312df 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -33,6 +33,7 @@
>  #include "hw/qdev-properties.h"
>  #include "trace-root.h"
>  #include "qemu/plugin.h"
> +#include "sysemu/hw_accel.h"
>  
>  CPUInterruptHandler cpu_interrupt_handler;
>  
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 58cf2229d5..00c35bad7e 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -264,7 +264,7 @@ static long get_file_size(FILE *f)
>  /* TSC handling */
>  uint64_t cpu_get_tsc(CPUX86State *env)
>  {
> -    return cpu_get_ticks();
> +    return cpus_get_elapsed_ticks();
>  }
>  
>  /* IRQ handling */
> diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
> index 07d724672f..cb83cc5584 100644
> --- a/include/sysemu/cpu-timers.h
> +++ b/include/sysemu/cpu-timers.h
> @@ -64,9 +64,8 @@ void cpu_enable_ticks(void);
>  void cpu_disable_ticks(void);
>  
>  /*
> - * return the time elapsed in VM between vm_start and vm_stop.  Unless
> - * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> - * counter.
> + * return the time elapsed in VM between vm_start and vm_stop.
> + * cpu_get_ticks() uses units of the host CPU cycle counter.
>   */
>  int64_t cpu_get_ticks(void);
>  
> @@ -78,4 +77,8 @@ int64_t cpu_get_clock(void);
>  
>  void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
>  
> +/* get the VIRTUAL clock and VM elapsed ticks via the cpus accel interface */
> +int64_t cpus_get_virtual_clock(void);
> +int64_t cpus_get_elapsed_ticks(void);
> +
>  #endif /* SYSEMU_CPU_TIMERS_H */
> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
> index 149de000a0..db196dd96f 100644
> --- a/include/sysemu/cpus.h
> +++ b/include/sysemu/cpus.h
> @@ -4,7 +4,43 @@
>  #include "qemu/timer.h"
>  
>  /* cpus.c */
> +
> +/* CPU execution threads */
> +
> +typedef struct CpusAccel {
> +    void (*create_vcpu_thread)(CPUState *cpu); /* MANDATORY */
> +    void (*kick_vcpu_thread)(CPUState *cpu);
> +
> +    void (*synchronize_post_reset)(CPUState *cpu);
> +    void (*synchronize_post_init)(CPUState *cpu);
> +    void (*synchronize_state)(CPUState *cpu);
> +    void (*synchronize_pre_loadvm)(CPUState *cpu);
> +
> +    int64_t (*get_virtual_clock)(void);
> +    int64_t (*get_elapsed_ticks)(void);
> +} CpusAccel;
> +
> +/* register accel-specific cpus interface implementation */
> +void cpus_register_accel(CpusAccel *i);
> +
> +/* interface available for cpus accelerator threads */
> +
> +/* For temporary buffers for forming a name */
> +#define VCPU_THREAD_NAME_SIZE 16
> +
> +void cpus_kick_thread(CPUState *cpu);
> +bool cpu_work_list_empty(CPUState *cpu);
> +bool cpu_thread_is_idle(CPUState *cpu);
>  bool all_cpu_threads_idle(void);
> +bool cpu_can_run(CPUState *cpu);
> +void qemu_wait_io_event_common(CPUState *cpu);
> +void qemu_wait_io_event(CPUState *cpu);
> +void cpu_thread_signal_created(CPUState *cpu);
> +void cpu_thread_signal_destroyed(CPUState *cpu);
> +void cpu_handle_guest_debug(CPUState *cpu);
> +
> +/* end interface for cpus accelerator threads */
> +
>  bool qemu_in_vcpu_thread(void);
>  void qemu_init_cpu_loop(void);
>  void resume_all_vcpus(void);
> diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
> index e128f8b06b..ffed6192a3 100644
> --- a/include/sysemu/hw_accel.h
> +++ b/include/sysemu/hw_accel.h
> @@ -1,5 +1,5 @@
>  /*
> - * QEMU Hardware accelertors support
> + * QEMU Hardware accelerators support
>   *
>   * Copyright 2016 Google, Inc.
>   *
> @@ -17,68 +17,9 @@
>  #include "sysemu/hvf.h"
>  #include "sysemu/whpx.h"
>  
> -static inline void cpu_synchronize_state(CPUState *cpu)
> -{
> -    if (kvm_enabled()) {
> -        kvm_cpu_synchronize_state(cpu);
> -    }
> -    if (hax_enabled()) {
> -        hax_cpu_synchronize_state(cpu);
> -    }
> -    if (hvf_enabled()) {
> -        hvf_cpu_synchronize_state(cpu);
> -    }
> -    if (whpx_enabled()) {
> -        whpx_cpu_synchronize_state(cpu);
> -    }
> -}
> -
> -static inline void cpu_synchronize_post_reset(CPUState *cpu)
> -{
> -    if (kvm_enabled()) {
> -        kvm_cpu_synchronize_post_reset(cpu);
> -    }
> -    if (hax_enabled()) {
> -        hax_cpu_synchronize_post_reset(cpu);
> -    }
> -    if (hvf_enabled()) {
> -        hvf_cpu_synchronize_post_reset(cpu);
> -    }
> -    if (whpx_enabled()) {
> -        whpx_cpu_synchronize_post_reset(cpu);
> -    }
> -}
> -
> -static inline void cpu_synchronize_post_init(CPUState *cpu)
> -{
> -    if (kvm_enabled()) {
> -        kvm_cpu_synchronize_post_init(cpu);
> -    }
> -    if (hax_enabled()) {
> -        hax_cpu_synchronize_post_init(cpu);
> -    }
> -    if (hvf_enabled()) {
> -        hvf_cpu_synchronize_post_init(cpu);
> -    }
> -    if (whpx_enabled()) {
> -        whpx_cpu_synchronize_post_init(cpu);
> -    }
> -}
> -
> -static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
> -{
> -    if (kvm_enabled()) {
> -        kvm_cpu_synchronize_pre_loadvm(cpu);
> -    }
> -    if (hax_enabled()) {
> -        hax_cpu_synchronize_pre_loadvm(cpu);
> -    }
> -    if (hvf_enabled()) {
> -        hvf_cpu_synchronize_pre_loadvm(cpu);
> -    }
> -    if (whpx_enabled()) {
> -        whpx_cpu_synchronize_pre_loadvm(cpu);
> -    }
> -}
> +void cpu_synchronize_state(CPUState *cpu);
> +void cpu_synchronize_post_reset(CPUState *cpu);
> +void cpu_synchronize_post_init(CPUState *cpu);
> +void cpu_synchronize_pre_loadvm(CPUState *cpu);
>  
>  #endif /* QEMU_HW_ACCEL_H */
> diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
> index 64addb315d..3e1da79735 100644
> --- a/softmmu/cpu-timers.c
> +++ b/softmmu/cpu-timers.c
> @@ -61,18 +61,13 @@ static int64_t cpu_get_ticks_locked(void)
>  }
>  
>  /*
> - * return the time elapsed in VM between vm_start and vm_stop.  Unless
> - * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> - * counter.
> + * return the time elapsed in VM between vm_start and vm_stop.
> + * cpu_get_ticks() uses units of the host CPU cycle counter.
>   */
>  int64_t cpu_get_ticks(void)
>  {
>      int64_t ticks;
>  
> -    if (icount_enabled()) {
> -        return icount_get();
> -    }
> -
>      qemu_spin_lock(&timers_state.vm_clock_lock);
>      ticks = cpu_get_ticks_locked();
>      qemu_spin_unlock(&timers_state.vm_clock_lock);
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index 54fdb2761c..bad6302ca3 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -87,7 +87,7 @@ bool cpu_is_stopped(CPUState *cpu)
>      return cpu->stopped || !runstate_is_running();
>  }
>  
> -static inline bool cpu_work_list_empty(CPUState *cpu)
> +bool cpu_work_list_empty(CPUState *cpu)
>  {
>      bool ret;
>  
> @@ -97,7 +97,7 @@ static inline bool cpu_work_list_empty(CPUState *cpu)
>      return ret;
>  }
>  
> -static bool cpu_thread_is_idle(CPUState *cpu)
> +bool cpu_thread_is_idle(CPUState *cpu)
>  {
>      if (cpu->stop || !cpu_work_list_empty(cpu)) {
>          return false;
> @@ -215,6 +215,11 @@ void hw_error(const char *fmt, ...)
>      abort();
>  }
>  
> +/*
> + * The chosen accelerator is supposed to register this.
> + */
> +static CpusAccel *cpus_accel;
> +
>  void cpu_synchronize_all_states(void)
>  {
>      CPUState *cpu;
> @@ -251,6 +256,102 @@ void cpu_synchronize_all_pre_loadvm(void)
>      }
>  }
>  
> +void cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (cpus_accel && cpus_accel->synchronize_state) {
> +        cpus_accel->synchronize_state(cpu);
> +    }
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_state(cpu);
> +    }
> +    if (hax_enabled()) {
> +        hax_cpu_synchronize_state(cpu);
> +    }
> +    if (whpx_enabled()) {
> +        whpx_cpu_synchronize_state(cpu);
> +    }
> +}
> +
> +void cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +    if (cpus_accel && cpus_accel->synchronize_post_reset) {
> +        cpus_accel->synchronize_post_reset(cpu);
> +    }
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_post_reset(cpu);
> +    }
> +    if (hax_enabled()) {
> +        hax_cpu_synchronize_post_reset(cpu);
> +    }
> +    if (whpx_enabled()) {
> +        whpx_cpu_synchronize_post_reset(cpu);
> +    }
> +}
> +
> +void cpu_synchronize_post_init(CPUState *cpu)
> +{
> +    if (cpus_accel && cpus_accel->synchronize_post_init) {
> +        cpus_accel->synchronize_post_init(cpu);
> +    }
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_post_init(cpu);
> +    }
> +    if (hax_enabled()) {
> +        hax_cpu_synchronize_post_init(cpu);
> +    }
> +    if (whpx_enabled()) {
> +        whpx_cpu_synchronize_post_init(cpu);
> +    }
> +}
> +
> +void cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +    if (cpus_accel && cpus_accel->synchronize_pre_loadvm) {
> +        cpus_accel->synchronize_pre_loadvm(cpu);
> +    }
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_pre_loadvm(cpu);
> +    }
> +    if (hax_enabled()) {
> +        hax_cpu_synchronize_pre_loadvm(cpu);
> +    }
> +    if (hvf_enabled()) {
> +        hvf_cpu_synchronize_pre_loadvm(cpu);
> +    }
> +    if (whpx_enabled()) {
> +        whpx_cpu_synchronize_pre_loadvm(cpu);
> +    }
> +}
> +
> +int64_t cpus_get_virtual_clock(void)
> +{
> +    if (cpus_accel && cpus_accel->get_virtual_clock) {
> +        return cpus_accel->get_virtual_clock();
> +    }
> +    if (icount_enabled()) {
> +        return icount_get();
> +    } else if (qtest_enabled()) { /* for qtest_clock_warp */
> +        return qtest_get_virtual_clock();
> +    }
> +    return cpu_get_clock();
> +}
> +
> +/*
> + * return the time elapsed in VM between vm_start and vm_stop.  Unless
> + * icount is active, cpu_get_ticks() uses units of the host CPU cycle
> + * counter.
> + */
> +int64_t cpus_get_elapsed_ticks(void)
> +{
> +    if (cpus_accel && cpus_accel->get_elapsed_ticks) {
> +        return cpus_accel->get_elapsed_ticks();
> +    }
> +    if (icount_enabled()) {
> +        return icount_get();
> +    }
> +    return cpu_get_ticks();
> +}
> +
>  static int do_vm_stop(RunState state, bool send_stop)
>  {
>      int ret = 0;
> @@ -279,7 +380,7 @@ int vm_shutdown(void)
>      return do_vm_stop(RUN_STATE_SHUTDOWN, false);
>  }
>  
> -static bool cpu_can_run(CPUState *cpu)
> +bool cpu_can_run(CPUState *cpu)
>  {
>      if (cpu->stop) {
>          return false;
> @@ -290,7 +391,7 @@ static bool cpu_can_run(CPUState *cpu)
>      return true;
>  }
>  
> -static void cpu_handle_guest_debug(CPUState *cpu)
> +void cpu_handle_guest_debug(CPUState *cpu)
>  {
>      gdb_set_stop_cpu(cpu);
>      qemu_system_debug_request();
> @@ -396,7 +497,7 @@ static void qemu_cpu_stop(CPUState *cpu, bool exit)
>      qemu_cond_broadcast(&qemu_pause_cond);
>  }
>  
> -static void qemu_wait_io_event_common(CPUState *cpu)
> +void qemu_wait_io_event_common(CPUState *cpu)
>  {
>      atomic_mb_set(&cpu->thread_kicked, false);
>      if (cpu->stop) {
> @@ -421,7 +522,7 @@ static void qemu_tcg_rr_wait_io_event(void)
>      }
>  }
>  
> -static void qemu_wait_io_event(CPUState *cpu)
> +void qemu_wait_io_event(CPUState *cpu)
>  {
>      bool slept = false;
>  
> @@ -437,7 +538,8 @@ static void qemu_wait_io_event(CPUState *cpu)
>      }
>  
>  #ifdef _WIN32
> -    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
> +    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
> +    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
>      if (!tcg_enabled()) {
>          SleepEx(0, TRUE);
>      }
> @@ -467,8 +569,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      kvm_init_cpu_signals(cpu);
>  
>      /* signal CPU creation */
> -    cpu->created = true;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      do {
> @@ -482,8 +583,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      } while (!cpu->unplug || cpu_can_run(cpu));
>  
>      qemu_kvm_destroy_vcpu(cpu);
> -    cpu->created = false;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_destroyed(cpu);
>      qemu_mutex_unlock_iothread();
>      rcu_unregister_thread();
>      return NULL;
> @@ -511,8 +611,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>      sigaddset(&waitset, SIG_IPI);
>  
>      /* signal CPU creation */
> -    cpu->created = true;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      do {
> @@ -660,8 +759,7 @@ static void deal_with_unplugged_cpus(void)
>      CPU_FOREACH(cpu) {
>          if (cpu->unplug && !cpu_can_run(cpu)) {
>              qemu_tcg_destroy_vcpu(cpu);
> -            cpu->created = false;
> -            qemu_cond_signal(&qemu_cpu_cond);
> +            cpu_thread_signal_destroyed(cpu);
>              break;
>          }
>      }
> @@ -688,9 +786,8 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>      qemu_thread_get_self(cpu->thread);
>  
>      cpu->thread_id = qemu_get_thread_id();
> -    cpu->created = true;
>      cpu->can_do_io = 1;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      /* wait for initial kick-off after machine start */
> @@ -800,11 +897,9 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>      qemu_thread_get_self(cpu->thread);
>  
>      cpu->thread_id = qemu_get_thread_id();
> -    cpu->created = true;
>      current_cpu = cpu;
> -
>      hax_init_vcpu(cpu);
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      do {
> @@ -843,8 +938,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
>      hvf_init_vcpu(cpu);
>  
>      /* signal CPU creation */
> -    cpu->created = true;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      do {
> @@ -858,8 +952,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
>      } while (!cpu->unplug || cpu_can_run(cpu));
>  
>      hvf_vcpu_destroy(cpu);
> -    cpu->created = false;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_destroyed(cpu);
>      qemu_mutex_unlock_iothread();
>      rcu_unregister_thread();
>      return NULL;
> @@ -884,8 +977,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>      }
>  
>      /* signal CPU creation */
> -    cpu->created = true;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      do {
> @@ -902,8 +994,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>      } while (!cpu->unplug || cpu_can_run(cpu));
>  
>      whpx_destroy_vcpu(cpu);
> -    cpu->created = false;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_destroyed(cpu);
>      qemu_mutex_unlock_iothread();
>      rcu_unregister_thread();
>      return NULL;
> @@ -936,10 +1027,9 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      qemu_thread_get_self(cpu->thread);
>  
>      cpu->thread_id = qemu_get_thread_id();
> -    cpu->created = true;
>      cpu->can_do_io = 1;
>      current_cpu = cpu;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_created(cpu);
>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>  
>      /* process any pending work */
> @@ -980,14 +1070,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      } while (!cpu->unplug || cpu_can_run(cpu));
>  
>      qemu_tcg_destroy_vcpu(cpu);
> -    cpu->created = false;
> -    qemu_cond_signal(&qemu_cpu_cond);
> +    cpu_thread_signal_destroyed(cpu);
>      qemu_mutex_unlock_iothread();
>      rcu_unregister_thread();
>      return NULL;
>  }
>  
> -static void qemu_cpu_kick_thread(CPUState *cpu)
> +void cpus_kick_thread(CPUState *cpu)
>  {
>  #ifndef _WIN32
>      int err;
> @@ -1017,7 +1106,10 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>  void qemu_cpu_kick(CPUState *cpu)
>  {
>      qemu_cond_broadcast(cpu->halt_cond);
> -    if (tcg_enabled()) {
> +
> +    if (cpus_accel && cpus_accel->kick_vcpu_thread) {
> +        cpus_accel->kick_vcpu_thread(cpu);
> +    } else if (tcg_enabled()) {
>          if (qemu_tcg_mttcg_enabled()) {
>              cpu_exit(cpu);
>          } else {
> @@ -1031,14 +1123,14 @@ void qemu_cpu_kick(CPUState *cpu)
>               */
>              cpu->exit_request = 1;
>          }
> -        qemu_cpu_kick_thread(cpu);
> +        cpus_kick_thread(cpu);
>      }
>  }
>  
>  void qemu_cpu_kick_self(void)
>  {
>      assert(current_cpu);
> -    qemu_cpu_kick_thread(current_cpu);
> +    cpus_kick_thread(current_cpu);
>  }
>  
>  bool qemu_cpu_is_self(CPUState *cpu)
> @@ -1088,6 +1180,21 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
>      qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
>  }
>  
> +/* signal CPU creation */
> +void cpu_thread_signal_created(CPUState *cpu)
> +{
> +    cpu->created = true;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +}
> +
> +/* signal CPU destruction */
> +void cpu_thread_signal_destroyed(CPUState *cpu)
> +{
> +    cpu->created = false;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +}
> +
> +
>  static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
> @@ -1163,9 +1270,6 @@ void cpu_remove_sync(CPUState *cpu)
>      qemu_mutex_lock_iothread();
>  }
>  
> -/* For temporary buffers for forming a name */
> -#define VCPU_THREAD_NAME_SIZE 16
> -
>  static void qemu_tcg_init_vcpu(CPUState *cpu)
>  {
>      char thread_name[VCPU_THREAD_NAME_SIZE];
> @@ -1286,6 +1390,13 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>  #endif
>  }
>  
> +void cpus_register_accel(CpusAccel *ca)
> +{
> +    assert(ca != NULL);
> +    assert(ca->create_vcpu_thread != NULL); /* mandatory */
> +    cpus_accel = ca;
> +}
> +
>  static void qemu_dummy_start_vcpu(CPUState *cpu)
>  {
>      char thread_name[VCPU_THREAD_NAME_SIZE];
> @@ -1316,7 +1427,10 @@ void qemu_init_vcpu(CPUState *cpu)
>          cpu_address_space_init(cpu, 0, "cpu-memory", cpu->memory);
>      }
>  
> -    if (kvm_enabled()) {
> +    if (cpus_accel) {
> +        /* accelerator already implements the CpusAccel interface */
> +        cpus_accel->create_vcpu_thread(cpu);
> +    } else if (kvm_enabled()) {
>          qemu_kvm_start_vcpu(cpu);
>      } else if (hax_enabled()) {
>          qemu_hax_start_vcpu(cpu);
> diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
> index e97ad407fa..16345eec43 100644
> --- a/stubs/Makefile.objs
> +++ b/stubs/Makefile.objs
> @@ -1,6 +1,7 @@
>  stub-obj-y += blk-commit-all.o
>  stub-obj-y += cmos.o
>  stub-obj-y += cpu-get-clock.o
> +stub-obj-y += cpus-get-virtual-clock.o
>  stub-obj-y += qemu-timer-notify-cb.o
>  stub-obj-y += icount.o
>  stub-obj-y += dump.o
> @@ -28,6 +29,7 @@ stub-obj-y += trace-control.o
>  stub-obj-y += vmgenid.o
>  stub-obj-y += vmstate.o
>  stub-obj-$(CONFIG_SOFTMMU) += win32-kbd-hook.o
> +stub-obj-y += cpu-synchronize-state.o
>  
>  #######################################################################
>  # code used by both qemu system emulation and qemu-img
> diff --git a/stubs/cpu-synchronize-state.c b/stubs/cpu-synchronize-state.c
> new file mode 100644
> index 0000000000..3112fe439d
> --- /dev/null
> +++ b/stubs/cpu-synchronize-state.c
> @@ -0,0 +1,15 @@
> +#include "qemu/osdep.h"
> +#include "sysemu/hw_accel.h"
> +
> +void cpu_synchronize_state(CPUState *cpu)
> +{
> +}
> +void cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +}
> +void cpu_synchronize_post_init(CPUState *cpu)
> +{
> +}
> +void cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +}
> diff --git a/stubs/cpus-get-virtual-clock.c b/stubs/cpus-get-virtual-clock.c
> new file mode 100644
> index 0000000000..fd447d53f3
> --- /dev/null
> +++ b/stubs/cpus-get-virtual-clock.c
> @@ -0,0 +1,8 @@
> +#include "qemu/osdep.h"
> +#include "sysemu/cpu-timers.h"
> +#include "qemu/main-loop.h"
> +
> +int64_t cpus_get_virtual_clock(void)
> +{
> +    return cpu_get_clock();
> +}
> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
> index db51e68f25..50b325c65b 100644
> --- a/util/qemu-timer.c
> +++ b/util/qemu-timer.c
> @@ -635,13 +635,7 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
>          return get_clock();
>      default:
>      case QEMU_CLOCK_VIRTUAL:
> -        if (icount_enabled()) {
> -            return icount_get();
> -        } else if (qtest_enabled()) { /* for qtest_clock_warp */
> -            return qtest_get_virtual_clock();
> -        } else {
> -            return cpu_get_clock();
> -        }
> +        return cpus_get_virtual_clock();
>      case QEMU_CLOCK_HOST:
>          return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime());
>      case QEMU_CLOCK_VIRTUAL_RT:
> 




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-20  8:17   ` Claudio Fontana
@ 2020-08-30 13:34     ` Claudio Fontana
  2020-08-30 16:41       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Claudio Fontana @ 2020-08-30 13:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Eduardo Habkost,
	Colin Xu, Philippe Mathieu-Daudé,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Pavel Dovgalyuk, Wenchao Wang, haxm-team, Sunil Muthuswamy,
	Alex Bennée, Richard Henderson

Ciao Paolo,

just a ping on this one. It seems that qemu_clock_get_ns() needs to be callable
before any accelerator is initialized and before ticks are enabled, because it is
called as part of qcow2 initialization.

I could add a check specifically for this, plus a comment, in
cpus_get_virtual_clock(), but do you have any thoughts?
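
To make the idea concrete, here is a minimal stand-alone sketch of what such a check could look like. It is a reduced model, not the actual QEMU code: the names CpusAccelSketch, cpus_accel_sketch, sketch_register_accel(), sketch_get_virtual_clock(), fallback_clock_ns() and the demo accelerator are all hypothetical, and the fixed return values are only there to keep the example deterministic. fallback_clock_ns() stands in for cpu_get_clock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of the CpusAccel interface from the patch. */
typedef struct CpusAccelSketch {
    void (*create_vcpu_thread)(void *cpu);  /* mandatory, as in the patch */
    int64_t (*get_virtual_clock)(void);     /* optional */
} CpusAccelSketch;

/* Stays NULL until an accelerator registers itself. */
static CpusAccelSketch *cpus_accel_sketch;

/* Stand-in for cpu_get_clock(); a fixed value keeps the sketch deterministic. */
static int64_t fallback_clock_ns(void)
{
    return 1000;
}

static void sketch_register_accel(CpusAccelSketch *ca)
{
    assert(ca != NULL);
    assert(ca->create_vcpu_thread != NULL); /* mandatory */
    cpus_accel_sketch = ca;
}

static int64_t sketch_get_virtual_clock(void)
{
    /*
     * The virtual clock can be queried before any accelerator has
     * registered (for example while a block driver sets up a timer),
     * so neither the accelerator pointer nor the optional hook may be
     * dereferenced unconditionally.
     */
    if (cpus_accel_sketch && cpus_accel_sketch->get_virtual_clock) {
        return cpus_accel_sketch->get_virtual_clock();
    }
    return fallback_clock_ns();
}

/* Hypothetical accelerator that provides its own virtual clock. */
static void demo_create_vcpu_thread(void *cpu)
{
    (void)cpu;
}

static int64_t demo_virtual_clock(void)
{
    return 42;
}

static CpusAccelSketch demo_accel = {
    .create_vcpu_thread = demo_create_vcpu_thread,
    .get_virtual_clock = demo_virtual_clock,
};
```

In this model, calling sketch_get_virtual_clock() before sketch_register_accel(&demo_accel) returns the fallback value, and calling it afterwards returns the accelerator's value. Whether an early caller should silently get the fallback clock, or whether that ordering should be treated as a bug in the caller, is exactly the open question.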

Thanks,

Claudio


On 8/20/20 10:17 AM, Claudio Fontana wrote:
> Hi Paolo and all,
> 
> back in RFC v3 I introduced cpus_get_virtual_clock in this patch.
> 
> I observed an issue when adding the get_virtual_clock to the CpusAccel interface, ie
> it seems that qemu_clock_get_ns() is called in some io-tests before the accelerator is initialized,
> which seems to collide with the idea to make it part of the CpusAccel interface:
> 
> (gdb) bt
> #0  0x00005555558e6af0 in cpus_get_virtual_clock () at /home/claudio/git/qemu-pristine/qemu/softmmu/cpus.c:219
> #1  0x0000555555c5099c in qemu_clock_get_ns (type=type@entry=QEMU_CLOCK_VIRTUAL)
>     at /home/claudio/git/qemu-pristine/qemu/util/qemu-timer.c:638
> #2  0x0000555555b6077a in qemu_clock_get_ms (type=QEMU_CLOCK_VIRTUAL) at /home/claudio/git/qemu-pristine/qemu/include/qemu/timer.h:118
> #3  0x0000555555b6077a in cache_clean_timer_init (bs=bs@entry=0x5555568381a0, context=0x555556821930)
>     at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:846
> #4  0x0000555555b63012 in qcow2_update_options_commit (bs=bs@entry=0x5555568381a0, r=r@entry=0x7fffd6a45e10)
>     at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1221
> #5  0x0000555555b657ea in qcow2_update_options
>     (bs=bs@entry=0x5555568381a0, options=options@entry=0x55555683d600, flags=flags@entry=139266, errp=errp@entry=0x7fffffffd580)
>     at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1248
> #6  0x0000555555b671a2 in qcow2_do_open (bs=0x5555568381a0, options=0x55555683d600, flags=139266, errp=0x7fffffffd580)
>     at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1579
> #7  0x0000555555b67e62 in qcow2_open_entry (opaque=0x7fffffffd520) at /home/claudio/git/qemu-pristine/qemu/block/qcow2.c:1867
> #8  0x0000555555c4854c in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
>     at /home/claudio/git/qemu-pristine/qemu/util/coroutine-ucontext.c:173
> #9  0x00007fffed3779c0 in __start_context () at /lib64/libc.so.6
> #10 0x00007fffffffcd90 in  ()
> #11 0x0000000000000000 in  ()
> 
> (gdb) p *current_machine
> $3 = {parent_obj = {class = 0x5555567a2090, free = 0x7ffff72d9840 <g_free>, Python Exception <class 'gdb.error'> There is no member named keys.: 
> properties = 0x55555681c580, ref = 2, 
>     parent = 0x55555682aa90}, sysbus_notifier = {notify = 0x555555990130 <machine_init_notify>, node = {le_next = 
>     0x5555564e1130 <chardev_machine_done_notify>, le_prev = 0x5555565079f0 <machine_init_done_notifiers>}}, dtb = 0x0, dumpdtb = 0x0, 
>   phandle_start = 0, dt_compatible = 0x0, dump_guest_core = true, mem_merge = true, usb = false, usb_disabled = false, firmware = 0x0, 
>   iommu = false, suppress_vmdesc = false, enforce_config_section = false, enable_graphics = true, memory_encryption = 0x0, 
>   ram_memdev_id = 0x0, ram = 0x0, device_memory = 0x0, ram_size = 0, maxram_size = 0, ram_slots = 0, boot_order = 0x0, 
>   kernel_filename = 0x0, kernel_cmdline = 0x0, initrd_filename = 0x0, cpu_type = 0x0, accelerator = 0x0, possible_cpus = 0x0, smp = {
>     cpus = 1, cores = 1, threads = 1, sockets = 1, max_cpus = 1}, nvdimms_state = 0x555556822850, numa_state = 0x555556822be0}
> 
> 
> The affected tests are:
> 
> Failures: 030 040 041 060 099 120 127 140 156 161 172 181 191 192 195 203 229 249 256 267
> 
> Are the tests wrong here, to trigger this call stack before the accel is set,
> or should the get virtual clock functionality be taken out of the interface, or ...?
> 
> Thanks for any advice,
> 
> Ciao,
> 
> Claudio  
> 
> 
> On 8/3/20 11:05 AM, Claudio Fontana wrote:
>> The new interface starts unused, will start being used by the
>> next patches.
>>
>> It provides methods for each accelerator to start a vcpu, kick a vcpu,
>> synchronize state, get cpu virtual clock and elapsed ticks.
>>
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  hw/core/cpu.c                  |   1 +
>>  hw/i386/x86.c                  |   2 +-
>>  include/sysemu/cpu-timers.h    |   9 +-
>>  include/sysemu/cpus.h          |  36 ++++++++
>>  include/sysemu/hw_accel.h      |  69 ++-------------
>>  softmmu/cpu-timers.c           |   9 +-
>>  softmmu/cpus.c                 | 194 ++++++++++++++++++++++++++++++++---------
>>  stubs/Makefile.objs            |   2 +
>>  stubs/cpu-synchronize-state.c  |  15 ++++
>>  stubs/cpus-get-virtual-clock.c |   8 ++
>>  util/qemu-timer.c              |   8 +-
>>  11 files changed, 231 insertions(+), 122 deletions(-)
>>  create mode 100644 stubs/cpu-synchronize-state.c
>>  create mode 100644 stubs/cpus-get-virtual-clock.c
>>
>> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
>> index 594441a150..b389a312df 100644
>> --- a/hw/core/cpu.c
>> +++ b/hw/core/cpu.c
>> @@ -33,6 +33,7 @@
>>  #include "hw/qdev-properties.h"
>>  #include "trace-root.h"
>>  #include "qemu/plugin.h"
>> +#include "sysemu/hw_accel.h"
>>  
>>  CPUInterruptHandler cpu_interrupt_handler;
>>  
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index 58cf2229d5..00c35bad7e 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -264,7 +264,7 @@ static long get_file_size(FILE *f)
>>  /* TSC handling */
>>  uint64_t cpu_get_tsc(CPUX86State *env)
>>  {
>> -    return cpu_get_ticks();
>> +    return cpus_get_elapsed_ticks();
>>  }
>>  
>>  /* IRQ handling */
>> diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
>> index 07d724672f..cb83cc5584 100644
>> --- a/include/sysemu/cpu-timers.h
>> +++ b/include/sysemu/cpu-timers.h
>> @@ -64,9 +64,8 @@ void cpu_enable_ticks(void);
>>  void cpu_disable_ticks(void);
>>  
>>  /*
>> - * return the time elapsed in VM between vm_start and vm_stop.  Unless
>> - * icount is active, cpu_get_ticks() uses units of the host CPU cycle
>> - * counter.
>> + * return the time elapsed in VM between vm_start and vm_stop.
>> + * cpu_get_ticks() uses units of the host CPU cycle counter.
>>   */
>>  int64_t cpu_get_ticks(void);
>>  
>> @@ -78,4 +77,8 @@ int64_t cpu_get_clock(void);
>>  
>>  void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
>>  
>> +/* get the VIRTUAL clock and VM elapsed ticks via the cpus accel interface */
>> +int64_t cpus_get_virtual_clock(void);
>> +int64_t cpus_get_elapsed_ticks(void);
>> +
>>  #endif /* SYSEMU_CPU_TIMERS_H */
>> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
>> index 149de000a0..db196dd96f 100644
>> --- a/include/sysemu/cpus.h
>> +++ b/include/sysemu/cpus.h
>> @@ -4,7 +4,43 @@
>>  #include "qemu/timer.h"
>>  
>>  /* cpus.c */
>> +
>> +/* CPU execution threads */
>> +
>> +typedef struct CpusAccel {
>> +    void (*create_vcpu_thread)(CPUState *cpu); /* MANDATORY */
>> +    void (*kick_vcpu_thread)(CPUState *cpu);
>> +
>> +    void (*synchronize_post_reset)(CPUState *cpu);
>> +    void (*synchronize_post_init)(CPUState *cpu);
>> +    void (*synchronize_state)(CPUState *cpu);
>> +    void (*synchronize_pre_loadvm)(CPUState *cpu);
>> +
>> +    int64_t (*get_virtual_clock)(void);
>> +    int64_t (*get_elapsed_ticks)(void);
>> +} CpusAccel;
>> +
>> +/* register accel-specific cpus interface implementation */
>> +void cpus_register_accel(CpusAccel *i);
>> +
>> +/* interface available for cpus accelerator threads */
>> +
>> +/* For temporary buffers for forming a name */
>> +#define VCPU_THREAD_NAME_SIZE 16
>> +
>> +void cpus_kick_thread(CPUState *cpu);
>> +bool cpu_work_list_empty(CPUState *cpu);
>> +bool cpu_thread_is_idle(CPUState *cpu);
>>  bool all_cpu_threads_idle(void);
>> +bool cpu_can_run(CPUState *cpu);
>> +void qemu_wait_io_event_common(CPUState *cpu);
>> +void qemu_wait_io_event(CPUState *cpu);
>> +void cpu_thread_signal_created(CPUState *cpu);
>> +void cpu_thread_signal_destroyed(CPUState *cpu);
>> +void cpu_handle_guest_debug(CPUState *cpu);
>> +
>> +/* end interface for cpus accelerator threads */
>> +
>>  bool qemu_in_vcpu_thread(void);
>>  void qemu_init_cpu_loop(void);
>>  void resume_all_vcpus(void);
>> diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
>> index e128f8b06b..ffed6192a3 100644
>> --- a/include/sysemu/hw_accel.h
>> +++ b/include/sysemu/hw_accel.h
>> @@ -1,5 +1,5 @@
>>  /*
>> - * QEMU Hardware accelertors support
>> + * QEMU Hardware accelerators support
>>   *
>>   * Copyright 2016 Google, Inc.
>>   *
>> @@ -17,68 +17,9 @@
>>  #include "sysemu/hvf.h"
>>  #include "sysemu/whpx.h"
>>  
>> -static inline void cpu_synchronize_state(CPUState *cpu)
>> -{
>> -    if (kvm_enabled()) {
>> -        kvm_cpu_synchronize_state(cpu);
>> -    }
>> -    if (hax_enabled()) {
>> -        hax_cpu_synchronize_state(cpu);
>> -    }
>> -    if (hvf_enabled()) {
>> -        hvf_cpu_synchronize_state(cpu);
>> -    }
>> -    if (whpx_enabled()) {
>> -        whpx_cpu_synchronize_state(cpu);
>> -    }
>> -}
>> -
>> -static inline void cpu_synchronize_post_reset(CPUState *cpu)
>> -{
>> -    if (kvm_enabled()) {
>> -        kvm_cpu_synchronize_post_reset(cpu);
>> -    }
>> -    if (hax_enabled()) {
>> -        hax_cpu_synchronize_post_reset(cpu);
>> -    }
>> -    if (hvf_enabled()) {
>> -        hvf_cpu_synchronize_post_reset(cpu);
>> -    }
>> -    if (whpx_enabled()) {
>> -        whpx_cpu_synchronize_post_reset(cpu);
>> -    }
>> -}
>> -
>> -static inline void cpu_synchronize_post_init(CPUState *cpu)
>> -{
>> -    if (kvm_enabled()) {
>> -        kvm_cpu_synchronize_post_init(cpu);
>> -    }
>> -    if (hax_enabled()) {
>> -        hax_cpu_synchronize_post_init(cpu);
>> -    }
>> -    if (hvf_enabled()) {
>> -        hvf_cpu_synchronize_post_init(cpu);
>> -    }
>> -    if (whpx_enabled()) {
>> -        whpx_cpu_synchronize_post_init(cpu);
>> -    }
>> -}
>> -
>> -static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
>> -{
>> -    if (kvm_enabled()) {
>> -        kvm_cpu_synchronize_pre_loadvm(cpu);
>> -    }
>> -    if (hax_enabled()) {
>> -        hax_cpu_synchronize_pre_loadvm(cpu);
>> -    }
>> -    if (hvf_enabled()) {
>> -        hvf_cpu_synchronize_pre_loadvm(cpu);
>> -    }
>> -    if (whpx_enabled()) {
>> -        whpx_cpu_synchronize_pre_loadvm(cpu);
>> -    }
>> -}
>> +void cpu_synchronize_state(CPUState *cpu);
>> +void cpu_synchronize_post_reset(CPUState *cpu);
>> +void cpu_synchronize_post_init(CPUState *cpu);
>> +void cpu_synchronize_pre_loadvm(CPUState *cpu);
>>  
>>  #endif /* QEMU_HW_ACCEL_H */
>> diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
>> index 64addb315d..3e1da79735 100644
>> --- a/softmmu/cpu-timers.c
>> +++ b/softmmu/cpu-timers.c
>> @@ -61,18 +61,13 @@ static int64_t cpu_get_ticks_locked(void)
>>  }
>>  
>>  /*
>> - * return the time elapsed in VM between vm_start and vm_stop.  Unless
>> - * icount is active, cpu_get_ticks() uses units of the host CPU cycle
>> - * counter.
>> + * return the time elapsed in VM between vm_start and vm_stop.
>> + * cpu_get_ticks() uses units of the host CPU cycle counter.
>>   */
>>  int64_t cpu_get_ticks(void)
>>  {
>>      int64_t ticks;
>>  
>> -    if (icount_enabled()) {
>> -        return icount_get();
>> -    }
>> -
>>      qemu_spin_lock(&timers_state.vm_clock_lock);
>>      ticks = cpu_get_ticks_locked();
>>      qemu_spin_unlock(&timers_state.vm_clock_lock);
>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
>> index 54fdb2761c..bad6302ca3 100644
>> --- a/softmmu/cpus.c
>> +++ b/softmmu/cpus.c
>> @@ -87,7 +87,7 @@ bool cpu_is_stopped(CPUState *cpu)
>>      return cpu->stopped || !runstate_is_running();
>>  }
>>  
>> -static inline bool cpu_work_list_empty(CPUState *cpu)
>> +bool cpu_work_list_empty(CPUState *cpu)
>>  {
>>      bool ret;
>>  
>> @@ -97,7 +97,7 @@ static inline bool cpu_work_list_empty(CPUState *cpu)
>>      return ret;
>>  }
>>  
>> -static bool cpu_thread_is_idle(CPUState *cpu)
>> +bool cpu_thread_is_idle(CPUState *cpu)
>>  {
>>      if (cpu->stop || !cpu_work_list_empty(cpu)) {
>>          return false;
>> @@ -215,6 +215,11 @@ void hw_error(const char *fmt, ...)
>>      abort();
>>  }
>>  
>> +/*
>> + * The chosen accelerator is supposed to register this.
>> + */
>> +static CpusAccel *cpus_accel;
>> +
>>  void cpu_synchronize_all_states(void)
>>  {
>>      CPUState *cpu;
>> @@ -251,6 +256,102 @@ void cpu_synchronize_all_pre_loadvm(void)
>>      }
>>  }
>>  
>> +void cpu_synchronize_state(CPUState *cpu)
>> +{
>> +    if (cpus_accel && cpus_accel->synchronize_state) {
>> +        cpus_accel->synchronize_state(cpu);
>> +    }
>> +    if (kvm_enabled()) {
>> +        kvm_cpu_synchronize_state(cpu);
>> +    }
>> +    if (hax_enabled()) {
>> +        hax_cpu_synchronize_state(cpu);
>> +    }
>> +    if (whpx_enabled()) {
>> +        whpx_cpu_synchronize_state(cpu);
>> +    }
>> +}
>> +
>> +void cpu_synchronize_post_reset(CPUState *cpu)
>> +{
>> +    if (cpus_accel && cpus_accel->synchronize_post_reset) {
>> +        cpus_accel->synchronize_post_reset(cpu);
>> +    }
>> +    if (kvm_enabled()) {
>> +        kvm_cpu_synchronize_post_reset(cpu);
>> +    }
>> +    if (hax_enabled()) {
>> +        hax_cpu_synchronize_post_reset(cpu);
>> +    }
>> +    if (whpx_enabled()) {
>> +        whpx_cpu_synchronize_post_reset(cpu);
>> +    }
>> +}
>> +
>> +void cpu_synchronize_post_init(CPUState *cpu)
>> +{
>> +    if (cpus_accel && cpus_accel->synchronize_post_init) {
>> +        cpus_accel->synchronize_post_init(cpu);
>> +    }
>> +    if (kvm_enabled()) {
>> +        kvm_cpu_synchronize_post_init(cpu);
>> +    }
>> +    if (hax_enabled()) {
>> +        hax_cpu_synchronize_post_init(cpu);
>> +    }
>> +    if (whpx_enabled()) {
>> +        whpx_cpu_synchronize_post_init(cpu);
>> +    }
>> +}
>> +
>> +void cpu_synchronize_pre_loadvm(CPUState *cpu)
>> +{
>> +    if (cpus_accel && cpus_accel->synchronize_pre_loadvm) {
>> +        cpus_accel->synchronize_pre_loadvm(cpu);
>> +    }
>> +    if (kvm_enabled()) {
>> +        kvm_cpu_synchronize_pre_loadvm(cpu);
>> +    }
>> +    if (hax_enabled()) {
>> +        hax_cpu_synchronize_pre_loadvm(cpu);
>> +    }
>> +    if (hvf_enabled()) {
>> +        hvf_cpu_synchronize_pre_loadvm(cpu);
>> +    }
>> +    if (whpx_enabled()) {
>> +        whpx_cpu_synchronize_pre_loadvm(cpu);
>> +    }
>> +}
>> +
>> +int64_t cpus_get_virtual_clock(void)
>> +{
>> +    if (cpus_accel && cpus_accel->get_virtual_clock) {
>> +        return cpus_accel->get_virtual_clock();
>> +    }
>> +    if (icount_enabled()) {
>> +        return icount_get();
>> +    } else if (qtest_enabled()) { /* for qtest_clock_warp */
>> +        return qtest_get_virtual_clock();
>> +    }
>> +    return cpu_get_clock();
>> +}
>> +
>> +/*
>> + * return the time elapsed in VM between vm_start and vm_stop.  Unless
>> + * icount is active, cpu_get_ticks() uses units of the host CPU cycle
>> + * counter.
>> + */
>> +int64_t cpus_get_elapsed_ticks(void)
>> +{
>> +    if (cpus_accel && cpus_accel->get_elapsed_ticks) {
>> +        return cpus_accel->get_elapsed_ticks();
>> +    }
>> +    if (icount_enabled()) {
>> +        return icount_get();
>> +    }
>> +    return cpu_get_ticks();
>> +}
>> +
>>  static int do_vm_stop(RunState state, bool send_stop)
>>  {
>>      int ret = 0;
>> @@ -279,7 +380,7 @@ int vm_shutdown(void)
>>      return do_vm_stop(RUN_STATE_SHUTDOWN, false);
>>  }
>>  
>> -static bool cpu_can_run(CPUState *cpu)
>> +bool cpu_can_run(CPUState *cpu)
>>  {
>>      if (cpu->stop) {
>>          return false;
>> @@ -290,7 +391,7 @@ static bool cpu_can_run(CPUState *cpu)
>>      return true;
>>  }
>>  
>> -static void cpu_handle_guest_debug(CPUState *cpu)
>> +void cpu_handle_guest_debug(CPUState *cpu)
>>  {
>>      gdb_set_stop_cpu(cpu);
>>      qemu_system_debug_request();
>> @@ -396,7 +497,7 @@ static void qemu_cpu_stop(CPUState *cpu, bool exit)
>>      qemu_cond_broadcast(&qemu_pause_cond);
>>  }
>>  
>> -static void qemu_wait_io_event_common(CPUState *cpu)
>> +void qemu_wait_io_event_common(CPUState *cpu)
>>  {
>>      atomic_mb_set(&cpu->thread_kicked, false);
>>      if (cpu->stop) {
>> @@ -421,7 +522,7 @@ static void qemu_tcg_rr_wait_io_event(void)
>>      }
>>  }
>>  
>> -static void qemu_wait_io_event(CPUState *cpu)
>> +void qemu_wait_io_event(CPUState *cpu)
>>  {
>>      bool slept = false;
>>  
>> @@ -437,7 +538,8 @@ static void qemu_wait_io_event(CPUState *cpu)
>>      }
>>  
>>  #ifdef _WIN32
>> -    /* Eat dummy APC queued by qemu_cpu_kick_thread.  */
>> +    /* Eat dummy APC queued by qemu_cpu_kick_thread. */
>> +    /* NB!!! Should not this be if (hax_enabled)? Is this wrong for whpx? */
>>      if (!tcg_enabled()) {
>>          SleepEx(0, TRUE);
>>      }
>> @@ -467,8 +569,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>      kvm_init_cpu_signals(cpu);
>>  
>>      /* signal CPU creation */
>> -    cpu->created = true;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      do {
>> @@ -482,8 +583,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>      } while (!cpu->unplug || cpu_can_run(cpu));
>>  
>>      qemu_kvm_destroy_vcpu(cpu);
>> -    cpu->created = false;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_destroyed(cpu);
>>      qemu_mutex_unlock_iothread();
>>      rcu_unregister_thread();
>>      return NULL;
>> @@ -511,8 +611,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>>      sigaddset(&waitset, SIG_IPI);
>>  
>>      /* signal CPU creation */
>> -    cpu->created = true;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      do {
>> @@ -660,8 +759,7 @@ static void deal_with_unplugged_cpus(void)
>>      CPU_FOREACH(cpu) {
>>          if (cpu->unplug && !cpu_can_run(cpu)) {
>>              qemu_tcg_destroy_vcpu(cpu);
>> -            cpu->created = false;
>> -            qemu_cond_signal(&qemu_cpu_cond);
>> +            cpu_thread_signal_destroyed(cpu);
>>              break;
>>          }
>>      }
>> @@ -688,9 +786,8 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>>      qemu_thread_get_self(cpu->thread);
>>  
>>      cpu->thread_id = qemu_get_thread_id();
>> -    cpu->created = true;
>>      cpu->can_do_io = 1;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      /* wait for initial kick-off after machine start */
>> @@ -800,11 +897,9 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>>      qemu_thread_get_self(cpu->thread);
>>  
>>      cpu->thread_id = qemu_get_thread_id();
>> -    cpu->created = true;
>>      current_cpu = cpu;
>> -
>>      hax_init_vcpu(cpu);
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      do {
>> @@ -843,8 +938,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
>>      hvf_init_vcpu(cpu);
>>  
>>      /* signal CPU creation */
>> -    cpu->created = true;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      do {
>> @@ -858,8 +952,7 @@ static void *qemu_hvf_cpu_thread_fn(void *arg)
>>      } while (!cpu->unplug || cpu_can_run(cpu));
>>  
>>      hvf_vcpu_destroy(cpu);
>> -    cpu->created = false;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_destroyed(cpu);
>>      qemu_mutex_unlock_iothread();
>>      rcu_unregister_thread();
>>      return NULL;
>> @@ -884,8 +977,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>>      }
>>  
>>      /* signal CPU creation */
>> -    cpu->created = true;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      do {
>> @@ -902,8 +994,7 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>>      } while (!cpu->unplug || cpu_can_run(cpu));
>>  
>>      whpx_destroy_vcpu(cpu);
>> -    cpu->created = false;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_destroyed(cpu);
>>      qemu_mutex_unlock_iothread();
>>      rcu_unregister_thread();
>>      return NULL;
>> @@ -936,10 +1027,9 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>      qemu_thread_get_self(cpu->thread);
>>  
>>      cpu->thread_id = qemu_get_thread_id();
>> -    cpu->created = true;
>>      cpu->can_do_io = 1;
>>      current_cpu = cpu;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_created(cpu);
>>      qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>  
>>      /* process any pending work */
>> @@ -980,14 +1070,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>      } while (!cpu->unplug || cpu_can_run(cpu));
>>  
>>      qemu_tcg_destroy_vcpu(cpu);
>> -    cpu->created = false;
>> -    qemu_cond_signal(&qemu_cpu_cond);
>> +    cpu_thread_signal_destroyed(cpu);
>>      qemu_mutex_unlock_iothread();
>>      rcu_unregister_thread();
>>      return NULL;
>>  }
>>  
>> -static void qemu_cpu_kick_thread(CPUState *cpu)
>> +void cpus_kick_thread(CPUState *cpu)
>>  {
>>  #ifndef _WIN32
>>      int err;
>> @@ -1017,7 +1106,10 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>>  void qemu_cpu_kick(CPUState *cpu)
>>  {
>>      qemu_cond_broadcast(cpu->halt_cond);
>> -    if (tcg_enabled()) {
>> +
>> +    if (cpus_accel && cpus_accel->kick_vcpu_thread) {
>> +        cpus_accel->kick_vcpu_thread(cpu);
>> +    } else if (tcg_enabled()) {
>>          if (qemu_tcg_mttcg_enabled()) {
>>              cpu_exit(cpu);
>>          } else {
>> @@ -1031,14 +1123,14 @@ void qemu_cpu_kick(CPUState *cpu)
>>               */
>>              cpu->exit_request = 1;
>>          }
>> -        qemu_cpu_kick_thread(cpu);
>> +        cpus_kick_thread(cpu);
>>      }
>>  }
>>  
>>  void qemu_cpu_kick_self(void)
>>  {
>>      assert(current_cpu);
>> -    qemu_cpu_kick_thread(current_cpu);
>> +    cpus_kick_thread(current_cpu);
>>  }
>>  
>>  bool qemu_cpu_is_self(CPUState *cpu)
>> @@ -1088,6 +1180,21 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
>>      qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
>>  }
>>  
>> +/* signal CPU creation */
>> +void cpu_thread_signal_created(CPUState *cpu)
>> +{
>> +    cpu->created = true;
>> +    qemu_cond_signal(&qemu_cpu_cond);
>> +}
>> +
>> +/* signal CPU destruction */
>> +void cpu_thread_signal_destroyed(CPUState *cpu)
>> +{
>> +    cpu->created = false;
>> +    qemu_cond_signal(&qemu_cpu_cond);
>> +}
>> +
>> +
>>  static bool all_vcpus_paused(void)
>>  {
>>      CPUState *cpu;
>> @@ -1163,9 +1270,6 @@ void cpu_remove_sync(CPUState *cpu)
>>      qemu_mutex_lock_iothread();
>>  }
>>  
>> -/* For temporary buffers for forming a name */
>> -#define VCPU_THREAD_NAME_SIZE 16
>> -
>>  static void qemu_tcg_init_vcpu(CPUState *cpu)
>>  {
>>      char thread_name[VCPU_THREAD_NAME_SIZE];
>> @@ -1286,6 +1390,13 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>  #endif
>>  }
>>  
>> +void cpus_register_accel(CpusAccel *ca)
>> +{
>> +    assert(ca != NULL);
>> +    assert(ca->create_vcpu_thread != NULL); /* mandatory */
>> +    cpus_accel = ca;
>> +}
>> +
>>  static void qemu_dummy_start_vcpu(CPUState *cpu)
>>  {
>>      char thread_name[VCPU_THREAD_NAME_SIZE];
>> @@ -1316,7 +1427,10 @@ void qemu_init_vcpu(CPUState *cpu)
>>          cpu_address_space_init(cpu, 0, "cpu-memory", cpu->memory);
>>      }
>>  
>> -    if (kvm_enabled()) {
>> +    if (cpus_accel) {
>> +        /* accelerator already implements the CpusAccel interface */
>> +        cpus_accel->create_vcpu_thread(cpu);
>> +    } else if (kvm_enabled()) {
>>          qemu_kvm_start_vcpu(cpu);
>>      } else if (hax_enabled()) {
>>          qemu_hax_start_vcpu(cpu);
>> diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
>> index e97ad407fa..16345eec43 100644
>> --- a/stubs/Makefile.objs
>> +++ b/stubs/Makefile.objs
>> @@ -1,6 +1,7 @@
>>  stub-obj-y += blk-commit-all.o
>>  stub-obj-y += cmos.o
>>  stub-obj-y += cpu-get-clock.o
>> +stub-obj-y += cpus-get-virtual-clock.o
>>  stub-obj-y += qemu-timer-notify-cb.o
>>  stub-obj-y += icount.o
>>  stub-obj-y += dump.o
>> @@ -28,6 +29,7 @@ stub-obj-y += trace-control.o
>>  stub-obj-y += vmgenid.o
>>  stub-obj-y += vmstate.o
>>  stub-obj-$(CONFIG_SOFTMMU) += win32-kbd-hook.o
>> +stub-obj-y += cpu-synchronize-state.o
>>  
>>  #######################################################################
>>  # code used by both qemu system emulation and qemu-img
>> diff --git a/stubs/cpu-synchronize-state.c b/stubs/cpu-synchronize-state.c
>> new file mode 100644
>> index 0000000000..3112fe439d
>> --- /dev/null
>> +++ b/stubs/cpu-synchronize-state.c
>> @@ -0,0 +1,15 @@
>> +#include "qemu/osdep.h"
>> +#include "sysemu/hw_accel.h"
>> +
>> +void cpu_synchronize_state(CPUState *cpu)
>> +{
>> +}
>> +void cpu_synchronize_post_reset(CPUState *cpu)
>> +{
>> +}
>> +void cpu_synchronize_post_init(CPUState *cpu)
>> +{
>> +}
>> +void cpu_synchronize_pre_loadvm(CPUState *cpu)
>> +{
>> +}
>> diff --git a/stubs/cpus-get-virtual-clock.c b/stubs/cpus-get-virtual-clock.c
>> new file mode 100644
>> index 0000000000..fd447d53f3
>> --- /dev/null
>> +++ b/stubs/cpus-get-virtual-clock.c
>> @@ -0,0 +1,8 @@
>> +#include "qemu/osdep.h"
>> +#include "sysemu/cpu-timers.h"
>> +#include "qemu/main-loop.h"
>> +
>> +int64_t cpus_get_virtual_clock(void)
>> +{
>> +    return cpu_get_clock();
>> +}
>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>> index db51e68f25..50b325c65b 100644
>> --- a/util/qemu-timer.c
>> +++ b/util/qemu-timer.c
>> @@ -635,13 +635,7 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
>>          return get_clock();
>>      default:
>>      case QEMU_CLOCK_VIRTUAL:
>> -        if (icount_enabled()) {
>> -            return icount_get();
>> -        } else if (qtest_enabled()) { /* for qtest_clock_warp */
>> -            return qtest_get_virtual_clock();
>> -        } else {
>> -            return cpu_get_clock();
>> -        }
>> +        return cpus_get_virtual_clock();
>>      case QEMU_CLOCK_HOST:
>>          return REPLAY_CLOCK(REPLAY_CLOCK_HOST, get_clock_realtime());
>>      case QEMU_CLOCK_VIRTUAL_RT:
>>
> 
> 




* Re: [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface
  2020-08-30 13:34     ` Claudio Fontana
@ 2020-08-30 16:41       ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-08-30 16:41 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth, Alberto Garcia,
	Eduardo Habkost, Colin Xu, Philippe Mathieu-Daudé,
	Marcelo Tosatti, qemu-devel, Markus Armbruster, Roman Bolshakov,
	Pavel Dovgalyuk, Wenchao Wang, haxm-team, Sunil Muthuswamy,
	Alex Bennée, Richard Henderson

On 30/08/20 15:34, Claudio Fontana wrote:
> Ciao Paolo,
> 
> just a ping on this one, it would seem that qemu_clock_get_ns needs to be called before
> any accelerator is initialized, before ticks are enabled, as part of qcow2 initialization.
> 
> I could add a check specifically for this and a comment in the cpus_get_virtual_clock(), but do you have any thoughts?

I think you could always return 0 before the accelerator is initialized;
the CPUs haven't started yet so the return value must be 0.
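
A minimal sketch of that suggestion, assuming a simplified stand-in for
the CpusAccel struct from the patch (only the get_virtual_clock hook is
modelled). The names fallback_virtual_clock and demo_accel_clock are
hypothetical and exist only for illustration; they are not QEMU
functions, and the real cpus_get_virtual_clock() has more branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the CpusAccel interface in the patch. */
typedef struct CpusAccel {
    int64_t (*get_virtual_clock)(void); /* optional hook */
} CpusAccel;

/* NULL until an accelerator registers itself. */
static const CpusAccel *cpus_accel;

void cpus_register_accel(const CpusAccel *ca)
{
    assert(ca != NULL);
    cpus_accel = ca;
}

/* Hypothetical default, standing in for cpu_get_clock(). */
int64_t fallback_virtual_clock(void)
{
    return 1000;
}

/* Hypothetical accelerator hook used to exercise the dispatch. */
int64_t demo_accel_clock(void)
{
    return 42;
}

int64_t cpus_get_virtual_clock(void)
{
    if (cpus_accel == NULL) {
        /*
         * Early caller (for example during qcow2 initialization):
         * no accelerator is registered and no vCPU has run yet,
         * so the virtual clock is reported as 0.
         */
        return 0;
    }
    if (cpus_accel->get_virtual_clock) {
        return cpus_accel->get_virtual_clock();
    }
    return fallback_virtual_clock();
}
```

With this shape, a call made before cpus_register_accel() yields 0, and
later calls go through the registered hook or the default.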

However, I wonder if that is already causing problems with live
migration (where QEMU_CLOCK_VIRTUAL jumps from 0 to a possibly high
value after migration is completed).  So, independently of this series,
perhaps QEMU_CLOCK_REALTIME should be used instead.  CCing Berto.

Paolo



Thread overview: 25+ messages
2020-08-03  9:05 [RFC v3 0/8] QEMU cpus.c refactoring part2 Claudio Fontana
2020-08-03  9:05 ` [RFC v3 1/8] cpu-timers, icount: new modules Claudio Fontana
2020-08-04  8:13   ` Claudio Fontana
2020-08-04  8:23     ` Paolo Bonzini
2020-08-03  9:05 ` [RFC v3 2/8] cpus: prepare new CpusAccel cpu accelerator interface Claudio Fontana
2020-08-05  8:40   ` Claudio Fontana
2020-08-05  8:47     ` Paolo Bonzini
2020-08-05  8:50       ` Claudio Fontana
2020-08-11  8:59   ` Roman Bolshakov
2020-08-11 10:57     ` Claudio Fontana
2020-08-20  8:17   ` Claudio Fontana
2020-08-30 13:34     ` Claudio Fontana
2020-08-30 16:41       ` Paolo Bonzini
2020-08-03  9:05 ` [RFC v3 3/8] cpus: extract out TCG-specific code to accel/tcg Claudio Fontana
2020-08-03  9:05 ` [RFC v3 4/8] cpus: extract out qtest-specific code to accel/qtest Claudio Fontana
2020-08-03  9:05 ` [RFC v3 5/8] cpus: extract out kvm-specific code to accel/kvm Claudio Fontana
2020-08-03  9:05 ` [RFC v3 6/8] cpus: extract out hax-specific code to target/i386/ Claudio Fontana
2020-08-03  9:05 ` [RFC v3 7/8] cpus: extract out whpx-specific " Claudio Fontana
2020-08-03  9:05 ` [RFC v3 8/8] cpus: extract out hvf-specific code to target/i386/hvf/ Claudio Fontana
2020-08-11  9:00   ` Roman Bolshakov
2020-08-11 13:42     ` Claudio Fontana
2020-08-11 14:28       ` Claudio Fontana
2020-08-03  9:40 ` [RFC v3 0/8] QEMU cpus.c refactoring part2 Paolo Bonzini
2020-08-03 11:48 ` Alex Bennée
2020-08-05 17:03   ` Claudio Fontana
