* [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
@ 2016-08-11 15:23 Alex Bennée
From: Alex Bennée @ 2016-08-11 15:23 UTC
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

This is the fourth iteration of the RFC patch set which aims to
provide the basic framework for MTTCG. I hope this will provide a good
base for discussion at KVM Forum later this month.

Prerequisites
=============

This tree has been built on top of two other series of patches:

  - Reduce lock contention on TCG hot-path (v5, in Paolo's tree)
  - cpu-exec: Safe work in quiescent state (v5, in my tree)

You can find the base tree (based off -rc0) at:

  https://github.com/stsquad/qemu/tree/mttcg/async-safe-work-v5

Changes
=======

Since the last posting there have been a number of updates to the
original patches:

   - more updates to the docs/multi-thread-tcg.txt design document
   - clean-ups of sleep handling (and safe work integration)
   - split the big enable-multi-thread patch
   - split some refactoring and code-movement changes into individual patches

As usual, each patch has a revision summary below the "---" line.

In addition, I've brought forward a number of changes from the original
ARM enabling patches to support the various cputlb operations, which
are essentially generic anyway. These include:

   - making cross-vCPU tlb_flush operations use async_run_on_cpu
   - making tlb_reset_dirty_range atomically apply the TLB_NOTDIRTY flag
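The atomic TLB_NOTDIRTY update can be sketched with C11 atomics. This is a minimal sketch, not the code in the series. `TLBEntrySketch`, `SKETCH_TLB_NOTDIRTY` and `sketch_reset_dirty()` are hypothetical stand-ins for QEMU's TLB entry, flag bit and `tlb_reset_dirty_range()`. The point it shows is that a single fetch-or sets the flag in one indivisible read-modify-write. It therefore cannot write a stale value over a concurrent store to the same word made by the vCPU thread that owns the TLB.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bit and entry layout, for illustration only. */
#define SKETCH_TLB_NOTDIRTY ((uint64_t)1 << 4)

typedef struct {
    _Atomic uint64_t addr_write;   /* guest page address plus low flag bits */
    uintptr_t addend;              /* host address minus guest address */
} TLBEntrySketch;

/* Set the not-dirty flag if the entry's host page lies in
 * [start, start + length). atomic_fetch_or() reads, ORs and writes
 * back as one atomic operation, so it never clobbers a store made by
 * another thread with an older value. */
static void sketch_reset_dirty(TLBEntrySketch *e, uintptr_t start,
                               uintptr_t length)
{
    uint64_t addr = atomic_load(&e->addr_write);
    uintptr_t host = (uintptr_t)(addr & ~(uint64_t)0xfff) + e->addend;

    /* Unsigned wrap-around makes this a single range check. */
    if (host - start < length) {
        atomic_fetch_or(&e->addr_write, SKETCH_TLB_NOTDIRTY);
    }
}
```

A plain load, OR and store sequence would instead leave a window in which another thread's update could be overwritten.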

A copy of the tree can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4

The series includes all the generic work needed and, in theory, only
needs MTTCG-aware atomics and memory barriers for the various
host/guest combinations before it can be enabled by default.
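The need for barriers can be illustrated with the classic message-passing pattern. This is a generic C11/POSIX threads sketch, not TCG code, and the names are hypothetical. One thread publishes data and then sets a flag. The other waits for the flag and then reads the data. The release store and acquire load guarantee that the reader sees the data. With relaxed ordering on a weakly ordered host such as ARM, the reader could see the flag before the data. On an x86 host, the hardware's stronger (TSO) ordering keeps stores and loads in order, so bugs of this kind tend to stay hidden there.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload write becomes visible no later than the flag. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Acquire: once the flag is observed, the payload write is visible. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin */
    }
    *out = payload;
    return NULL;
}

/* Run one producer/consumer pair and return what the consumer read. */
static int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

In TCG terms, a guest barrier such as ARM's DMB plays the role of the release/acquire pair here and must become a real host barrier whenever the host is more weakly ordered than the guest.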

In practice, memory barrier problems don't show up on an x86 host,
whose strong memory ordering hides them. In fact, I have created a tree
which merges in Emilio's cmpxchg atomics and happily boots ARMv7 Debian
systems without any additional changes. You can find that at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
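Emilio's series is not reproduced here, but the general idea of emulating guest atomics with a host compare-and-swap can be sketched. In this hypothetical example, `sketch_cmpxchg_bool()` only loosely mirrors the purpose of the `cmpxchg_bool` helper added by this series. Its name and signature are assumptions, and it is built on the GCC/Clang `__atomic_compare_exchange_n` builtin. The example uses it to emulate an ARM-style load-exclusive/store-exclusive pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if *ptr held 'expected' and was
 * atomically replaced by 'desired' (sequentially consistent). */
static bool sketch_cmpxchg_bool(uint32_t *ptr, uint32_t expected,
                                uint32_t desired)
{
    return __atomic_compare_exchange_n(ptr, &expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Hypothetical per-vCPU state for a load/store-exclusive pair. */
typedef struct {
    uint32_t *excl_addr;   /* address of the last load-exclusive */
    uint32_t excl_val;     /* value that load returned */
} ExclStateSketch;

static uint32_t sketch_load_exclusive(ExclStateSketch *s, uint32_t *addr)
{
    s->excl_addr = addr;
    s->excl_val = __atomic_load_n(addr, __ATOMIC_SEQ_CST);
    return s->excl_val;
}

/* Returns 0 on success and 1 on failure, like ARM's STREX status. The
 * store succeeds only if memory still holds the value loaded earlier. */
static int sketch_store_exclusive(ExclStateSketch *s, uint32_t *addr,
                                  uint32_t val)
{
    bool ok = s->excl_addr == addr &&
              sketch_cmpxchg_bool(addr, s->excl_val, val);
    s->excl_addr = NULL;   /* the emulated monitor is cleared either way */
    return ok ? 0 : 1;
}
```

One known limitation applies: a compare-and-swap checks only the value. It cannot detect another vCPU changing the location from A to B and back to A between the two instructions, whereas a hardware exclusive monitor would fail the store.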


Testing
=======

I've tested that this boots ARMv7 Debian and runs all the ARMv7 and ARMv8 kvm-unit-tests with:

  -accel tcg,thread=single

In addition, I've tested the tcg and tlbflush groups of the ARMv7 and
ARMv8 kvm-unit-tests with:

  -accel tcg,thread=multi

These tests are safe as they don't rely on atomics to work, but they do
exercise the parallel execution, invalidation and flushing of code.
The full invocation of all the tests is:

  echo "Running all tests in Single Thread Mode"
  ./run_tests.sh -t -o "-accel tcg,thread=single -name debug-threads=on"
  echo "Running tlbflush in Multi Thread Mode"
  ./run_tests.sh -t -g tlbflush -o "-accel tcg,thread=multi -name debug-threads=on"
  echo "Running TCG in Multi Thread Mode"
  ./run_tests.sh -t -g tcg -o "-accel tcg,thread=multi -name debug-threads=on"


Performance
===========

You can't do full workload testing on this tree due to the lack of
atomic support (but I will run some numbers on
mttcg/base-patches-v4-with-cmpxchg-atomics-v2). However, you can
certainly see a run-time improvement with the kvm-unit-tests TCG group.

  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=single']
  run 1: ret=0 (PASS), time=1047.147924 (1/1)
  run 2: ret=0 (PASS), time=1071.921204 (2/2)
  run 3: ret=0 (PASS), time=1048.141600 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 1055.737 (196.70 varience/14.02 deviation)
  Ran command 3 times, 3 passes
  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=multi']
  run 1: ret=0 (PASS), time=303.074210 (1/1)
  run 2: ret=0 (PASS), time=304.574991 (2/2)
  run 3: ret=0 (PASS), time=303.327408 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 303.659 (0.65 varience/0.80 deviation)
  Ran command 3 times, 3 passes

The TCG tests run with -smp 4 on my system. While the TCG tests are
purely CPU-bound, they do exercise the hot and cold paths of TCG
execution (especially when triggering SMC detection). Even so, there
is a clear benefit: the multi-threaded average of 303.7 seconds is
about 15% above the ideal elapsed time of roughly 264 seconds (1055.7 / 4).
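The summary figures can be checked with a short C sketch. The helper names are illustrative. The sketch assumes that retry.py reports the sample variance (dividing by n - 1), which matches both summaries above. It takes the ideal multi-threaded time to be the single-threaded mean divided by the four vCPUs used with -smp 4. Measured this way, the multi-threaded run comes out roughly 15% above the ideal.

```c
#include <assert.h>
#include <stddef.h>

/* Run times in seconds, copied from the retry.py output above. */
static const double single_runs[] = {1047.147924, 1071.921204, 1048.141600};
static const double multi_runs[]  = {303.074210, 304.574991, 303.327408};

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / n;
}

/* Sample variance (divides by n - 1); the reported "deviation" is its
 * square root, e.g. sqrt(196.70) is about 14.02. */
static double sample_variance(const double *x, size_t n)
{
    double m = mean(x, n), acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (x[i] - m) * (x[i] - m);
    }
    return acc / (n - 1);
}

/* Fractional overhead of the measured multi-threaded time relative to
 * perfect scaling of the single-threaded time over nthreads threads. */
static double overhead_vs_ideal(double t_single, double t_multi, int nthreads)
{
    double ideal = t_single / nthreads;
    return (t_multi - ideal) / ideal;
}
```

With these numbers the speedup is 1055.7 / 303.7, or about 3.48x on four vCPUs.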

Alex

Alex Bennée (23):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  linux-user/elfload: ensure mmap_lock() held while setting up
  translate-all: Add assert_(memory|tb)_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: move tcg_exec_all and helpers above thread fn
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  cpus: re-factor out handle_icount_deadline
  tcg: remove global exit_request
  tcg: move locking for tb_invalidate_phys_page_range up
  cpus: tweak sleeping and safe_work rules for MTTCG
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  atomic: introduce cmpxchg_bool
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: make tlb_reset_dirty safe for MTTCG
  cputlb: make tlb_flush_by_mmuidx safe for MTTCG

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

 bsd-user/mmap.c           |   5 +
 cpu-exec-common.c         |  19 +-
 cpu-exec.c                |  41 ++--
 cpus.c                    | 510 +++++++++++++++++++++++++++++-----------------
 cputlb.c                  | 279 ++++++++++++++++++-------
 docs/multi-thread-tcg.txt | 310 ++++++++++++++++++++++++++++
 exec.c                    |  28 +++
 hw/i386/kvmvapic.c        |   4 +
 include/exec/cputlb.h     |   2 -
 include/exec/exec-all.h   |   5 +-
 include/qemu/atomic.h     |   9 +
 include/qom/cpu.h         |  27 +++
 include/sysemu/cpus.h     |   2 +
 linux-user/elfload.c      |   4 +
 linux-user/mmap.c         |   5 +
 memory.c                  |   2 +
 qemu-options.hx           |  20 ++
 qom/cpu.c                 |  10 +
 softmmu_template.h        |  17 ++
 target-arm/Makefile.objs  |   2 +-
 target-arm/arm-powerctl.c |   2 +
 target-i386/smm_helper.c  |   7 +
 tcg/tcg.h                 |   2 +
 translate-all.c           | 175 +++++++++++++---
 vl.c                      |  48 ++++-
 25 files changed, 1227 insertions(+), 308 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.7.4
