* [Qemu-devel] [PATCH v5 00/33] MTTCG Base Enabling patches with ARM on x86 defaults
@ 2016-10-27 15:09 Alex Bennée
From: Alex Bennée @ 2016-10-27 15:09 UTC
  To: pbonzini
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, jan.kiszka, serge.fdrv, rth, peter.maydell,
	claudio.fontana, Alex Bennée

This is the fifth iteration of the MTTCG patches and I'm finally
dropping the RFC tag from the series. Previous versions suffered
from hangs, which have been fixed by the additional cputlb fixes. A lot
of races were identified and fixed using ThreadSanitizer (although a
chunk of those fixes will come in a separate series).

I'm hoping to get this into 2.8, although if the maintainers aren't
quite ready to take the full tree I'd appreciate them cherry-picking a
good chunk of the clean-up patches to reduce the delta if we need to
hold some of the work over to the 2.9 cycle. This series enables MTTCG
for ARM guests on x86_64 hosts by default.

Prerequisites
=============

Most of the prerequisites have already been merged. The final one is
a solution for atomic instruction emulation. This series has been
based on v7 of Emilio & Richard's cmpxchg-based atomics series. Once
that is merged this series should apply cleanly.

You can find the base of my tree at:

  https://github.com/stsquad/qemu/tree/mttcg/cmpxchg-atomics-v7-prepull

Changes
=======

Since the last posting there have been a number of updates to the
original patches:

   - usual update of r-b tags
   - fixed a bunch of races identified by ThreadSanitizer
   - updated the single-threaded kick timer as per review comments
   - a bunch of BQL asserts (IRQ processing)
   - use of parallel_cpus/tb_flush to ensure correct codegen
   - cputlb updates for atomic setting of dirty flags
   - cputlb fixes where work was not being deferred to async safe work
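
As a rough illustration of the atomic dirty-flag update mentioned in
the list above, the sketch below sets a flag bit in a TLB-like field
with a compare-and-swap, so that a concurrent update by another thread
is detected rather than silently overwritten. It uses C11 atomics, not
QEMU's own atomic helpers, and every name in it (NOTDIRTY_FLAG_SKETCH,
set_notdirty_if_unchanged) is hypothetical, not taken from the series:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real flag layout in cputlb may differ. */
#define NOTDIRTY_FLAG_SKETCH ((uintptr_t)1 << 4)

/*
 * Set the flag only if the field still holds the value we read earlier.
 * Returns true if the update was applied, false if another thread
 * changed the field in the meantime (in which case nothing is written).
 */
static bool set_notdirty_if_unchanged(_Atomic uintptr_t *addr_write,
                                      uintptr_t expected)
{
    assert(addr_write != NULL);
    uintptr_t desired = expected | NOTDIRTY_FLAG_SKETCH;
    return atomic_compare_exchange_strong(addr_write, &expected, desired);
}
```

A caller would read the field, decide the entry needs marking, and then
call the function with the value it read; a false return means the
entry was modified concurrently and should be re-examined.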

This version also introduces a new patch to add run_on_cpu_data as a
type for the *_run_on_cpu functions. The main aim is to ensure a target
pointer (i.e. target_ulong) can always be passed in one argument even
when emulating 64-bit targets on a 32-bit build.
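
To make the motivation concrete, here is a minimal sketch of the idea,
assuming a union is used as the single argument type. All of the names
with a _sketch suffix, data_from_target_ptr and flush_page_work are
hypothetical stand-ins rather than the definitions from the patch. A
void * argument is only 32 bits wide on a 32-bit host, whereas a union
with a 64-bit member is always wide enough for a 64-bit guest address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit guest's target_ulong. */
typedef uint64_t target_ulong_sketch;

/* One argument type wide enough for any of the payloads below. */
typedef union {
    int host_int;
    void *host_ptr;
    target_ulong_sketch target_ptr;
} run_on_cpu_data_sketch;

typedef void (*run_on_cpu_func_sketch)(int cpu_index,
                                       run_on_cpu_data_sketch data);

/* Wrap a guest address so it survives the call even on a 32-bit host. */
static inline run_on_cpu_data_sketch
data_from_target_ptr(target_ulong_sketch addr)
{
    run_on_cpu_data_sketch d = { .target_ptr = addr };
    return d;
}

static target_ulong_sketch last_flushed_addr;

/* Example work item: records the guest address it was asked to handle. */
static void flush_page_work(int cpu_index, run_on_cpu_data_sketch data)
{
    (void)cpu_index;
    last_flushed_addr = data.target_ptr;
}

/* Stand-in for a *_run_on_cpu dispatcher: runs the work immediately. */
static void run_on_cpu_sketch(int cpu_index, run_on_cpu_func_sketch func,
                              run_on_cpu_data_sketch data)
{
    assert(func);
    func(cpu_index, data);
}
```

The real dispatchers queue the work for the target vCPU thread instead
of calling it directly; the sketch only shows how the payload travels
in one argument.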

Finally there are some ARM specific updates:

   - cpu_reset is deferred to async work
   - ARM-specific messing with the TLB removed
   - BQL taken for ARM_CP_IO register access
   - some helpers take BQL

The last two patches expand on the approach we take for device
emulation through MMIO. Any case where the emulation may touch global
state (device emulation, cross-vCPU) needs to take the BQL. Simple
helper functions which only update their own cpu->env are not
affected.
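
The rule can be sketched as follows. This is an illustration only: a
plain pthread mutex stands in for the BQL, and big_lock_sketch,
cpu_env_sketch, helper_set_reg and helper_raise_irq are hypothetical
names, not QEMU functions:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Stand-in for the BQL; QEMU's real lock has its own API. */
static pthread_mutex_t big_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Global state shared by all vCPUs, e.g. an interrupt controller. */
static uint32_t shared_irq_pending;

/* Per-vCPU state: only ever touched by the owning vCPU thread. */
typedef struct {
    uint32_t regs[16];
} cpu_env_sketch;

/* Updates only its own env, so it does not need the lock. */
static void helper_set_reg(cpu_env_sketch *env, int idx, uint32_t val)
{
    assert(idx >= 0 && idx < 16);
    env->regs[idx] = val;
}

/* Touches global device state, so it serialises on the big lock. */
static void helper_raise_irq(cpu_env_sketch *env, int irq)
{
    (void)env;
    assert(irq >= 0 && irq < 32);
    pthread_mutex_lock(&big_lock_sketch);
    shared_irq_pending |= UINT32_C(1) << irq;
    pthread_mutex_unlock(&big_lock_sketch);
}
```

The first helper stays lock-free because no other thread writes that
vCPU's registers; the second must lock because any vCPU thread may
raise an interrupt at the same time.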

Testing on additional hardware models would be useful, although pretty
much any MMIO device is already protected by the BQL. The ARM_CP_IO
registers were a little special as they updated the GIC, which needed
locking for serialisation.

As usual the patches themselves have a revision summary under the ---

A copy of the tree can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v5


Testing
=======

I've tested that this boots ARMv7/ARMv8 Debian with a repeating compile
test load (which previously would trigger cputlb races) as well as
both ARMv7 and v8 kvm-unit-tests with both:

  -accel tcg,thread=single

and:

  -accel tcg,thread=multi

Performance
===========

The following was measured with my boot+build benchmark:

 $QEMU_BIN -machine type=virt -display none -m 4096 \
   -cpu $CPU -serial telnet:127.0.0.1:4444 -monitor stdio \
   -netdev user,id=unet,hostfwd=tcp::2222-:22 \
   -device virtio-net-device,netdev=unet \
   -drive file=${JESSIE}.qcow2,id=myblock,index=0,if=none,snapshot=on \
   -device virtio-blk-device,drive=myblock \
   -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-build.service" \
   -kernel ${KERNEL} -name debug-threads=on \
   -machine gic-version=3 -accel tcg,thread=multi -smp @

My Desktop (i7, 4+4)

| smp | armv7, single | armv7, multi | speedup | armv8, single | armv8, multi | speedup |
|-----+---------------+--------------+---------+---------------+--------------+---------|
|   1 |       224.035 |      224.010 |    1.00 |       397.285 |      399.456 |    0.99 |
|   2 |       231.043 |      125.923 |    1.83 |       415.307 |      225.760 |    1.84 |
|   3 |       235.548 |       94.837 |    2.48 |       422.565 |      170.647 |    2.48 |
|   4 |       239.403 |       81.145 |    2.95 |       432.743 |      146.869 |    2.95 |
|   5 |       243.107 |       81.045 |    3.00 |       435.414 |      146.367 |    2.97 |
|   6 |       249.164 |       78.742 |    3.16 |       445.176 |      143.415 |    3.10 |
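
The ratio columns in the table above are consistent with the
single-threaded time divided by the multi-threaded time for the same
-smp value. A small sketch that recomputes them from the armv7 rows
(bench_row, armv7_rows and speedup are illustrative names only):

```c
#include <assert.h>

typedef struct {
    int smp;
    double single_time; /* thread=single result */
    double multi_time;  /* thread=multi result */
} bench_row;

/* armv7 figures copied from the table above. */
static const bench_row armv7_rows[] = {
    { 1, 224.035, 224.010 },
    { 2, 231.043, 125.923 },
    { 3, 235.548,  94.837 },
    { 4, 239.403,  81.145 },
    { 5, 243.107,  81.045 },
    { 6, 249.164,  78.742 },
};

/* Speedup of multi-threaded over single-threaded for one row. */
static double speedup(const bench_row *row)
{
    assert(row->multi_time > 0.0);
    return row->single_time / row->multi_time;
}
```

For example, the -smp 2 row gives 231.043 / 125.923, which rounds to
the 1.83 shown in the table.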

Alex

Alex Bennée (28):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  linux-user/elfload: ensure mmap_lock() held while setting up
  translate-all: Add assert_(memory|tb)_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: move tcg_exec_all and helpers above thread fn
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  cpus: re-factor out handle_icount_deadline
  tcg: remove global exit_request
  tcg: move locking for tb_invalidate_phys_page_range up
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  atomic: introduce cmpxchg_bool
  *_run_on_cpu: introduce run_on_cpu_data type
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: make tlb_flush_by_mmuidx safe for MTTCG
  target-arm/powerctl: defer cpu reset work to CPU context
  target-arm/cpu: don't reset TLB structures, use cputlb to do it
  target-arm: ensure BQL taken for ARM_CP_IO register access
  target-arm: helpers which may affect global state need the BQL
  tcg: enable MTTCG by default for ARM on x86 hosts

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect translation related stuff with tb_lock.
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

 bsd-user/mmap.c                 |   5 +
 configure                       |  12 +
 cpu-exec-common.c               |   3 -
 cpu-exec.c                      |  48 ++--
 cpus-common.c                   |   9 +-
 cpus.c                          | 544 ++++++++++++++++++++++++++--------------
 cputlb.c                        | 400 +++++++++++++++++++++++------
 default-configs/arm-softmmu.mak |   2 +
 docs/multi-thread-tcg.txt       | 310 +++++++++++++++++++++++
 exec.c                          |  28 +++
 hw/core/irq.c                   |   1 +
 hw/i386/kvm/apic.c              |  14 +-
 hw/i386/kvmvapic.c              |  17 +-
 hw/intc/arm_gicv3_cpuif.c       |   3 +
 hw/ppc/ppce500_spin.c           |   6 +-
 hw/ppc/spapr.c                  |   7 +-
 hw/ppc/spapr_hcall.c            |  12 +-
 include/exec/cputlb.h           |   2 -
 include/exec/exec-all.h         |   7 +-
 include/qemu/atomic.h           |   9 +
 include/qom/cpu.h               |  51 +++-
 include/sysemu/cpus.h           |   2 +
 kvm-all.c                       |  20 +-
 linux-user/elfload.c            |   4 +
 linux-user/mmap.c               |   5 +
 memory.c                        |   2 +
 qemu-options.hx                 |  20 ++
 qom/cpu.c                       |  10 +
 target-arm/Makefile.objs        |   2 +-
 target-arm/arm-powerctl.c       | 142 ++++++-----
 target-arm/cpu.c                |   6 +
 target-arm/helper.c             |   6 +
 target-arm/op_helper.c          |  43 +++-
 target-i386/helper.c            |   8 +-
 target-i386/kvm.c               |   4 +-
 target-i386/smm_helper.c        |   7 +
 target-s390x/cpu.c              |   4 +-
 target-s390x/cpu.h              |   4 +-
 target-s390x/misc_helper.c      |   9 +-
 tcg/tcg.h                       |   2 +
 translate-all.c                 | 192 +++++++++++---
 translate-common.c              |  21 +-
 vl.c                            |  49 +++-
 43 files changed, 1590 insertions(+), 462 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.10.1

Thread overview: 48+ messages
2016-10-27 15:09 [Qemu-devel] [PATCH v5 00/33] MTTCG Base Enabling patches with ARM on x86 defaults Alex Bennée
2016-10-27 15:09 ` [Qemu-devel] [PATCH v5 01/33] cpus: make all_vcpus_paused() return bool Alex Bennée
2016-10-27 15:09 ` [Qemu-devel] [PATCH v5 02/33] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 03/33] translate-all: add DEBUG_LOCKING asserts Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 04/33] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 05/33] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 06/33] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 07/33] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 08/33] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 09/33] tcg: protect translation related stuff with tb_lock Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 10/33] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 11/33] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 12/33] tcg: cpus rm tcg_exec_all() Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 13/33] tcg: add options for enabling MTTCG Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 14/33] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2016-10-27 15:30   ` KONRAD Frederic
2016-10-27 15:35     ` Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 15/33] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 16/33] tcg: drop global lock during TCG code execution Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 17/33] cpus: re-factor out handle_icount_deadline Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 18/33] tcg: remove global exit_request Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 19/33] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 20/33] tcg: enable tb_lock() for SoftMMU Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 21/33] tcg: enable thread-per-vCPU Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 22/33] atomic: introduce cmpxchg_bool Alex Bennée
2016-10-27 15:10 ` [PATCH v5 23/33] *_run_on_cpu: introduce run_on_cpu_data type Alex Bennée
2016-10-27 15:10   ` [Qemu-devel] " Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 24/33] cputlb: add assert_cpu_is_self checks Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 25/33] cputlb: introduce tlb_flush_* async work Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 26/33] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 27/33] cputlb: atomically update tlb fields used by tlb_reset_dirty Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 28/33] cputlb: make tlb_flush_by_mmuidx safe for MTTCG Alex Bennée
2016-11-01  5:20   ` Pranith Kumar
2016-11-01  7:45     ` Alex Bennée
2016-11-01  8:03       ` Peter Maydell
2016-11-01 13:22       ` Pranith Kumar
2016-11-01 16:53         ` Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 29/33] target-arm/powerctl: defer cpu reset work to CPU context Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 30/33] target-arm/cpu: don't reset TLB structures, use cputlb to do it Alex Bennée
2016-10-27 16:10   ` Richard Henderson
2016-10-28  8:38     ` Alex Bennée
2016-10-28  9:07       ` Peter Maydell
2016-10-28  9:17         ` Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 31/33] target-arm: ensure BQL taken for ARM_CP_IO register access Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 32/33] target-arm: helpers which may affect global state need the BQL Alex Bennée
2016-10-27 15:10 ` [Qemu-devel] [PATCH v5 33/33] tcg: enable MTTCG by default for ARM on x86 hosts Alex Bennée
2016-10-31  8:03 ` [Qemu-devel] [PATCH v5 00/33] MTTCG Base Enabling patches with ARM on x86 defaults Alex Bennée
2016-10-31  8:48   ` Paolo Bonzini
