* [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement
@ 2017-01-19 17:04 Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 01/27] docs: new design document multi-thread-tcg.txt Alex Bennée
                   ` (26 more replies)
  0 siblings, 27 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée

Hi,

Here we go with another iteration of the MTTCG patches and I think it
is feature complete for at least ARMv7/v8 on x86 hosts.

One of the big changes was to address the concerns about TLB flush
semantics. We introduce a number of new tlb_flush_*_all helpers which
guests can call instead of iterating through all the vCPUs
themselves. Crucially these helpers take a flag which indicates if the
flush must complete with respect to the issuing vCPU. In this case
the run-loop is exited, all vCPUs halt and drain their work queues
before everything is restarted. The calling vCPU needs to ensure the
PC will be correct for the restart, which in ARM's case is done with
ARM_CP_EXIT_PC tags on the TLB flush helpers. I've added a new test
case (tlbflush-data) to my kvm-unit-tests which can demonstrate a race
condition if this is not the case.
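
As a rough sketch of the interface (the exact signatures live in the
cputlb patches later in the series; treat the details here as
illustrative):

    /* flush the TLBs of all vCPUs; if @wait is set the flush is
     * complete before the calling vCPU resumes execution */
    void tlb_flush_all_cpus(CPUState *src_cpu, bool wait);
    void tlb_flush_page_all_cpus(CPUState *src_cpu, target_ulong addr,
                                 bool wait);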

I did consider optimising the flushes by deferring their completion
until the architecturally defined barrier operations, but given that
the flush only really shows up in my most aggressive micro-benchmarks
it seemed a lot of complexity for little gain. We can always revisit
this later.

There has been some more cleanup to the cputlb code which deals with
the atomic updating of flags. One consequence of the clean-up is that
we explicitly disable MTTCG for 64-bit guests on 32-bit hosts. While
the most common host (x86) can support atomics wider than its natural
word length, it seemed a bit too fiddly to work around, so for now we
just disable MTTCG for this combination.

Another change is to the default handling for turning on MTTCG. The
TARGET (guest) needs to set TARGET_SUPPORTS_MTTCG once all the
requisite changes have been made to the model. As all the TCG_TARGETS
(host backends) support the appropriate barrier and atomic semantics,
we know we can enable MTTCG if the default memory model (i.e. the
implicit barriers in normal loads/stores) of the host is stronger than
the guest's. For now I've only declared the memory models for the ARM
front-end and x86 backend as that is what I've tested, but once we
have tested on other architectures the changes are fairly minor. In
the meantime you can still force MTTCG on at the command line.
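
Concretely, the check added in the options patch (patch 5) requires
the guest's default memory ordering to be a subset of what the host
backend guarantees:

    /* from check_tcg_memory_orders_compatible() in cpus.c */
    return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;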

Pranith sent a number of small fixes to debugging, cpu_exec_step and
EXCP_ATOMIC handling which I've folded into the series.

The rest of the changes are documented as usual below the --- in each
patch.

The series applies to origin/master as of today and you can find my
tree at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v7

As usual review comments, testing and questions are welcome. I'm
hoping we are in good shape to get this merged this development cycle.

Cheers,

Alex

Alex Bennée (21):
  docs: new design document multi-thread-tcg.txt
  tcg: move TCG_MO/BAR types into own file
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  tcg: remove global exit_request
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: add tlb_flush_by_mmuidx async routines
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: introduce tlb_flush_*_all_cpus
  target-arm/powerctl: defer cpu reset work to CPU context
  target-arm: ensure BQL taken for ARM_CP_IO register access
  target-arm: helpers which may affect global state need the BQL
  target-arm: don't generate WFE/YIELD calls for MTTCG
  target-arm/cpu.h: make ARM_CP defined consistent
  target-arm: introduce ARM_CP_EXIT_PC
  target-arm: ensure all cross vCPUs TLB flushes complete
  tcg: enable MTTCG by default for ARM on x86 hosts
  target-ppc: take global mutex for set_irq

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (2):
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Pranith Kumar (3):
  mttcg: translate-all: Enable locking debug in a debug build
  mttcg: Add missing tb_lock/unlock() in cpu_exec_step()
  tcg: handle EXCP_ATOMIC exception for system emulation

 configure                  |   6 +
 cpu-exec-common.c          |   3 -
 cpu-exec.c                 |  41 ++--
 cpus.c                     | 342 ++++++++++++++++++++++++-------
 cputlb.c                   | 487 ++++++++++++++++++++++++++++++++++++++-------
 docs/multi-thread-tcg.txt  | 350 ++++++++++++++++++++++++++++++++
 exec.c                     |  12 +-
 hw/core/irq.c              |   1 +
 hw/i386/kvmvapic.c         |   4 +-
 hw/intc/arm_gicv3_cpuif.c  |   3 +
 hw/ppc/ppc.c               |  16 +-
 hw/ppc/spapr.c             |   3 +
 include/exec/cputlb.h      |   2 -
 include/exec/exec-all.h    |  68 ++++++-
 include/qom/cpu.h          |  16 ++
 include/sysemu/cpus.h      |   2 +
 memory.c                   |   2 +
 qemu-options.hx            |  20 ++
 qom/cpu.c                  |  10 +
 target/arm/arm-powerctl.c  | 146 ++++++++------
 target/arm/cpu.h           |  32 +--
 target/arm/helper.c        | 200 +++++++++----------
 target/arm/op_helper.c     |  50 ++++-
 target/arm/translate-a64.c |  12 +-
 target/arm/translate.c     |  24 ++-
 target/i386/smm_helper.c   |   7 +
 target/s390x/misc_helper.c |   5 +-
 tcg/i386/tcg-target.h      |  16 ++
 tcg/tcg-mo.h               |  45 +++++
 tcg/tcg.h                  |  27 +--
 translate-all.c            |  66 ++----
 translate-common.c         |  21 +-
 vl.c                       |  49 ++++-
 33 files changed, 1645 insertions(+), 443 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt
 create mode 100644 tcg/tcg-mo.h

-- 
2.11.0


* [Qemu-devel] [PATCH v7 01/27] docs: new design document multi-thread-tcg.txt
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build Alex Bennée
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée

This documents the current design for upgrading TCG emulation to take
advantage of modern CPUs by running a thread-per-CPU. The document goes
through the various areas of the code affected by such a change and
proposes design requirements for each part of the solution.

The text marked with (Current solution[s]) documents the approaches
currently being used.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
v3
  - mention this covers system-mode
  - describe main main-loop and lookup hot-path
  - mention multi-concurrent-reader lookups
  - enumerate reasons for invalidation
  - add more details on lookup structures
  - describe the softmmu hot-path better
  - mention store-after-load barrier problem
v4
  - mention some cross-over between linux-user/system emulation
  - various minor grammar and scanning fixes
  - fix reference to tb_ctx.htbale
  - describe the solution for hot-path
  - more detail on TB flushing and invalidation
  - add (Current solution) following design requirements
  - more detail on iothread/BQL mutex
  - mention implicit memory barriers
  - add links to current LL/SC and cmpxchg patch sets
  - add TLB flag setting as an additional requirement
v6
 - remove DRAFTING, update copyright dates
 - document current solutions to each design requirement
 - tb_lock() serialisation for codegen/patch
 - cputlb changes to defer cross-vCPU flushes
 - cputlb atomic updates for slow-path
 - BQL usage for hardware serialisation
 - cmpxchg as initial atomic/synchronisation support mechanism
v7
 - minor format fix
 - include target-mips in list of MB aware front-ends
 - mention BQL around IRQ raising
 - update with notes on _all_cpus and the wait flag
---
 docs/multi-thread-tcg.txt | 350 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 350 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 0000000000..a99b4564c6
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,350 @@
+Copyright (c) 2015-2016 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or
+later. See the COPYING file in the top-level directory.
+
+Introduction
+============
+
+This document outlines the design for multi-threaded TCG system-mode
+emulation. The current user-mode emulation mirrors the thread
+structure of the translated executable. Some of the work will be
+applicable to both system and linux-user emulation.
+
+The original system-mode TCG implementation was single threaded and
+dealt with multiple CPUs with simple round-robin scheduling. This
+simplified a lot of things but became increasingly limited as systems
+being emulated gained additional cores and per-core performance gains
+for host systems started to level off.
+
+vCPU Scheduling
+===============
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all FE/BE
+combinations that have had the required work done to support this
+safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+======================
+
+Main Run Loop
+-------------
+
+Even when there is no code being generated there are a number of
+structures associated with the hot-path through the main run-loop.
+These are associated with looking up the next translation block to
+execute. These include:
+
+    tb_jmp_cache (per-vCPU, cache of recent jumps)
+    tb_ctx.htable (global hash table, phys address->tb lookup)
+
+As TB linking only occurs between blocks in the same page, this code
+is critical to performance: looking up the next TB to execute is the
+most common reason to exit the generated code.
+
+DESIGN REQUIREMENT: Make access to lookup structures safe with
+multiple reader/writer threads. Minimise any lock contention to do it.
+
+The hot-path avoids using locks where possible. The tb_jmp_cache is
+updated with atomic accesses to ensure consistent results. The
+fall-back QHT-based hash table is also designed for lockless lookups.
+Locks are only taken when code generation is required or
+TranslationBlocks have their block-to-block jumps patched.
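+
+As a sketch (illustrative rather than the exact code; see the lookup
+in cpu-exec.c), the lockless fast path looks something like:
+
+    /* no lock taken on the hot path */
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
+    if (likely(tb && tb->pc == pc && !tb->invalid)) {
+        return tb;                  /* cache hit */
+    }
+    /* miss: lockless QHT lookup, then codegen under tb_lock() */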
+
+Global TCG State
+----------------
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run loop will need to hold a
+mutex for code generation. This also includes times when we need to
+flush code or entries from any shared lookups/caches. Structures held
+on a per-vCPU basis won't need locking unless other vCPUs will need to
+modify them.
+
+DESIGN REQUIREMENT: Add locking around all code generation and TB
+patching.
+
+(Current solution)
+
+Mainly as part of the linux-user work, all code generation is
+serialised with a tb_lock(). For the SoftMMU case tb_lock() also takes
+the place of linux-user's mmap_lock().
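+
+In practice every code-generation call site follows the same pattern,
+for example (simplified from the cpu_exec_step() fix later in this
+series):
+
+    tb_lock();
+    tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+    tb_unlock();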
+
+Translation Blocks
+------------------
+
+Currently the whole system shares a single code generation buffer
+which, when full, will force a flush of all translations so we start
+from scratch again. Some operations also force a full flush of
+translations including:
+
+  - debugging operations (breakpoint insertion/removal)
+  - some CPU helper functions
+
+This is done with the async_safe_run_on_cpu() mechanism to ensure all
+vCPUs are quiescent when changes are being made to shared global
+structures.
+
+More granular translation invalidation events are typically due
+to a change in the state of a physical page:
+
+  - code modification (self-modifying code, patching code)
+  - page changes (new page mapping in linux-user mode)
+
+While setting the invalid flag in a TranslationBlock will stop it
+being used when looked up in the hot-path, there are a number of other
+book-keeping structures that need to be safely cleared.
+
+Any TranslationBlocks which have been patched to jump directly to a
+now invalid block need their jump patches reverted so they will return
+to the C code.
+
+There are a number of look-up caches that need to be properly updated
+including:
+
+  - the jump lookup cache
+  - the physical-to-tb lookup hash table
+  - the global page table
+
+The global page table (l1_map) provides a multi-level look-up for
+PageDesc structures, which contain pointers to the start of a linked
+list of all TranslationBlocks in that page (see page_next).
+
+Both the jump patching and the page cache involve linked lists that
+the invalidated TranslationBlock needs to be removed from.
+
+DESIGN REQUIREMENT: Safely handle invalidation of TBs
+                      - safely patch/revert direct jumps
+                      - remove central PageDesc lookup entries
+                      - ensure lookup caches/hashes are safely updated
+
+(Current solution)
+
+The direct jumps themselves are updated atomically by the TCG
+tb_set_jmp_target() code. Modifications to the linked lists that allow
+searching for linked pages are done under the protection of the
+tb_lock().
+
+The global page table is protected by the tb_lock() in system-mode and
+mmap_lock() in linux-user mode.
+
+The lookup caches are updated atomically and the lookup hash uses QHT
+which is designed for concurrent safe lookup.
+
+
+Memory maps and TLBs
+--------------------
+
+The memory handling code is fairly critical to the speed of memory
+access in the emulated system. The SoftMMU code is designed so the
+hot-path can be handled entirely within translated code. This is
+handled with a per-vCPU TLB structure which once populated will allow
+a series of accesses to the page to occur without exiting the
+translated code. It is possible to set flags in the TLB address which
+will ensure the slow-path is taken for each access. This can be done
+to support:
+
+  - Memory regions (dividing up access to PIO, MMIO and RAM)
+  - Dirty page tracking (for code gen, SMC detection, migration and display)
+  - Virtual TLB (for translating guest address->real address)
+
+When a vCPU's TLB tables are updated by a thread other than its own
+we need to ensure it is done in a safe way so that no inconsistent
+state is seen by the owning vCPU thread.
+
+Some operations require updating a number of vCPUs' TLBs at the same
+time in a synchronised manner.
+
+DESIGN REQUIREMENTS:
+
+  - TLB Flush All/Page
+    - can be across-vCPUs
+    - a cross-vCPU TLB flush may need other vCPUs brought to a halt
+    - change may need to be visible to the calling vCPU immediately
+  - TLB Flag Update
+    - usually cross-vCPU
+    - want change to be visible as soon as possible
+  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
+    - This is a per-vCPU table - by definition can't race
+    - updated by its own thread when the slow-path is forced
+
+(Current solution)
+
+We have updated cputlb.c to defer cross-vCPU operations with
+async_run_on_cpu(), which ensures each vCPU sees a coherent state when
+it next runs its work (a few instructions later).
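+
+The deferral pattern is roughly as follows (an illustrative sketch;
+the work function name here is made up):
+
+    if (!qemu_cpu_is_self(cpu)) {
+        /* defer the flush to run on cpu's own thread */
+        async_run_on_cpu(cpu, tlb_flush_async_work, RUN_ON_CPU_NULL);
+    } else {
+        tlb_flush(cpu);   /* our own TLB: safe to update directly */
+    }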
+
+A new set of operations (tlb_flush_*_all_cpus) takes an additional
+flag which, when set, forces synchronisation by queuing the source
+vCPU's work as "safe work" and exiting the cpu run loop. This ensures
+that by the time execution restarts all flush operations have
+completed.
+
+TLB flag updates are all done atomically and are also protected by the
+tb_lock() which is used by the functions that update the TLB in bulk.
+
+(Known limitation)
+
+Not really a limitation, but the wait mechanism is overly strict for
+some architectures which only need flushes completed by a barrier
+instruction. This could be a future optimisation.
+
+Emulated hardware state
+-----------------------
+
+Currently, thanks to KVM work, any access to IO memory is
+automatically protected by the global iothread mutex, also known as
+the BQL (Big Qemu Lock). Any IO region that doesn't use the global
+mutex is expected to do its own locking.
+
+However IO memory isn't the only way emulated hardware state can be
+modified. Some architectures have model specific registers that
+trigger hardware emulation features. Generally any translation helper
+that needs to update more than a single vCPU's state should take the
+BQL.
+
+As the BQL, or global iothread mutex, is shared across the system we
+push the use of the lock as far down into the TCG code as possible to
+minimise contention.
+
+(Current solution)
+
+MMIO access automatically serialises hardware emulation by way of the
+BQL. Currently ARM targets serialise all ARM_CP_IO register accesses
+and also defer the reset/startup of vCPUs to the vCPU context by way
+of async_run_on_cpu().
+
+Updates to interrupt state are also protected by the BQL as they can
+often be cross vCPU.
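+
+A helper that touches state beyond its own vCPU therefore follows a
+pattern like this (a minimal sketch, not a specific helper from the
+series):
+
+    void HELPER(set_global_state)(CPUARMState *env)
+    {
+        qemu_mutex_lock_iothread();
+        /* ... update state visible to other vCPUs or devices ... */
+        qemu_mutex_unlock_iothread();
+    }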
+
+Memory Consistency
+==================
+
+Between emulated guests and host systems there is a range of memory
+consistency models. Even emulating weakly ordered systems on strongly
+ordered hosts needs to ensure things like store-after-load re-ordering
+can be prevented when the guest requires it.
+
+Memory Barriers
+---------------
+
+Barriers (sometimes known as fences) provide a mechanism for software
+to enforce a particular ordering of memory operations from the point
+of view of external observers (e.g. another processor core). They can
+apply to all memory operations or to just loads or just stores.
+
+The Linux kernel has an excellent write-up on the various forms of
+memory barrier and the guarantees they can provide [1].
+
+Barriers are often wrapped around synchronisation primitives to
+provide explicit memory ordering semantics. However they can be used
+by themselves to provide safe lockless access by ensuring, for
+example, that a change to a signal flag will only be visible once the
+changes to the payload are.
+
+DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
+
+This would enforce a strong load/store ordering so all loads/stores
+complete at the memory barrier. On single-core non-SMP strongly
+ordered backends this could become a NOP.
+
+Aside from explicit standalone memory barrier instructions there are
+also implicit memory ordering semantics which come with each guest
+memory access instruction. For example all x86 loads/stores come with
+fairly strong guarantees of sequential consistency whereas ARM has
+special variants of load/store instructions that imply acquire/release
+semantics.
+
+In the case of a strongly ordered guest architecture being emulated on
+a weakly ordered host the scope for a heavy performance impact is
+quite high.
+
+DESIGN REQUIREMENTS: Be efficient with use of memory barriers
+       - host systems with stronger implied guarantees can skip some barriers
+       - merge consecutive barriers to the strongest one
+
+(Current solution)
+
+The system currently has a tcg_gen_mb() which will add memory barrier
+operations if code generation is being done in a parallel context. The
+tcg_optimize() function attempts to merge consecutive barriers up to
+their strongest form before any load/store operations. The solution
+was originally developed and tested for linux-user based systems. All
+backends have been converted to emit fences when required. So far the
+following front-ends have been updated to emit them:
+
+    - target-i386
+    - target-arm
+    - target-aarch64
+    - target-alpha
+    - target-mips
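+
+For example a guest's full barrier instruction is translated by the
+front-end roughly as follows, using the types from tcg/tcg-mo.h (the
+exact call sites live in each target's translate.c):
+
+    /* full system barrier: no loads or stores may cross it */
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);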
+
+Memory Control and Maintenance
+------------------------------
+
+This includes a class of instructions for controlling system cache
+behaviour. While QEMU doesn't model cache behaviour, these
+instructions are often seen when code modification has taken place, to
+ensure the changes take effect.
+
+Synchronisation Primitives
+--------------------------
+
+There are two broad types of synchronisation primitives found in
+modern ISAs: atomic instructions and exclusive regions.
+
+The first type offers a simple atomic instruction which will
+guarantee some sort of test and conditional store will be truly atomic
+w.r.t. other cores sharing access to the memory. The classic example
+is the x86 cmpxchg instruction.
+
+The second type offers a pair of load/store instructions which
+guarantee that a region of memory has not been touched between the
+load and store instructions. An example of this is ARM's ldrex/strex
+pair where the strex instruction will return a flag indicating a
+successful store only if no other CPU has accessed the memory region
+since the ldrex.
+
+Traditionally TCG has generated a series of operations that work
+because they are within the context of a single translation block, so
+they will have completed before another CPU is scheduled. However with
+the ability to have multiple threads running to emulate multiple CPUs
+we will need to explicitly expose these semantics.
+
+DESIGN REQUIREMENTS:
+  - Support classic atomic instructions
+  - Support load/store exclusive (or load link/store conditional) pairs
+  - Generic enough infrastructure to support all guest architectures
+CURRENT OPEN QUESTIONS:
+  - How problematic is the ABA problem in general?
+
+(Current solution)
+
+The TCG provides a number of atomic helpers (tcg_gen_atomic_*) which
+can be used directly or combined to emulate other instructions like
+ARM's ldrex/strex instructions. While they are susceptible to the ABA
+problem, so far common guests have not implemented patterns where
+this may be a problem - typically they present a locking ABI which
+assumes cmpxchg-like semantics.
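+
+As a sketch of how a guest compare-and-swap maps onto the helpers
+(simplified; see the atomic helper declarations for the full
+signatures):
+
+    /* atomically: if *addr == cmpv then *addr = newv; oldv gets the
+     * value found at addr either way */
+    tcg_gen_atomic_cmpxchg_i32(oldv, addr, cmpv, newv, mem_idx, MO_32);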
+
+The code also includes a fall-back for cases where multi-threaded TCG
+ops can't work (e.g. guest atomic width > host atomic width). In this
+case an EXCP_ATOMIC exit occurs and the instruction is emulated with
+an exclusive lock which ensures all emulation is serialised.
+
+While the atomic helpers look good enough for now, there may be a
+need to look at solutions that can more closely model the guest
+architecture's semantics.
+
+==========
+
+[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
-- 
2.11.0


* [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 01/27] docs: new design document multi-thread-tcg.txt Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 18:57   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step() Alex Bennée
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

From: Pranith Kumar <bobby.prani@gmail.com>

Enable tcg lock debug asserts in a debug build by default instead of
relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
asserts, so this patch removes DEBUG_LOCKING and enables these asserts
in a debug build.

CC: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[AJB: tweak ifdefs so can be early in series]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 translate-all.c | 52 ++++++++++++++++------------------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 20262938bb..055436a676 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -59,7 +59,6 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
-/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -74,20 +73,10 @@
  * access to the memory related structures are protected with the
  * mmap_lock.
  */
-#ifdef DEBUG_LOCKING
-#define DEBUG_MEM_LOCKS 1
-#else
-#define DEBUG_MEM_LOCKS 0
-#endif
-
 #ifdef CONFIG_SOFTMMU
 #define assert_memory_lock() do { /* nothing */ } while (0)
 #else
-#define assert_memory_lock() do {               \
-        if (DEBUG_MEM_LOCKS) {                  \
-            g_assert(have_mmap_lock());         \
-        }                                       \
-    } while (0)
+#define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
 #define SMC_BITMAP_USE_THRESHOLD 10
@@ -169,10 +158,18 @@ static void page_table_config_init(void)
     assert(v_l2_levels >= 0);
 }
 
+#ifdef CONFIG_USER_ONLY
+#define assert_tb_locked() tcg_debug_assert(have_tb_lock)
+#define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
+#else
+#define assert_tb_locked()  do { /* nothing */ } while (0)
+#define assert_tb_unlocked()  do { /* nothing */ } while (0)
+#endif
+
 void tb_lock(void)
 {
 #ifdef CONFIG_USER_ONLY
-    assert(!have_tb_lock);
+    assert_tb_unlocked();
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
 #endif
@@ -181,7 +178,7 @@ void tb_lock(void)
 void tb_unlock(void)
 {
 #ifdef CONFIG_USER_ONLY
-    assert(have_tb_lock);
+    assert_tb_locked();
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 #endif
@@ -197,23 +194,6 @@ void tb_lock_reset(void)
 #endif
 }
 
-#ifdef DEBUG_LOCKING
-#define DEBUG_TB_LOCKS 1
-#else
-#define DEBUG_TB_LOCKS 0
-#endif
-
-#ifdef CONFIG_SOFTMMU
-#define assert_tb_lock() do { /* nothing */ } while (0)
-#else
-#define assert_tb_lock() do {               \
-        if (DEBUG_TB_LOCKS) {               \
-            g_assert(have_tb_lock);         \
-        }                                   \
-    } while (0)
-#endif
-
-
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -847,7 +827,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
 
-    assert_tb_lock();
+    assert_tb_locked();
 
     if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
         return NULL;
@@ -862,7 +842,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
-    assert_tb_lock();
+    assert_tb_locked();
 
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
@@ -1104,7 +1084,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     uint32_t h;
     tb_page_addr_t phys_pc;
 
-    assert_tb_lock();
+    assert_tb_locked();
 
     atomic_set(&tb->invalid, true);
 
@@ -1419,7 +1399,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
 #ifdef CONFIG_SOFTMMU
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
-    assert_tb_lock();
+    assert_tb_locked();
     tb_invalidate_phys_range_1(start, end);
 }
 #else
@@ -1462,7 +1442,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
     assert_memory_lock();
-    assert_tb_lock();
+    assert_tb_locked();
 
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
-- 
2.11.0


* [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step()
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 01/27] docs: new design document multi-thread-tcg.txt Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 18:57   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file Alex Bennée
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

From: Pranith Kumar <bobby.prani@gmail.com>

The recent patch enabling lock assertions uncovered missing lock
acquisitions in cpu_exec_step(). This patch adds them.

CC: Richard Henderson <rth@twiddle.net>
CC: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 cpu-exec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b8685dc21 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -233,14 +233,18 @@ static void cpu_exec_step(CPUState *cpu)
     uint32_t flags;
 
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    tb_lock();
     tb = tb_gen_code(cpu, pc, cs_base, flags,
                      1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
     tb->orig_tb = NULL;
+    tb_unlock();
     /* execute the generated code */
     trace_exec_tb_nocache(tb, pc);
     cpu_tb_exec(cpu, tb);
+    tb_lock();
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
+    tb_unlock();
 }
 
 void cpu_exec_step_atomic(CPUState *cpu)
-- 
2.11.0


* [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (2 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step() Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 18:59   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG Alex Bennée
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée

We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies, just move these types into their own minimal header.
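
This will let each backend declare the ordering its normal loads and
stores already provide in its tcg-target.h, as is done for the x86
backend later in this series:

    /* x86: all accesses ordered except store-after-load */
    #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)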

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tcg/tcg-mo.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h    | 18 +-----------------
 2 files changed, 46 insertions(+), 17 deletions(-)
 create mode 100644 tcg/tcg-mo.h

diff --git a/tcg/tcg-mo.h b/tcg/tcg-mo.h
new file mode 100644
index 0000000000..429b022561
--- /dev/null
+++ b/tcg/tcg-mo.h
@@ -0,0 +1,45 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TCG_MO_H
+#define TCG_MO_H
+
+typedef enum {
+    /* Used to indicate the type of accesses on which ordering
+       is to be ensured.  Modeled after SPARC barriers.  */
+    TCG_MO_LD_LD  = 0x01,
+    TCG_MO_ST_LD  = 0x02,
+    TCG_MO_LD_ST  = 0x04,
+    TCG_MO_ST_ST  = 0x08,
+    TCG_MO_ALL    = 0x0F,  /* OR of the above */
+
+    /* Used to indicate the kind of ordering which is to be ensured by the
+       instruction.  These types are derived from x86/aarch64 instructions.
+       It should be noted that these are different from C11 semantics.  */
+    TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
+    TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
+    TCG_BAR_SC    = 0x30,  /* No ops cross barrier; OR of the above */
+} TCGBar;
+
+#endif /* TCG_MO_H */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 631c6f69b1..f946452049 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -29,6 +29,7 @@
 #include "cpu.h"
 #include "exec/tb-context.h"
 #include "qemu/bitops.h"
+#include "tcg-mo.h"
 #include "tcg-target.h"
 
 /* XXX: make safe guess about sizes */
@@ -498,23 +499,6 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
-typedef enum {
-    /* Used to indicate the type of accesses on which ordering
-       is to be ensured.  Modeled after SPARC barriers.  */
-    TCG_MO_LD_LD  = 0x01,
-    TCG_MO_ST_LD  = 0x02,
-    TCG_MO_LD_ST  = 0x04,
-    TCG_MO_ST_ST  = 0x08,
-    TCG_MO_ALL    = 0x0F,  /* OR of the above */
-
-    /* Used to indicate the kind of ordering which is to be ensured by the
-       instruction.  These types are derived from x86/aarch64 instructions.
-       It should be noted that these are different from C11 semantics.  */
-    TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
-    TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
-    TCG_BAR_SC    = 0x30,  /* No ops cross barrier; OR of the above */
-} TCGBar;
-
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
      bit 0 is used for inverting;
-- 
2.11.0


* [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (3 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-20  1:28   ` Pranith Kumar
  2017-01-23 19:06   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 06/27] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
                   ` (21 subsequent siblings)
  26 siblings, 2 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

From: KONRAD Frederic <fred.konrad@greensocs.com>

We know there will be cases where MTTCG won't work until additional
work is done in the front/back ends to support it. It will however be
useful to be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.
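
For example, using the option syntax this patch adds, multi-threaded
TCG can be forced on for testing with:

    $ qemu-system-arm -accel tcg,thread=multi ...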

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
v3:
  - moved from -tcg to -accel tcg,thread=single|multi
  - fix checkpatch warnings
v4:
  - make mttcg_enabled extern, qemu_tcg_mttcg_enabled() now just macro
  - qemu_tcg_configure now propagates Error instead of exiting
  - better error checking of thread=foo
  - use CONFIG flags for default_mttcg_enabled()
  - disable mttcg with icount, error if both forced on
v7
  - explicitly disable MTTCG for TCG_OVERSIZED_GUEST
  - use check_tcg_memory_orders_compatible() instead of CONFIG_MTTCG_HOST
  - change CONFIG_MTTCG_TARGET to TARGET_SUPPORTS_MTTCG
---
 cpus.c                | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h     |  9 +++++++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx       | 20 +++++++++++++++
 tcg/tcg.h             |  9 +++++++
 vl.c                  | 49 +++++++++++++++++++++++++++++++++++-
 6 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 5213351c6d..eaefaebffc 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -148,6 +149,75 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+bool mttcg_enabled;
+
+/*
+ * We default to false if we know other options have been enabled
+ * which are currently incompatible with MTTCG. Otherwise when each
+ * guest (target) has been updated to support:
+ *   - atomic instructions
+ *   - memory ordering primitives (barriers)
+ * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
+ *
+ * Once a guest architecture has been converted to the new primitives
+ * there are two remaining limitations to check.
+ *
+ * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host)
+ * - The host must have a stronger memory order than the guest
+ *
+ * It may be possible in future to support strong guests on weak hosts
+ * but that will require tagging all load/stores in a guest with their
+ * implicit memory order requirements which would likely slow things
+ * down a lot.
+ */
+
+static bool check_tcg_memory_orders_compatible(void)
+{
+#if defined(TCG_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO)
+    return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;
+#else
+    return false;
+#endif
+}
+
+static bool default_mttcg_enabled(void)
+{
+    QemuOpts *icount_opts = qemu_find_opts_singleton("icount");
+    const char *rr = qemu_opt_get(icount_opts, "rr");
+
+    if (rr || TCG_OVERSIZED_GUEST) {
+        return false;
+    } else {
+#ifdef TARGET_SUPPORTS_MTTCG
+        return check_tcg_memory_orders_compatible();
+#else
+        return false;
+#endif
+    }
+}
+
+void qemu_tcg_configure(QemuOpts *opts, Error **errp)
+{
+    const char *t = qemu_opt_get(opts, "thread");
+    if (t) {
+        if (strcmp(t, "multi") == 0) {
+            if (TCG_OVERSIZED_GUEST) {
+                error_setg(errp, "No MTTCG when guest word size > hosts");
+            } else if (!check_tcg_memory_orders_compatible()) {
+                error_setg(errp,
+                           "No MTTCG when guest MO is stronger than host MO");
+            } else {
+                mttcg_enabled = true;
+            }
+        } else if (strcmp(t, "single") == 0) {
+            mttcg_enabled = false;
+        } else {
+            error_setg(errp, "Invalid 'thread' setting %s", t);
+        }
+    } else {
+        mttcg_enabled = default_mttcg_enabled();
+    }
+}
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..541785aeda 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -407,6 +407,15 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+extern bool mttcg_enabled;
+#define qemu_tcg_mttcg_enabled() (mttcg_enabled)
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3728a1ea7e..a73b5d4bce 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -36,4 +36,6 @@ extern int smp_threads;
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
+void qemu_tcg_configure(QemuOpts *opts, Error **errp);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index c534a2f7f9..e61c26a9d2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -96,6 +96,26 @@ STEXI
 Select CPU model (@code{-cpu help} for list and additional feature selection)
 ETEXI
 
+DEF("accel", HAS_ARG, QEMU_OPTION_accel,
+    "-accel [accel=]accelerator[,thread=single|multi]\n"
+    "               select accelerator ('-accel help for list')\n"
+    "               thread=single|multi (enable multi-threaded TCG)", QEMU_ARCH_ALL)
+STEXI
+@item -accel @var{name}[,prop=@var{value}[,...]]
+@findex -accel
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@table @option
+@item thread=single|multi
+Controls the number of TCG threads. When the TCG is multi-threaded there will be
+one thread per vCPU, therefore taking advantage of additional host cores. The
+default is to enable multi-threading where both the back-end and front-ends
+support it and no incompatible TCG features have been enabled (e.g. icount/replay).
+@end table
+ETEXI
+
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
diff --git a/tcg/tcg.h b/tcg/tcg.h
index f946452049..4c7f258220 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -80,6 +80,15 @@ typedef uint64_t tcg_target_ulong;
 #error unsupported
 #endif
 
+/* Oversized TCG guests make things like MTTCG hard
+ * as we can't use atomics for cputlb updates.
+ */
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+#define TCG_OVERSIZED_GUEST 1
+#else
+#define TCG_OVERSIZED_GUEST 0
+#endif
+
 #if TCG_TARGET_NB_REGS <= 32
 typedef uint32_t TCGRegSet;
 #elif TCG_TARGET_NB_REGS <= 64
diff --git a/vl.c b/vl.c
index c643d3ff3a..d6a9e119a7 100644
--- a/vl.c
+++ b/vl.c
@@ -296,6 +296,26 @@ static QemuOptsList qemu_machine_opts = {
     },
 };
 
+static QemuOptsList qemu_accel_opts = {
+    .name = "accel",
+    .implied_opt_name = "accel",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_accel_opts.head),
+    .merge_lists = true,
+    .desc = {
+        {
+            .name = "accel",
+            .type = QEMU_OPT_STRING,
+            .help = "Select the type of accelerator",
+        },
+        {
+            .name = "thread",
+            .type = QEMU_OPT_STRING,
+            .help = "Enable/disable multi-threaded TCG",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList qemu_boot_opts = {
     .name = "boot-opts",
     .implied_opt_name = "order",
@@ -3000,7 +3020,8 @@ int main(int argc, char **argv, char **envp)
     const char *boot_once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
-    QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
+    QemuOpts *opts, *machine_opts;
+    QemuOpts *hda_opts = NULL, *icount_opts = NULL, *accel_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3055,6 +3076,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_trace_opts);
     qemu_add_opts(&qemu_option_rom_opts);
     qemu_add_opts(&qemu_machine_opts);
+    qemu_add_opts(&qemu_accel_opts);
     qemu_add_opts(&qemu_mem_opts);
     qemu_add_opts(&qemu_smp_opts);
     qemu_add_opts(&qemu_boot_opts);
@@ -3748,6 +3770,26 @@ int main(int argc, char **argv, char **envp)
                 qdev_prop_register_global(&kvm_pit_lost_tick_policy);
                 break;
             }
+            case QEMU_OPTION_accel:
+                accel_opts = qemu_opts_parse_noisily(qemu_find_opts("accel"),
+                                                     optarg, true);
+                optarg = qemu_opt_get(accel_opts, "accel");
+
+                olist = qemu_find_opts("machine");
+                if (strcmp("kvm", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=kvm", false);
+                } else if (strcmp("xen", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=xen", false);
+                } else if (strcmp("tcg", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=tcg", false);
+                } else {
+                    if (!is_help_option(optarg)) {
+                        error_printf("Unknown accelerator: %s", optarg);
+                    }
+                    error_printf("Supported accelerators: kvm, xen, tcg\n");
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_usb:
                 olist = qemu_find_opts("machine");
                 qemu_opts_parse_noisily(olist, "usb=on", false);
@@ -4053,6 +4095,8 @@ int main(int argc, char **argv, char **envp)
 
     replay_configure(icount_opts);
 
+    qemu_tcg_configure(accel_opts, &error_fatal);
+
     machine_class = select_machine();
 
     set_memory_options(&ram_slots, &maxram_size, machine_class);
@@ -4417,6 +4461,9 @@ int main(int argc, char **argv, char **envp)
         if (kvm_enabled() || xen_enabled()) {
             error_report("-icount is not allowed with kvm or xen");
             exit(1);
+        } else if (qemu_tcg_mttcg_enabled()) {
+            error_report("-icount does not currently work with MTTCG");
+            exit(1);
         }
         configure_icount(icount_opts, &error_abort);
         qemu_opts_del(icount_opts);
-- 
2.11.0


* [Qemu-devel] [PATCH v7 06/27] tcg: add kick timer for single-threaded vCPU emulation
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (4 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 07/27] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long-running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored
and rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
v4
  - wrap next calc in inline qemu_tcg_next_kick() instead of macro
v5
  - move all kick code into own section
  - use global for timer
  - add helper functions to start/stop timer
  - stop timer when all cores paused
v7
  - checkpatch > 80 char fix
---
 cpus.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/cpus.c b/cpus.c
index eaefaebffc..8f182dff72 100644
--- a/cpus.c
+++ b/cpus.c
@@ -763,6 +763,53 @@ void configure_icount(QemuOpts *opts, Error **errp)
 }
 
 /***********************************************************/
+/* TCG vCPU kick timer
+ *
+ * The kick timer is responsible for moving single threaded vCPU
+ * emulation on to the next vCPU. If more than one vCPU is running a
+ * timer event will force a cpu->exit so the next vCPU can get
+ * scheduled.
+ *
+ * The timer is removed if all vCPUs are idle and restarted again once
+ * idleness is complete.
+ */
+
+static QEMUTimer *tcg_kick_vcpu_timer;
+
+static void qemu_cpu_kick_no_halt(void);
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+    timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    qemu_cpu_kick_no_halt();
+}
+
+static void start_tcg_kick_timer(void)
+{
+    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+        tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                           kick_tcg_thread, NULL);
+        timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+    }
+}
+
+static void stop_tcg_kick_timer(void)
+{
+    if (tcg_kick_vcpu_timer) {
+        timer_del(tcg_kick_vcpu_timer);
+        tcg_kick_vcpu_timer = NULL;
+    }
+}
+
+
+/***********************************************************/
 void hw_error(const char *fmt, ...)
 {
     va_list ap;
@@ -1016,9 +1063,12 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
     while (all_cpu_threads_idle()) {
+        stop_tcg_kick_timer();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
+    start_tcg_kick_timer();
+
     while (iothread_requesting_mutex) {
         qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
     }
@@ -1218,6 +1268,15 @@ static void deal_with_unplugged_cpus(void)
     }
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
@@ -1244,6 +1303,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
     }
 
+    start_tcg_kick_timer();
+
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
-- 
2.11.0


* [Qemu-devel] [PATCH v7 07/27] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (5 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 06/27] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 08/27] tcg: drop global lock during TCG code execution Alex Bennée
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

...and make the definition local to cpus.c. In preparation for MTTCG
the concept of a global tcg_current_cpu will no longer make sense.
However we still need to keep track of it in the single-threaded case
to be able to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick() now kicks the relevant cpu as
well as calling qemu_cpu_kick_rr_cpu(), which will become a no-op in
MTTCG.

For the time being the setting of the global exit_request remains.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v4:
  - keep global exit_request setting for now
  - fix merge conflicts
v5:
  - merge conflicts with kick changes
---
 cpu-exec-common.c       |  1 -
 cpu-exec.c              |  3 ---
 cpus.c                  | 41 ++++++++++++++++++++++-------------------
 include/exec/exec-all.h |  1 -
 4 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 767d9c6f0c..e2bc053372 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -24,7 +24,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
diff --git a/cpu-exec.c b/cpu-exec.c
index 1b8685dc21..f9e836c8dd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -609,7 +609,6 @@ int cpu_exec(CPUState *cpu)
         return EXCP_HALTED;
     }
 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
     if (unlikely(atomic_mb_read(&exit_request))) {
@@ -668,7 +667,5 @@ int cpu_exec(CPUState *cpu)
     /* fail safe : never use current_cpu outside cpu_exec() */
     current_cpu = NULL;
 
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 8f182dff72..1c76f36d89 100644
--- a/cpus.c
+++ b/cpus.c
@@ -775,8 +775,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
  */
 
 static QEMUTimer *tcg_kick_vcpu_timer;
-
-static void qemu_cpu_kick_no_halt(void);
+static CPUState *tcg_current_rr_cpu;
 
 #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
 
@@ -785,10 +784,23 @@ static inline int64_t qemu_tcg_next_kick(void)
     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }
 
+/* Kick the currently round-robin scheduled vCPU */
+static void qemu_cpu_kick_rr_cpu(void)
+{
+    CPUState *cpu;
+    atomic_mb_set(&exit_request, 1);
+    do {
+        cpu = atomic_mb_read(&tcg_current_rr_cpu);
+        if (cpu) {
+            cpu_exit(cpu);
+        }
+    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
 static void kick_tcg_thread(void *opaque)
 {
     timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    qemu_cpu_kick_no_halt();
+    qemu_cpu_kick_rr_cpu();
 }
 
 static void start_tcg_kick_timer(void)
@@ -808,7 +820,6 @@ static void stop_tcg_kick_timer(void)
     }
 }
 
-
 /***********************************************************/
 void hw_error(const char *fmt, ...)
 {
@@ -1319,6 +1330,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
 
         for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+            atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1338,6 +1350,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             }
 
         } /* for cpu.. */
+        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+        atomic_set(&tcg_current_rr_cpu, NULL);
 
         /* Pairs with smp_wmb in qemu_cpu_kick.  */
         atomic_mb_set(&exit_request, 0);
@@ -1370,24 +1384,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-    CPUState *cpu;
-    /* Ensure whatever caused the exit has reached the CPU threads before
-     * writing exit_request.
-     */
-    atomic_mb_set(&exit_request, 1);
-    cpu = atomic_mb_read(&tcg_current_cpu);
-    if (cpu) {
-        cpu_exit(cpu);
-    }
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
-        qemu_cpu_kick_no_halt();
+        cpu_exit(cpu);
+        /* Also ensure current RR cpu is kicked */
+        qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1428,7 +1431,7 @@ void qemu_mutex_lock_iothread(void)
         atomic_dec(&iothread_requesting_mutex);
     } else {
         if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_no_halt();
+            qemu_cpu_kick_rr_cpu();
             qemu_mutex_lock(&qemu_global_mutex);
         }
         atomic_dec(&iothread_requesting_mutex);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bbc9478a50..3cbd359dd7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -404,7 +404,6 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 extern int singlestep;
 
 /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern CPUState *tcg_current_cpu;
 extern bool exit_request;
 
 #endif
-- 
2.11.0

* [Qemu-devel] [PATCH v7 08/27] tcg: drop global lock during TCG code execution
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (6 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 07/27] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 09/27] tcg: remove global exit_request Alex Bennée
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite, Eduardo Habkost,
	Michael S. Tsirkin, David Gibson, Alexander Graf,
	open list:sPAPR

From: Jan Kiszka <jan.kiszka@siemens.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
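
In sketch form the change to the execution loop looks like this (see
the tcg_cpu_exec() hunk below):

  qemu_mutex_unlock_iothread();  /* drop the BQL for pure guest code */
  cpu_exec_start(cpu);
  ret = cpu_exec(cpu);           /* re-takes the BQL only for MMIO/PIO */
  cpu_exec_end(cpu);
  qemu_mutex_lock_iothread();    /* back under the BQL for bookkeeping */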

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, rm old info from commit msg, review updates]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v5 (ajb, base patches):
 - added an assert to BQL unlock/lock functions instead of hanging
 - ensure all cpu->interrupt_requests *modifications* protected by BQL
 - add a re-read on cpu->interrupt_request for correctness
 - BQL fixes for:
   - assert BQL held for PPC hypercalls (emulate_spar_hypercall)
   - SCLP service calls on s390x
 - merge conflict with kick timer patch
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
    - add assert on mem region modification
    - ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c                 | 20 ++++++++++++++++++--
 cpus.c                     | 28 +++++-----------------------
 cputlb.c                   | 21 ++++++++++++++++++++-
 exec.c                     | 12 +++++++++---
 hw/core/irq.c              |  1 +
 hw/i386/kvmvapic.c         |  4 ++--
 hw/ppc/spapr.c             |  3 +++
 include/qom/cpu.h          |  1 +
 memory.c                   |  2 ++
 qom/cpu.c                  | 10 ++++++++++
 target/i386/smm_helper.c   |  7 +++++++
 target/s390x/misc_helper.c |  5 ++++-
 translate-all.c            |  9 +++++++--
 translate-common.c         | 21 +++++++++++----------
 14 files changed, 100 insertions(+), 44 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f9e836c8dd..f42a128bdf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
         if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
             && replay_interrupt()) {
             X86CPU *x86_cpu = X86_CPU(cpu);
+            qemu_mutex_lock_iothread();
             apic_poll_irq(x86_cpu->apic_state);
             cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+            qemu_mutex_unlock_iothread();
         }
 #endif
         if (!cpu_has_work(cpu)) {
@@ -443,7 +446,9 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #else
             if (replay_exception()) {
                 CPUClass *cc = CPU_GET_CLASS(cpu);
+                qemu_mutex_lock_iothread();
                 cc->do_interrupt(cpu);
+                qemu_mutex_unlock_iothread();
                 cpu->exception_index = -1;
             } else if (!replay_has_interrupt()) {
                 /* give a chance to iothread in replay mode */
@@ -469,9 +474,11 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
                                         TranslationBlock **last_tb)
 {
     CPUClass *cc = CPU_GET_CLASS(cpu);
-    int interrupt_request = cpu->interrupt_request;
 
-    if (unlikely(interrupt_request)) {
+    if (unlikely(atomic_read(&cpu->interrupt_request))) {
+        int interrupt_request;
+        qemu_mutex_lock_iothread();
+        interrupt_request = cpu->interrupt_request;
         if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
             /* Mask out external interrupts for this step. */
             interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -526,7 +533,12 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
                the program flow was changed */
             *last_tb = NULL;
         }
+
+        /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
+        qemu_mutex_unlock_iothread();
     }
+
+
     if (unlikely(atomic_read(&cpu->exit_request) || replay_has_interrupt())) {
         atomic_set(&cpu->exit_request, 0);
         cpu->exception_index = EXCP_INTERRUPT;
@@ -656,8 +668,12 @@ int cpu_exec(CPUState *cpu)
             g_assert(cpu == current_cpu);
             g_assert(cc == CPU_GET_CLASS(cpu));
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
+            if (qemu_mutex_iothread_locked()) {
+                qemu_mutex_unlock_iothread();
+            }
         }
     } /* for(;;) */
 
diff --git a/cpus.c b/cpus.c
index 1c76f36d89..b78c093550 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1022,8 +1022,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -1037,7 +1035,6 @@ void qemu_init_cpu_loop(void)
     qemu_init_sigbus();
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -1080,10 +1077,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
 
     start_tcg_kick_timer();
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1244,9 +1237,11 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
     cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
@@ -1421,27 +1416,14 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_rr_cpu();
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    g_assert(!qemu_mutex_iothread_locked());
+    qemu_mutex_lock(&qemu_global_mutex);
     iothread_locked = true;
 }
 
 void qemu_mutex_unlock_iothread(void)
 {
+    g_assert(qemu_mutex_iothread_locked());
     iothread_locked = false;
     qemu_mutex_unlock(&qemu_global_mutex);
 }
diff --git a/cputlb.c b/cputlb.c
index 6c39927455..1cc9d9da51 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/memory.h"
@@ -495,6 +496,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
     uint64_t val;
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
@@ -503,7 +505,16 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     }
 
     cpu->mem_io_vaddr = addr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_read(mr, physaddr, &val, size, iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
+
     return val;
 }
 
@@ -514,15 +525,23 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
-
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_write(mr, physaddr, val, size, iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 /* Return true if ADDR is present in the victim tlb, and has been copied
diff --git a/exec.c b/exec.c
index 401a9127c2..5c6dd2d07f 100644
--- a/exec.c
+++ b/exec.c
@@ -2128,9 +2128,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                 }
                 cpu->watchpoint_hit = wp;
 
-                /* The tb_lock will be reset when cpu_loop_exit or
-                 * cpu_loop_exit_noexc longjmp back into the cpu_exec
-                 * main loop.
+                /* Both tb_lock and iothread_mutex will be reset when
+                 * cpu_loop_exit or cpu_loop_exit_noexc longjmp
+                 * back into the cpu_exec main loop.
                  */
                 tb_lock();
                 tb_check_watchpoint(cpu);
@@ -2365,8 +2365,14 @@ static void io_mem_init(void)
     memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
+
+    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
+     * which can be called without the iothread mutex.
+     */
     memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
                           NULL, UINT64_MAX);
+    memory_region_clear_global_locking(&io_mem_notdirty);
+
     memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
                           NULL, UINT64_MAX);
 }
diff --git a/hw/core/irq.c b/hw/core/irq.c
index 49ff2e64fe..b98d1d69f5 100644
--- a/hw/core/irq.c
+++ b/hw/core/irq.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "qemu-common.h"
 #include "hw/irq.h"
 #include "qom/object.h"
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index b30d1b90c6..c8d908ede6 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -450,8 +450,8 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
-        /* tb_lock will be reset when cpu_loop_exit_noexc longjmps
-         * back into the cpu_exec loop. */
+        /* Both tb_lock and iothread_mutex will be reset when
+         *  longjmps back into the cpu_exec loop. */
         tb_lock();
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_loop_exit_noexc(cs);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 208ef7b110..d2efe835e7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1012,6 +1012,9 @@ static void emulate_spapr_hypercall(PowerPCCPU *cpu)
 {
     CPUPPCState *env = &cpu->env;
 
+    /* The TCG path should also be holding the BQL at this point */
+    g_assert(qemu_mutex_iothread_locked());
+
     if (msr_pr) {
         hcall_dprintf("Hypercall made with MSR[PR]=1\n");
         env->gpr[3] = H_PRIVILEGE;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 541785aeda..1735374ad6 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -323,6 +323,7 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
+    /* updates protected by BQL */
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/memory.c b/memory.c
index 2bfc37f65c..7d7b285e41 100644
--- a/memory.c
+++ b/memory.c
@@ -917,6 +917,8 @@ void memory_region_transaction_commit(void)
     AddressSpace *as;
 
     assert(memory_region_transaction_depth);
+    assert(qemu_mutex_iothread_locked());
+
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth) {
         if (memory_region_update_pending) {
diff --git a/qom/cpu.c b/qom/cpu.c
index cee4e6f7b0..4f0cb40099 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -113,9 +113,19 @@ static void cpu_common_get_memory_mapping(CPUState *cpu,
     error_setg(errp, "Obtaining memory mappings is unsupported on this CPU.");
 }
 
+/* Resetting the IRQ comes from across the code base so we take the
+ * BQL here if we need to.  cpu_interrupt assumes it is held.*/
 void cpu_reset_interrupt(CPUState *cpu, int mask)
 {
+    bool need_lock = !qemu_mutex_iothread_locked();
+
+    if (need_lock) {
+        qemu_mutex_lock_iothread();
+    }
     cpu->interrupt_request &= ~mask;
+    if (need_lock) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void cpu_exit(CPUState *cpu)
diff --git a/target/i386/smm_helper.c b/target/i386/smm_helper.c
index 4dd6a2c544..f051a77c4a 100644
--- a/target/i386/smm_helper.c
+++ b/target/i386/smm_helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/log.h"
@@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
 #define SMM_REVISION_ID 0x00020000
 #endif
 
+/* Called with iothread lock taken */
 void cpu_smm_update(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     bool smm_enabled = (env->hflags & HF_SMM_MASK);
 
+    g_assert(qemu_mutex_iothread_locked());
+
     if (cpu->smram) {
         memory_region_set_enabled(cpu->smram, smm_enabled);
     }
@@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
     }
     env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
     env->hflags &= ~HF_SMM_MASK;
+
+    qemu_mutex_lock_iothread();
     cpu_smm_update(cpu);
+    qemu_mutex_unlock_iothread();
 
     qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
     log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
index c9604ea9c7..3cb942e8bb 100644
--- a/target/s390x/misc_helper.c
+++ b/target/s390x/misc_helper.c
@@ -25,6 +25,7 @@
 #include "exec/helper-proto.h"
 #include "sysemu/kvm.h"
 #include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "exec/address-spaces.h"
 #ifdef CONFIG_KVM
 #include <linux/kvm.h>
@@ -109,11 +110,13 @@ void program_interrupt(CPUS390XState *env, uint32_t code, int ilen)
 /* SCLP service call */
 uint32_t HELPER(servc)(CPUS390XState *env, uint64_t r1, uint64_t r2)
 {
+    qemu_mutex_lock_iothread();
     int r = sclp_service_call(env, r1, r2);
     if (r < 0) {
         program_interrupt(env, -r, 4);
-        return 0;
+        r = 0;
     }
+    qemu_mutex_unlock_iothread();
     return r;
 }
 
diff --git a/translate-all.c b/translate-all.c
index 055436a676..41b36f04c6 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -55,6 +55,7 @@
 #include "translate-all.h"
 #include "qemu/bitmap.h"
 #include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "exec/log.h"
 
 /* #define DEBUG_TB_INVALIDATE */
@@ -1521,7 +1522,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
- * tb_lock held.
+ * iothread mutex not held.
  */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1723,7 +1724,10 @@ void tb_check_watchpoint(CPUState *cpu)
 
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
-   must be at the end of the TB */
+ * must be at the end of the TB.
+ *
+ * Called by softmmu_template.h, with iothread mutex not held.
+ */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
 #if defined(TARGET_MIPS) || defined(TARGET_SH4)
@@ -1935,6 +1939,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
 
 void cpu_interrupt(CPUState *cpu, int mask)
 {
+    g_assert(qemu_mutex_iothread_locked());
     cpu->interrupt_request |= mask;
     cpu->tcg_exit_req = 1;
 }
diff --git a/translate-common.c b/translate-common.c
index 5e989cdf70..d504dd0d33 100644
--- a/translate-common.c
+++ b/translate-common.c
@@ -21,6 +21,7 @@
 #include "qemu-common.h"
 #include "qom/cpu.h"
 #include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
 
 uintptr_t qemu_real_host_page_size;
 intptr_t qemu_real_host_page_mask;
@@ -30,6 +31,7 @@ intptr_t qemu_real_host_page_mask;
 static void tcg_handle_interrupt(CPUState *cpu, int mask)
 {
     int old_mask;
+    g_assert(qemu_mutex_iothread_locked());
 
     old_mask = cpu->interrupt_request;
     cpu->interrupt_request |= mask;
@@ -40,17 +42,16 @@ static void tcg_handle_interrupt(CPUState *cpu, int mask)
      */
     if (!qemu_cpu_is_self(cpu)) {
         qemu_cpu_kick(cpu);
-        return;
-    }
-
-    if (use_icount) {
-        cpu->icount_decr.u16.high = 0xffff;
-        if (!cpu->can_do_io
-            && (mask & ~old_mask) != 0) {
-            cpu_abort(cpu, "Raised interrupt while not in I/O function");
-        }
     } else {
-        cpu->tcg_exit_req = 1;
+        if (use_icount) {
+            cpu->icount_decr.u16.high = 0xffff;
+            if (!cpu->can_do_io
+                && (mask & ~old_mask) != 0) {
+                cpu_abort(cpu, "Raised interrupt while not in I/O function");
+            }
+        } else {
+            cpu->tcg_exit_req = 1;
+        }
     }
 }
 
-- 
2.11.0

* [Qemu-devel] [PATCH v7 09/27] tcg: remove global exit_request
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (7 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 08/27] tcg: drop global lock during TCG code execution Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 10/27] tcg: enable tb_lock() for SoftMMU Alex Bennée
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

There are now only two uses of the global exit_request left.

The first use ensures we exit the run loop when we first start to
process pending work and in the kick handler. This is just as easily
done by setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now that the iothread lock isn't held
while running we can just rely on the kick handler to push us out as
intended.
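
The kick routine therefore only has to chase the currently scheduled
vCPU (sketch of qemu_cpu_kick_rr_cpu() after this patch):

  do {
      cpu = atomic_mb_read(&tcg_current_rr_cpu);
      if (cpu) {
          cpu_exit(cpu);  /* sets the vCPU's local exit_request */
      }
      /* retry if the round-robin scheduler moved on under us */
  } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));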

We lightly refactor the main vCPU thread to ensure cpu->exit_request
causes us to exit the main loop and process any IO requests that might
come along.
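
The resulting loop shape, simplified from the diff below:

  while (cpu && !cpu->exit_request) {
      atomic_mb_set(&tcg_current_rr_cpu, cpu);
      /* ... run tcg_cpu_exec(cpu) and handle its result ... */
      cpu = CPU_NEXT(cpu);
  }
  atomic_set(&tcg_current_rr_cpu, NULL);
  if (cpu && cpu->exit_request) {
      atomic_mb_set(&cpu->exit_request, 0);  /* ack before handling IO */
  }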

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v5
  - minor merge conflict with kick patch
v4
  - moved to after iothread unlocking patch
  - needed to remove kick exit_request as well.
  - remove extraneous cpu->exit_request check
  - remove stray exit_request setting
  - remove needless atomic operation
---
 cpu-exec-common.c       |  2 --
 cpu-exec.c              |  9 ++-------
 cpus.c                  | 18 ++++++++++--------
 include/exec/exec-all.h |  3 ---
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index e2bc053372..0504a9457b 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,8 +23,6 @@
 #include "exec/exec-all.h"
 #include "exec/memory-internal.h"
 
-bool exit_request;
-
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
 {
diff --git a/cpu-exec.c b/cpu-exec.c
index f42a128bdf..cc09c1fc37 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -565,9 +565,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
         /* Something asked us to stop executing
          * chained TBs; just continue round the main
          * loop. Whatever requested the exit will also
-         * have set something else (eg exit_request or
-         * interrupt_request) which we will handle
-         * next time around the loop.  But we need to
+         * have set something else (eg interrupt_request) which we
+         * will handle next time around the loop.  But we need to
          * ensure the tcg_exit_req read in generated code
          * comes before the next read of cpu->exit_request
          * or cpu->interrupt_request.
@@ -623,10 +622,6 @@ int cpu_exec(CPUState *cpu)
 
     rcu_read_lock();
 
-    if (unlikely(atomic_mb_read(&exit_request))) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
diff --git a/cpus.c b/cpus.c
index b78c093550..7f2c6226d8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -788,7 +788,6 @@ static inline int64_t qemu_tcg_next_kick(void)
 static void qemu_cpu_kick_rr_cpu(void)
 {
     CPUState *cpu;
-    atomic_mb_set(&exit_request, 1);
     do {
         cpu = atomic_mb_read(&tcg_current_rr_cpu);
         if (cpu) {
@@ -1311,11 +1310,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
     start_tcg_kick_timer();
 
-    /* process any pending work */
-    atomic_mb_set(&exit_request, 1);
-
     cpu = first_cpu;
 
+    /* process any pending work */
+    cpu->exit_request = 1;
+
     while (1) {
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
         qemu_account_warp_timer();
@@ -1324,7 +1323,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             cpu = first_cpu;
         }
 
-        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+        while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
@@ -1344,12 +1343,15 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                 break;
             }
 
-        } /* for cpu.. */
+            cpu = CPU_NEXT(cpu);
+        } /* while (cpu && !cpu->exit_request).. */
+
         /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
         atomic_set(&tcg_current_rr_cpu, NULL);
 
-        /* Pairs with smp_wmb in qemu_cpu_kick.  */
-        atomic_mb_set(&exit_request, 0);
+        if (cpu && cpu->exit_request) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
 
         handle_icount_deadline();
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3cbd359dd7..bd4622ac5d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -403,7 +403,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern bool exit_request;
-
 #endif
-- 
2.11.0

* [Qemu-devel] [PATCH v7 10/27] tcg: enable tb_lock() for SoftMMU
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (8 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 09/27] tcg: remove global exit_request Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 11/27] tcg: enable thread-per-vCPU Alex Bennée
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of the linux-user mmap_lock() which, as the name suggests, serialises
updates to the structure as a result of guest mmap operations.
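
Code generation in system mode now follows the same discipline
linux-user always used, e.g. (illustrative fragment, mirroring
existing callers such as the kvmvapic patching code):

  tb_lock();               /* serialise all code generation */
  tb_gen_code(cpu, pc, cs_base, flags, cflags);
  tb_unlock();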

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v4
  - split from main tcg: enable thread-per-vCPU patch
v7
  - fixed up with Pranith's tcg_debug_assert() changes
---
 translate-all.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 41b36f04c6..87e9d00d14 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -75,7 +75,7 @@
  * mmap_lock.
  */
 #ifdef CONFIG_SOFTMMU
-#define assert_memory_lock() do { /* nothing */ } while (0)
+#define assert_memory_lock() tcg_debug_assert(have_tb_lock)
 #else
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
@@ -135,9 +135,7 @@ TCGContext tcg_ctx;
 bool parallel_cpus;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 static void page_table_config_init(void)
 {
@@ -159,40 +157,29 @@ static void page_table_config_init(void)
     assert(v_l2_levels >= 0);
 }
 
-#ifdef CONFIG_USER_ONLY
 #define assert_tb_locked() tcg_debug_assert(have_tb_lock)
 #define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
-#else
-#define assert_tb_locked()  do { /* nothing */ } while (0)
-#define assert_tb_unlocked()  do { /* nothing */ } while (0)
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert_tb_unlocked();
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert_tb_locked();
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
-#endif
 }
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.11.0

* [Qemu-devel] [PATCH v7 11/27] tcg: enable thread-per-vCPU
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (9 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 10/27] tcg: enable tb_lock() for SoftMMU Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 12/27] tcg: handle EXCP_ATOMIC exception for system emulation Alex Bennée
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function (see the spawn sketch after this list).

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.
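
The spawn path in qemu_tcg_init_vcpu() then splits on the thread model
(simplified sketch of the hunk below):

  if (qemu_tcg_mttcg_enabled()) {
      parallel_cpus = true;  /* generate barrier/atomic aware code */
      qemu_thread_create(cpu->thread, thread_name,
                         qemu_tcg_cpu_thread_fn, cpu,
                         QEMU_THREAD_JOINABLE);
  } else {
      /* all vCPUs share the single round-robin thread */
  }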

The user can switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the frontend and backend. As each
backend and frontend is updated it can add CONFIG_MTTCG_TARGET and
CONFIG_MTTCG_HOST to its respective make configuration so
default_mttcg_enabled does the right thing.
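
A rough sketch of how the default is derived (illustrative only; the
actual helper may apply additional checks):

  static bool default_mttcg_enabled(void)
  {
  #if defined(CONFIG_MTTCG_TARGET) && defined(CONFIG_MTTCG_HOST)
      return true;   /* both frontend and backend declare support */
  #else
      return false;  /* fall back to single-threaded round-robin */
  #endif
  }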

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
v4
  - don't break loop on stopped, we may never schedule next in RR mode
  - make sure we flush iorequests of current cpu if we exited on one
  - add tcg_cpu_exec_start/end wraps for async work functions
  - stop killing of current_cpu on loop exit
  - set current_cpu in the single thread function
  - remove sleep special case, add qemu_tcg_should_sleep() for mttcg
  - no need to atomic set cpu->exit_request going into the loop
  - removed extraneous setting of exit_request
  - split tb_lock() part of patch
  - rename single thread fn to qemu_tcg_rr_cpu_thread_fn
v5
  - enable parallel_cpus for MTTCG (for barriers/atomics)
  - expand on CONFIG_ flags in commit message
v7
  - move parallel_cpus down into the mttcg leg
  - minor ws merge fix
---
 cpu-exec.c |   5 ---
 cpus.c     | 135 +++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 104 insertions(+), 36 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index cc09c1fc37..ef328087be 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -396,7 +396,6 @@ static inline bool cpu_handle_halt(CPUState *cpu)
         }
 #endif
         if (!cpu_has_work(cpu)) {
-            current_cpu = NULL;
             return true;
         }
 
@@ -540,7 +539,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 
 
     if (unlikely(atomic_read(&cpu->exit_request) || replay_has_interrupt())) {
-        atomic_set(&cpu->exit_request, 0);
         cpu->exception_index = EXCP_INTERRUPT;
         cpu_loop_exit(cpu);
     }
@@ -675,8 +673,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 7f2c6226d8..32e9ff4b34 100644
--- a/cpus.c
+++ b/cpus.c
@@ -44,6 +44,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/bitmap.h"
 #include "qemu/seqlock.h"
+#include "tcg.h"
 #include "qapi-event.h"
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
@@ -804,7 +805,7 @@ static void kick_tcg_thread(void *opaque)
 
 static void start_tcg_kick_timer(void)
 {
-    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+    if (!mttcg_enabled && !tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
         tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                            kick_tcg_thread, NULL);
         timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1058,27 +1059,34 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     process_queued_cpu_work(cpu);
-    cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+    if (mttcg_enabled) {
+        return cpu_thread_is_idle(cpu);
+    } else {
+        return all_cpu_threads_idle();
+    }
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
+    while (qemu_tcg_should_sleep(cpu)) {
         stop_tcg_kick_timer();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
     start_tcg_kick_timer();
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1149,6 +1157,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1157,9 +1166,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1170,7 +1177,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1282,7 +1288,7 @@ static void deal_with_unplugged_cpus(void)
  * elsewhere.
  */
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
 
@@ -1304,6 +1310,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         /* process any pending work */
         CPU_FOREACH(cpu) {
+            current_cpu = cpu;
             qemu_wait_io_event_common(cpu);
         }
     }
@@ -1325,6 +1332,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1336,7 +1344,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                     cpu_handle_guest_debug(cpu);
                     break;
                 }
-            } else if (cpu->stop || cpu->stopped) {
+            } else if (cpu->stop) {
                 if (cpu->unplug) {
                     cpu = CPU_NEXT(cpu);
                 }
@@ -1355,13 +1363,71 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         handle_icount_deadline();
 
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
         deal_with_unplugged_cpus();
     }
 
     return NULL;
 }
 
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        if (cpu_can_run(cpu)) {
+            int r;
+            r = tcg_cpu_exec(cpu);
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        handle_icount_deadline();
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
+    }
+
+    return NULL;
+}
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1386,7 +1452,7 @@ void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
         cpu_exit(cpu);
-        /* Also ensure current RR cpu is kicked */
+        /* NOP unless doing single-thread RR */
         qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
@@ -1455,13 +1521,6 @@ void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1510,29 +1569,43 @@ void cpu_remove_sync(CPUState *cpu)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            parallel_cpus = true;
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               qemu_tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
 
-- 
2.11.0

* [Qemu-devel] [PATCH v7 12/27] tcg: handle EXCP_ATOMIC exception for system emulation
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (10 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 11/27] tcg: enable thread-per-vCPU Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 13/27] cputlb: add assert_cpu_is_self checks Alex Bennée
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

From: Pranith Kumar <bobby.prani@gmail.com>

The patch enables handling atomic code in the guest. This would
preferably be done in cpu_handle_exception(), but the current assumptions
regarding when we can execute atomic sections cause a deadlock.
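
In both vCPU loops the new handling amounts to (sketch of the hunks
below):

  case EXCP_ATOMIC:
      qemu_mutex_unlock_iothread();
      cpu_exec_step_atomic(cpu);  /* one insn, exclusive of other vCPUs */
      qemu_mutex_lock_iothread();
      break;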

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[AJB: tweak title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 cpus.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/cpus.c b/cpus.c
index 32e9ff4b34..4a66c88ebe 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1343,6 +1343,11 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
                 if (r == EXCP_DEBUG) {
                     cpu_handle_guest_debug(cpu);
                     break;
+                } else if (r == EXCP_ATOMIC) {
+                    qemu_mutex_unlock_iothread();
+                    cpu_exec_step_atomic(cpu);
+                    qemu_mutex_lock_iothread();
+                    break;
                 }
             } else if (cpu->stop) {
                 if (cpu->unplug) {
@@ -1413,6 +1418,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                  */
                 g_assert(cpu->halted);
                 break;
+            case EXCP_ATOMIC:
+                qemu_mutex_unlock_iothread();
+                cpu_exec_step_atomic(cpu);
+                qemu_mutex_lock_iothread();
             default:
                 /* Ignore everything else? */
                 break;
-- 
2.11.0

* [Qemu-devel] [PATCH v7 13/27] cputlb: add assert_cpu_is_self checks
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (11 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 12/27] tcg: handle EXCP_ATOMIC exception for system emulation Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need
to use safe work to defer these changes to the vCPU they affect. The
new assert
can be enabled while debugging to catch these cases.
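
For example, a device model calling a flush from the iothread would
now trip the assert in a debugging build (illustrative; other_cpu is a
hypothetical vCPU other than the calling thread's):

  /* wrong: called outside the vCPU's own thread */
  tlb_flush(other_cpu);  /* !qemu_cpu_is_self(other_cpu) -> g_assert */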

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index 1cc9d9da51..af0e65cd2c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -58,6 +58,12 @@
     } \
 } while (0)
 
+#define assert_cpu_is_self(this_cpu) do {                         \
+        if (DEBUG_TLB_GATE) {                                     \
+            g_assert(!cpu->created || qemu_cpu_is_self(cpu));     \
+        }                                                         \
+    } while (0)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -70,6 +76,9 @@ void tlb_flush(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
 
+    assert_cpu_is_self(cpu);
+    tlb_debug("(count: %d)\n", tlb_flush_count++);
+
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -77,13 +86,13 @@ void tlb_flush(CPUState *cpu)
     env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
     env->tlb_flush_mask = 0;
-    tlb_flush_count++;
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
 {
     CPUArchState *env = cpu->env_ptr;
 
+    assert_cpu_is_self(cpu);
     tlb_debug("start\n");
 
     for (;;) {
@@ -128,6 +137,7 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     int i;
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -165,6 +175,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 
     va_start(argp, addr);
 
+    assert_cpu_is_self(cpu);
     tlb_debug("addr "TARGET_FMT_lx"\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -253,6 +264,8 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
+
     env = cpu->env_ptr;
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
@@ -284,6 +297,8 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
     int i;
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
+
     vaddr &= TARGET_PAGE_MASK;
     i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -343,6 +358,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
 
+    assert_cpu_is_self(cpu);
     assert(size >= TARGET_PAGE_SIZE);
     if (size != TARGET_PAGE_SIZE) {
         tlb_add_large_page(env, vaddr, size);
-- 
2.11.0

* [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (12 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 13/27] cputlb: add assert_cpu_is_self checks Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 19:07   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work Alex Bennée
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index af0e65cd2c..94fa9977c5 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -246,18 +246,6 @@ void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
     }
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-    ram_addr_t ram_addr;
-
-    ram_addr = qemu_ram_addr_from_host(ptr);
-    if (ram_addr == RAM_ADDR_INVALID) {
-        fprintf(stderr, "Bad ram pointer %p\n", ptr);
-        abort();
-    }
-    return ram_addr;
-}
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
     CPUArchState *env;
@@ -469,6 +457,18 @@ static void report_bad_exec(CPUState *cpu, target_ulong addr)
     log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
 }
 
+static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+    ram_addr_t ram_addr;
+
+    ram_addr = qemu_ram_addr_from_host(ptr);
+    if (ram_addr == RAM_ADDR_INVALID) {
+        error_report("Bad ram pointer %p", ptr);
+        abort();
+    }
+    return ram_addr;
+}
+
 /* NOTE: this function can trigger an exception */
 /* NOTE2: the returned address is not exactly the physical address: it
  * is actually a ram_addr_t (in system mode; the user mode emulation
-- 
2.11.0

* [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work.
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (13 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 19:10   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines Alex Bennée
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

From: KONRAD Frederic <fred.konrad@greensocs.com>

Some architectures allow flushing the TLB of other vCPUs. This is not
a problem when we have only one thread for all vCPUs but it definitely
needs to be done as asynchronous work when we are truly multithreaded.

We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TBs at the same time. The alternative would
be to use proper atomic primitives to clear the TLB entries en masse.

This patch doesn't do anything to protect other cputlb functions being
called in MTTCG mode and making cross-vCPU changes.
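
The dispatch pattern used throughout (sketch of tlb_flush_page() from
the diff below):

  if (!qemu_cpu_is_self(cpu)) {
      /* defer the flush to the target vCPU's own thread */
      async_run_on_cpu(cpu, tlb_flush_page_async_work,
                       RUN_ON_CPU_TARGET_PTR(addr));
  } else {
      tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
  }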

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v6 (base patches)
  - don't use cmpxchg_bool (we drop it later anyway)
  - use RUN_ON_CPU macros instead of inlines
  - bug out of tlb_flush if !tcg_enabled() (MacOSX make check failure)
v5 (base patches)
  - take tb_lock() for memset
  - ensure tb_flush_page properly asyncs work for other vCPUs
  - use run_on_cpu_data
v4 (base_patches)
  - brought forward from arm enabling series
  - restore pending_tlb_flush flag
v1
  - Remove tlb_flush_all just do the check in tlb_flush.
  - remove the need to g_malloc
  - tlb_flush calls direct if !cpu->created

---
 cputlb.c                | 78 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  6 ++++
 3 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 94fa9977c5..36388b29b8 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -64,6 +64,10 @@
         }                                                         \
     } while (0)
 
+/* run_on_cpu_data.target_ptr should always be big enough for a
+ * target_ulong even on 32 bit builds */
+QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
+
 /* statistics */
 int tlb_flush_count;
 
@@ -72,13 +76,22 @@ int tlb_flush_count;
  * flushing more entries than required is only an efficiency issue,
  * not a correctness issue.
  */
-void tlb_flush(CPUState *cpu)
+static void tlb_flush_nocheck(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
 
+    /* The QOM tests will trigger tlb_flushes without setting up TCG
+     * so we bug out here in that case.
+     */
+    if (!tcg_enabled()) {
+        return;
+    }
+
     assert_cpu_is_self(cpu);
     tlb_debug("(count: %d)\n", tlb_flush_count++);
 
+    tb_lock();
+
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -86,6 +99,39 @@ void tlb_flush(CPUState *cpu)
     env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
     env->tlb_flush_mask = 0;
+
+    tb_unlock();
+
+    atomic_mb_set(&cpu->pending_tlb_flush, false);
+}
+
+static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
+{
+    tlb_flush_nocheck(cpu);
+}
+
+/* NOTE:
+ * If flush_global is true (the usual case), flush all tlb entries.
+ * If flush_global is false, flush (at least) all tlb entries not
+ * marked global.
+ *
+ * Since QEMU doesn't currently implement a global/not-global flag
+ * for tlb entries, at the moment tlb_flush() will also flush all
+ * tlb entries in the flush_global == false case. This is OK because
+ * CPU architectures generally permit an implementation to drop
+ * entries from the TLB at any time, so flushing more entries than
+ * required is only an efficiency issue, not a correctness issue.
+ */
+void tlb_flush(CPUState *cpu)
+{
+    if (cpu->created && !qemu_cpu_is_self(cpu)) {
+        if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+            async_run_on_cpu(cpu, tlb_flush_global_async_work,
+                             RUN_ON_CPU_NULL);
+        }
+    } else {
+        tlb_flush_nocheck(cpu);
+    }
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
@@ -95,6 +141,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
     assert_cpu_is_self(cpu);
     tlb_debug("start\n");
 
+    tb_lock();
+
     for (;;) {
         int mmu_idx = va_arg(argp, int);
 
@@ -109,6 +157,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
     }
 
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+
+    tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, ...)
@@ -131,13 +181,15 @@ static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
     }
 }
 
-void tlb_flush_page(CPUState *cpu, target_ulong addr)
+static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
+    target_ulong addr = (target_ulong) data.target_ptr;
     int i;
     int mmu_idx;
 
     assert_cpu_is_self(cpu);
+
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -167,6 +219,18 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     tb_flush_jmp_cache(cpu, addr);
 }
 
+void tlb_flush_page(CPUState *cpu, target_ulong addr)
+{
+    tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+
+    if (!qemu_cpu_is_self(cpu)) {
+        async_run_on_cpu(cpu, tlb_flush_page_async_work,
+                         RUN_ON_CPU_TARGET_PTR(addr));
+    } else {
+        tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
+    }
+}
+
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -213,6 +277,16 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
     tb_flush_jmp_cache(cpu, addr);
 }
 
+void tlb_flush_page_all(target_ulong addr)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        async_run_on_cpu(cpu, tlb_flush_page_async_work,
+                         RUN_ON_CPU_TARGET_PTR(addr));
+    }
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bd4622ac5d..e43cb68355 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -158,6 +158,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
 void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
                  uintptr_t retaddr);
+void tlb_flush_page_all(target_ulong addr);
 #else
 static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 1735374ad6..880ba4254e 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -393,6 +393,12 @@ struct CPUState {
        (absolute value) offset as small as possible.  This reduces code
        size, especially for hosts without large memory offsets.  */
     uint32_t tcg_exit_req;
+
+    /* The pending_tlb_flush flag is set and cleared atomically to
+     * avoid potential races. The aim of the flag is to avoid
+     * unnecessary flushes.
+     */
+    bool pending_tlb_flush;
 };
 
 QTAILQ_HEAD(CPUTailQ, CPUState);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (14 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 19:11   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty Alex Bennée
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the va_list of MMU indexes into a form that can be punted
to an asynchronous job.

The pending_tlb_flush field on QOM's CPU structure also becomes a
bitmask of pending flushes rather than a boolean.
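
For example, a target flushing two MMU indexes keeps using the existing
varargs interface; the list is folded into a bitmap before being punted
cross-vCPU (the index values here are purely illustrative):

    /* Hypothetical call: flush MMU indexes 1 and 2 on another vCPU.
     * The trailing -1 terminates the list as before.
     */
    tlb_flush_by_mmuidx(other_cpu, 1, 2, -1);

    /* Internally this becomes the bitmap (1 << 1) | (1 << 2) == 0x6,
     * which fits both run_on_cpu_data and the uint16_t pending mask.
     */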

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v7
  - un-merged from the atomic cputlb patch in the last series
  - fix long line reported by checkpatch
---
 cputlb.c          | 160 +++++++++++++++++++++++++++++++++++++++++-------------
 include/qom/cpu.h |  12 ++--
 2 files changed, 127 insertions(+), 45 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 36388b29b8..207faf2ea0 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -68,6 +68,11 @@
  * target_ulong even on 32 bit builds */
 QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 
+/* We currently can't handle more than 16 bits in the MMUIDX bitmask.
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
+#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -102,7 +107,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
     tb_unlock();
 
-    atomic_mb_set(&cpu->pending_tlb_flush, false);
+    atomic_mb_set(&cpu->pending_tlb_flush, 0);
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
@@ -125,7 +130,8 @@ static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
 void tlb_flush(CPUState *cpu)
 {
     if (cpu->created && !qemu_cpu_is_self(cpu)) {
-        if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+        if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
+            atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
             async_run_on_cpu(cpu, tlb_flush_global_async_work,
                              RUN_ON_CPU_NULL);
         }
@@ -134,39 +140,78 @@ void tlb_flush(CPUState *cpu)
     }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
+static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
+    unsigned long mmu_idx_bitmask = data.host_ulong;
+    int mmu_idx;
 
     assert_cpu_is_self(cpu);
-    tlb_debug("start\n");
 
     tb_lock();
 
-    for (;;) {
-        int mmu_idx = va_arg(argp, int);
+    tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
-        if (mmu_idx < 0) {
-            break;
-        }
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
-        tlb_debug("%d\n", mmu_idx);
+        if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
+            tlb_debug("%d\n", mmu_idx);
 
-        memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-        memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+            memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+            memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+        }
     }
 
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
 
+    tlb_debug("done\n");
+
     tb_unlock();
 }
 
+/* Helper function to slurp va_args list into a bitmap
+ */
+static inline unsigned long make_mmu_index_bitmap(va_list args)
+{
+    unsigned long bitmap = 0;
+    int mmu_index = va_arg(args, int);
+
+    /* An empty va_list would be a bad call */
+    g_assert(mmu_index >= 0);
+
+    do {
+        set_bit(mmu_index, &bitmap);
+        mmu_index = va_arg(args, int);
+    } while (mmu_index >= 0);
+
+    return bitmap;
+}
+
 void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 {
     va_list argp;
+    unsigned long mmu_idx_bitmap;
+
     va_start(argp, cpu);
-    v_tlb_flush_by_mmuidx(cpu, argp);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
     va_end(argp);
+
+    tlb_debug("mmu_idx: 0x%04lx\n", mmu_idx_bitmap);
+
+    if (!qemu_cpu_is_self(cpu)) {
+        uint16_t pending_flushes =
+            mmu_idx_bitmap & ~atomic_mb_read(&cpu->pending_tlb_flush);
+        if (pending_flushes) {
+            tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+
+            atomic_or(&cpu->pending_tlb_flush, pending_flushes);
+            async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
+                             RUN_ON_CPU_HOST_INT(pending_flushes));
+        }
+    } else {
+        tlb_flush_by_mmuidx_async_work(cpu,
+                                       RUN_ON_CPU_HOST_ULONG(mmu_idx_bitmap));
+    }
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -231,16 +276,50 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     }
 }
 
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
+/* As we are going to hijack the bottom bits of the page address for a
+ * mmuidx bit mask we need to fail to build if we can't do that
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > TARGET_PAGE_BITS_MIN);
+
+static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
+                                                run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
-    int i, k;
-    va_list argp;
-
-    va_start(argp, addr);
+    target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
+    target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
+    unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
+    int page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    int mmu_idx;
+    int i;
 
     assert_cpu_is_self(cpu);
-    tlb_debug("addr "TARGET_FMT_lx"\n", addr);
+
+    tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
+              page, addr, mmu_idx_bitmap);
+
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
+            tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
+
+            /* check whether there are vtlb entries that need to be flushed */
+            for (i = 0; i < CPU_VTLB_SIZE; i++) {
+                tlb_flush_entry(&env->tlb_v_table[mmu_idx][i], addr);
+            }
+        }
+    }
+
+    tb_flush_jmp_cache(cpu, addr);
+}
+
+static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
+                                                          run_on_cpu_data data)
+{
+    CPUArchState *env = cpu->env_ptr;
+    target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
+    target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
+    unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
+
+    tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
@@ -248,33 +327,36 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
                   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
                   env->tlb_flush_addr, env->tlb_flush_mask);
 
-        v_tlb_flush_by_mmuidx(cpu, argp);
-        va_end(argp);
-        return;
+        tlb_flush_by_mmuidx_async_work(cpu,
+                                       RUN_ON_CPU_HOST_ULONG(mmu_idx_bitmap));
+    } else {
+        tlb_flush_page_by_mmuidx_async_work(cpu, data);
     }
+}
 
-    addr &= TARGET_PAGE_MASK;
-    i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-
-    for (;;) {
-        int mmu_idx = va_arg(argp, int);
+void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
+{
+    unsigned long mmu_idx_bitmap;
+    target_ulong addr_and_mmu_idx;
+    va_list argp;
 
-        if (mmu_idx < 0) {
-            break;
-        }
+    va_start(argp, addr);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
 
-        tlb_debug("idx %d\n", mmu_idx);
+    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%lx\n", addr, mmu_idx_bitmap);
 
-        tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
+    /* This should already be page aligned */
+    addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+    addr_and_mmu_idx |= mmu_idx_bitmap;
 
-        /* check whether there are vltb entries that need to be flushed */
-        for (k = 0; k < CPU_VTLB_SIZE; k++) {
-            tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
-        }
+    if (!qemu_cpu_is_self(cpu)) {
+        async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
+                         RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+    } else {
+        tlb_check_page_and_flush_by_mmuidx_async_work(
+            cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
-    va_end(argp);
-
-    tb_flush_jmp_cache(cpu, addr);
 }
 
 void tlb_flush_page_all(target_ulong addr)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 880ba4254e..d945221811 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -388,17 +388,17 @@ struct CPUState {
      */
     bool throttle_thread_scheduled;
 
+    /* The pending_tlb_flush flag is set and cleared atomically to
+     * avoid potential races. The aim of the flag is to avoid
+     * unnecessary flushes.
+     */
+    uint16_t pending_tlb_flush;
+
     /* Note that this is accessed at the start of every TB via a negative
        offset from AREG0.  Leave this field at the end so as to make the
        (absolute value) offset as small as possible.  This reduces code
        size, especially for hosts without large memory offsets.  */
     uint32_t tcg_exit_req;
-
-    /* The pending_tlb_flush flag is set and cleared atomically to
-     * avoid potential races. The aim of the flag is to avoid
-     * unnecessary flushes.
-     */
-    bool pending_tlb_flush;
 };
 
 QTAILQ_HEAD(CPUTailQ, CPUState);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (15 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 19:17   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus Alex Bennée
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code, otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.

The dirty helper function is made static as it isn't used outside of
cputlb.
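
To illustrate the race the cmpxchg avoids (a sketch only, reusing the
names from tlb_reset_dirty_range in the diff below, not patch code):

    /* Racy: between reading addr_write and writing it back, the owning
     * vCPU may have refilled the entry; the blind write would then
     * clobber the fresh entry with stale contents.
     */
    tlb_entry->addr_write |= TLB_NOTDIRTY;

    /* Safe: only set the flag if the entry is unchanged. If the owning
     * vCPU modified it in the meantime the cmpxchg fails and we simply
     * skip the update, which is fine as the entry is new anyway.
     */
    atomic_cmpxchg(&tlb_entry->addr_write, orig_addr,
                   orig_addr | TLB_NOTDIRTY);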

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v6
  - use TARGET_PAGE_BITS_MIN
  - use run_on_cpu helpers
v7
  - fix tlb_debug fmt for 32bit build
  - un-merged the mmuidx async work which got mashed in last round
  - introduced copy_tlb_helper function and made TCG_OVERSIZED_GUEST aware
---
 cputlb.c              | 120 +++++++++++++++++++++++++++++++++++++++-----------
 include/exec/cputlb.h |   2 -
 2 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 207faf2ea0..856213cbcb 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -384,32 +384,90 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
     cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_CODE);
 }
 
-static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe)
-{
-    return (tlbe->addr_write & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0;
-}
 
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
+/*
+ * Dirty write flag handling
+ *
+ * When the TCG code writes to a location it looks up the address in
+ * the TLB and uses that data to compute the final address. If any of
+ * the lower bits of the address are set then the slow path is forced.
+ * There are a number of reasons to do this but for normal RAM the
+ * most usual is detecting writes to code regions which may invalidate
+ * generated code.
+ *
+ * Because we want other vCPUs to respond to changes straight away we
+ * update the te->addr_write field atomically. If the TLB entry has
+ * been changed by the vCPU in the mean time we skip the update.
+ *
+ * As this function uses atomic accesses we also need to ensure
+ * updates to tlb_entries follow the same access rules. We don't need
+ * to worry about this for oversized guests as MTTCG is disabled for
+ * them.
+ */
+
+static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
                            uintptr_t length)
 {
-    uintptr_t addr;
+#if TCG_OVERSIZED_GUEST
+    uintptr_t addr = tlb_entry->addr_write;
 
-    if (tlb_is_dirty_ram(tlb_entry)) {
-        addr = (tlb_entry->addr_write & TARGET_PAGE_MASK) + tlb_entry->addend;
+    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+        addr &= TARGET_PAGE_MASK;
+        addr += tlb_entry->addend;
         if ((addr - start) < length) {
             tlb_entry->addr_write |= TLB_NOTDIRTY;
         }
     }
+#else
+    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
+    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
+    uintptr_t addr = orig_addr;
+
+    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+        addr &= TARGET_PAGE_MASK;
+        addr += atomic_read(&tlb_entry->addend);
+        if ((addr - start) < length) {
+            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
+            atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
+        }
+    }
+#endif
+}
+
+/* For atomic correctness when running MTTCG we need to use the right
+ * primitives when copying entries */
+static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
+                                   bool atomic_set)
+{
+#if TCG_OVERSIZED_GUEST
+    *d = *s;
+#else
+    if (atomic_set) {
+        d->addr_read = s->addr_read;
+        d->addr_code = s->addr_code;
+        atomic_set(&d->addend, atomic_read(&s->addend));
+        /* Pairs with flag setting in tlb_reset_dirty_range */
+        atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
+    } else {
+        d->addr_read = s->addr_read;
+        d->addr_write = atomic_read(&s->addr_write);
+        d->addr_code = s->addr_code;
+        d->addend = atomic_read(&s->addend);
+    }
+#endif
 }
 
+/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+ * the target vCPU). As such care needs to be taken that we don't
+ * dangerously race with another vCPU update. The only thing actually
+ * updated is the target TLB entry ->addr_write flags.
+ */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
     CPUArchState *env;
 
     int mmu_idx;
 
-    assert_cpu_is_self(cpu);
-
     env = cpu->env_ptr;
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
@@ -497,7 +555,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     target_ulong address;
     target_ulong code_address;
     uintptr_t addend;
-    CPUTLBEntry *te;
+    CPUTLBEntry *te, *tv, tn;
     hwaddr iotlb, xlat, sz;
     unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
@@ -532,41 +590,50 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     te = &env->tlb_table[mmu_idx][index];
-
     /* do not discard the translation in te, evict it into a victim tlb */
-    env->tlb_v_table[mmu_idx][vidx] = *te;
+    tv = &env->tlb_v_table[mmu_idx][vidx];
+
+    /* addr_write can race with tlb_reset_dirty_range */
+    copy_tlb_helper(tv, te, true);
+
     env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
 
     /* refill the tlb */
     env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
     env->iotlb[mmu_idx][index].attrs = attrs;
-    te->addend = addend - vaddr;
+
+    /* Now calculate the new entry */
+    tn.addend = addend - vaddr;
     if (prot & PAGE_READ) {
-        te->addr_read = address;
+        tn.addr_read = address;
     } else {
-        te->addr_read = -1;
+        tn.addr_read = -1;
     }
 
     if (prot & PAGE_EXEC) {
-        te->addr_code = code_address;
+        tn.addr_code = code_address;
     } else {
-        te->addr_code = -1;
+        tn.addr_code = -1;
     }
+
+    tn.addr_write = -1;
     if (prot & PAGE_WRITE) {
         if ((memory_region_is_ram(section->mr) && section->readonly)
             || memory_region_is_romd(section->mr)) {
             /* Write access calls the I/O callback.  */
-            te->addr_write = address | TLB_MMIO;
+            tn.addr_write = address | TLB_MMIO;
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(
                         memory_region_get_ram_addr(section->mr) + xlat)) {
-            te->addr_write = address | TLB_NOTDIRTY;
+            tn.addr_write = address | TLB_NOTDIRTY;
         } else {
-            te->addr_write = address;
+            tn.addr_write = address;
         }
-    } else {
-        te->addr_write = -1;
     }
+
+    /* Pairs with flag setting in tlb_reset_dirty_range */
+    copy_tlb_helper(te, &tn, true);
+    /* atomic_mb_set(&te->addr_write, write_address); */
 }
 
 /* Add a new TLB entry, but without specifying the memory
@@ -729,10 +796,13 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
         if (cmp == page) {
             /* Found entry in victim tlb, swap tlb and iotlb.  */
             CPUTLBEntry tmptlb, *tlb = &env->tlb_table[mmu_idx][index];
+
+            copy_tlb_helper(&tmptlb, tlb, false);
+            copy_tlb_helper(tlb, vtlb, true);
+            copy_tlb_helper(vtlb, &tmptlb, true);
+
             CPUIOTLBEntry tmpio, *io = &env->iotlb[mmu_idx][index];
             CPUIOTLBEntry *vio = &env->iotlb_v[mmu_idx][vidx];
-
-            tmptlb = *tlb; *tlb = *vtlb; *vtlb = tmptlb;
             tmpio = *io; *io = *vio; *vio = tmpio;
             return true;
         }
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index d454c005b7..3f941783c5 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,8 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
-                           uintptr_t length);
 extern int tlb_flush_count;
 
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (16 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-23 19:21   ` Richard Henderson
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 19/27] target-arm/powerctl: defer cpu reset work to CPU context Alex Bennée
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, Peter Crosthwaite

This introduces support to the cputlb API for flushing all CPUs' TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves. Additionally these functions provide a
"wait" argument which will cause the work to be scheduled and the
calling vCPU to exit its loop. The result is that all the flush
operations will be complete by the time the originating vCPU starts
executing again.

It is up to the caller to ensure enough state has been saved so
execution can be restarted at the next appropriate instruction.

Some guest architectures can defer completion of flush operations
until later and are free to set wait to false. If they later schedule
work using the async_safe_work mechanism they can be sure other vCPUs
will have flushed their TLBs by the point execution returns from the
safe work.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v7
  - some checkpatch long line fixes
---
 cputlb.c                | 100 +++++++++++++++++++++++++++++++++++++++++++++---
 include/exec/exec-all.h |  65 +++++++++++++++++++++++++++++--
 2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 856213cbcb..3e1cf9b516 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -73,6 +73,40 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
+/* flush_all_helper: run fn across all cpus
+ *
+ * If the wait flag is set then the src cpu's helper will be queued as
+ * "safe" work and the loop exited creating a synchronisation point
+ * where all queued work will be finished before execution starts
+ * again.
+ */
+static void flush_all_helper(CPUState *src, bool wait,
+                             run_on_cpu_func fn, run_on_cpu_data d)
+{
+    CPUState *cpu;
+
+    if (!wait) {
+        CPU_FOREACH(cpu) {
+            if (cpu != src) {
+                async_run_on_cpu(cpu, fn, d);
+            } else {
+                g_assert(qemu_cpu_is_self(src));
+                fn(src, d);
+            }
+        }
+    } else {
+        CPU_FOREACH(cpu) {
+            if (cpu != src) {
+                async_run_on_cpu(cpu, fn, d);
+            } else {
+                async_safe_run_on_cpu(cpu, fn, d);
+            }
+
+        }
+        cpu_loop_exit(src);
+    }
+}
+
 /* statistics */
 int tlb_flush_count;
 
@@ -140,6 +174,12 @@ void tlb_flush(CPUState *cpu)
     }
 }
 
+void tlb_flush_all_cpus(CPUState *src_cpu, bool wait)
+{
+    flush_all_helper(src_cpu, wait, tlb_flush_global_async_work,
+                     RUN_ON_CPU_NULL);
+}
+
 static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -214,6 +254,29 @@ void tlb_flush_by_mmuidx(CPUState *cpu, ...)
     }
 }
 
+/* This function affects all vCPUs and will ensure all work is
+ * complete by the time the loop restarts
+ */
+void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, bool wait, ...)
+{
+    va_list argp;
+    unsigned long mmu_idx_bitmap;
+
+    va_start(argp, wait);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
+
+    tlb_debug("mmu_idx: 0x%04lx\n", mmu_idx_bitmap);
+
+    flush_all_helper(src_cpu, wait,
+                     tlb_flush_by_mmuidx_async_work,
+                     RUN_ON_CPU_HOST_ULONG(mmu_idx_bitmap));
+
+    /* Will not return if wait == true */
+    g_assert(!wait);
+}
+
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
     if (addr == (tlb_entry->addr_read &
@@ -359,14 +422,39 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
     }
 }
 
-void tlb_flush_page_all(target_ulong addr)
+/* This function affects all vCPUs and will ensure all work is
+ * complete by the time the loop restarts
+ */
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, bool wait,
+                                       target_ulong addr, ...)
 {
-    CPUState *cpu;
+    unsigned long mmu_idx_bitmap;
+    target_ulong addr_and_mmu_idx;
+    va_list argp;
 
-    CPU_FOREACH(cpu) {
-        async_run_on_cpu(cpu, tlb_flush_page_async_work,
-                         RUN_ON_CPU_TARGET_PTR(addr));
-    }
+    va_start(argp, addr);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
+
+    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%lx\n", addr, mmu_idx_bitmap);
+
+    /* This should already be page aligned */
+    addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+    addr_and_mmu_idx |= mmu_idx_bitmap;
+
+    flush_all_helper(src_cpu, wait,
+                     tlb_check_page_and_flush_by_mmuidx_async_work,
+                     RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+    /* Will not return if wait == true */
+    g_assert(!wait);
+}
+
+void tlb_flush_page_all_cpus(CPUState *src, bool wait, target_ulong addr)
+{
+    flush_all_helper(src, wait,
+                     tlb_flush_page_async_work, RUN_ON_CPU_TARGET_PTR(addr));
+    /* Will not return if wait == true */
+    g_assert(!wait);
 }
 
 /* update the TLBs so that writes to code in the virtual page 'addr'
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e43cb68355..c93e8fc090 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -93,6 +93,18 @@ void cpu_address_space_init(CPUState *cpu, AddressSpace *as, int asidx);
  */
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
 /**
+ * tlb_flush_page_all_cpus:
+ * @cpu: src CPU of the flush
+ * @wait: If true ensure synchronisation by exiting the cpu_loop
+ * @addr: virtual address of page to be flushed
+ *
+ * Flush one page from the TLB of the specified CPU, for all
+ * MMU indexes. If the caller forces synchronisation they need to
+ * ensure all register state is synchronised as we will exit the
+ * cpu_loop.
+ */
+void tlb_flush_page_all_cpus(CPUState *src, bool wait, target_ulong addr);
+/**
  * tlb_flush:
  * @cpu: CPU whose TLB should be flushed
  *
@@ -103,6 +115,16 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr);
  */
 void tlb_flush(CPUState *cpu);
 /**
+ * tlb_flush_all_cpus:
+ * @cpu: src CPU of the flush
+ * @wait: If true ensure synchronisation by exiting the cpu_loop
+ *
+ * If the caller forces synchronisation they need to
+ * ensure all register state is synchronised as we will exit the
+ * cpu_loop.
+ */
+void tlb_flush_all_cpus(CPUState *src_cpu, bool wait);
+/**
  * tlb_flush_page_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @addr: virtual address of page to be flushed
@@ -113,8 +135,22 @@ void tlb_flush(CPUState *cpu);
  */
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
 /**
+ * tlb_flush_page_by_mmuidx_all_cpus:
+ * @cpu: Originating CPU of the flush
+ * @wait: If true ensure synchronisation by exiting the cpu_loop
+ * @addr: virtual address of page to be flushed
+ * @...: list of MMU indexes to flush, terminated by a negative value
+ *
+ * Flush one page from the TLB of all CPUs, for the specified
+ * MMU indexes. This function does not return, the run loop will exit
+ * and restart once the flush is completed.
+ */
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, bool wait,
+                                       target_ulong addr, ...);
+/**
  * tlb_flush_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @...: list of MMU indexes to flush, terminated by a negative value
  *
  * Flush all entries from the TLB of the specified CPU, for the specified
@@ -122,6 +158,18 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
  */
 void tlb_flush_by_mmuidx(CPUState *cpu, ...);
 /**
+ * tlb_flush_by_mmuidx_all_cpus:
+ * @cpu: Originating CPU of the flush
+ * @wait: If true ensure synchronisation by exiting the cpu_loop
+ * @...: list of MMU indexes to flush, terminated by a negative value
+ *
+ * Flush all entries from all TLBs of all CPUs, for the specified
+ * MMU indexes. If the caller forces synchronisation they need to
+ * ensure all register state is synchronised as we will exit the
+ * cpu_loop.
+ */
+void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, bool wait, ...);
+/**
  * tlb_set_page_with_attrs:
  * @cpu: CPU to add this TLB entry for
  * @vaddr: virtual address of page to add entry for
@@ -158,24 +206,35 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
 void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
                  uintptr_t retaddr);
-void tlb_flush_page_all(target_ulong addr);
 #else
 static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
 }
-
+static inline void tlb_flush_page_all_cpus(CPUState *src, bool wait,
+                                           target_ulong addr)
+{
+}
 static inline void tlb_flush(CPUState *cpu)
 {
 }
-
+static inline void tlb_flush_all_cpus(CPUState *src_cpu, bool wait)
+{
+}
 static inline void tlb_flush_page_by_mmuidx(CPUState *cpu,
                                             target_ulong addr, ...)
 {
 }
 
+static inline void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, bool wait,
+                                                     target_ulong addr, ...)
+{
+}
 static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 {
 }
+static inline void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, bool wait, ...)
+{
+}
 #endif
 
 #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 19/27] target-arm/powerctl: defer cpu reset work to CPU context
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (17 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus Alex Bennée
@ 2017-01-19 17:04 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 20/27] target-arm: ensure BQL taken for ARM_CP_IO register access Alex Bennée
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:04 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

When switching on a new vCPU we want to complete a bunch of the setup
work before we start scheduling the vCPU thread. To do this cleanly we
defer the vCPU setup to async work which will run in the vCPU's
execution context as the thread is woken up. The scheduling of the work
will kick the vCPU awake.

This avoids potential races in MTTCG system emulation.
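
The underlying idiom, sketched with a hypothetical payload (the real
patch below packages the PSCI CPU_ON parameters the same way):

    struct my_work {
        uint64_t arg;                       /* hypothetical payload */
    };

    static void do_work(CPUState *cs, run_on_cpu_data data)
    {
        struct my_work *w = data.host_ptr;
        /* ...runs in cs's own thread, so no cross-vCPU races... */
        g_free(w);
    }

    static void schedule_work(CPUState *target, uint64_t arg)
    {
        struct my_work *w = g_new(struct my_work, 1);
        w->arg = arg;
        /* queuing the work also kicks the target vCPU awake */
        async_run_on_cpu(target, do_work, RUN_ON_CPU_HOST_PTR(w));
    }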

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

---
v7
  - add const to static mode_for_el[] array
  - fix checkpatch long lines
---
 target/arm/arm-powerctl.c | 146 ++++++++++++++++++++++++++++------------------
 1 file changed, 88 insertions(+), 58 deletions(-)

diff --git a/target/arm/arm-powerctl.c b/target/arm/arm-powerctl.c
index fbb7a15daa..082788e3a4 100644
--- a/target/arm/arm-powerctl.c
+++ b/target/arm/arm-powerctl.c
@@ -48,11 +48,87 @@ CPUState *arm_get_cpu_by_id(uint64_t id)
     return NULL;
 }
 
+struct cpu_on_info {
+    uint64_t entry;
+    uint64_t context_id;
+    uint32_t target_el;
+    bool target_aa64;
+};
+
+
+static void arm_set_cpu_on_async_work(CPUState *target_cpu_state,
+                                      run_on_cpu_data data)
+{
+    ARMCPU *target_cpu = ARM_CPU(target_cpu_state);
+    struct cpu_on_info *info = (struct cpu_on_info *) data.host_ptr;
+
+    /* Initialize the cpu we are turning on */
+    cpu_reset(target_cpu_state);
+    target_cpu->powered_off = false;
+    target_cpu_state->halted = 0;
+
+    if (info->target_aa64) {
+        if ((info->target_el < 3) && arm_feature(&target_cpu->env,
+                                                 ARM_FEATURE_EL3)) {
+            /*
+             * As target mode is AArch64, we need to set lower
+             * exception level (the requested level 2) to AArch64
+             */
+            target_cpu->env.cp15.scr_el3 |= SCR_RW;
+        }
+
+        if ((info->target_el < 2) && arm_feature(&target_cpu->env,
+                                                 ARM_FEATURE_EL2)) {
+            /*
+             * As target mode is AArch64, we need to set lower
+             * exception level (the requested level 1) to AArch64
+             */
+            target_cpu->env.cp15.hcr_el2 |= HCR_RW;
+        }
+
+        target_cpu->env.pstate = aarch64_pstate_mode(info->target_el, true);
+    } else {
+        /* We are requested to boot in AArch32 mode */
+        static const uint32_t mode_for_el[] = { 0,
+                                                ARM_CPU_MODE_SVC,
+                                                ARM_CPU_MODE_HYP,
+                                                ARM_CPU_MODE_SVC };
+
+        cpsr_write(&target_cpu->env, mode_for_el[info->target_el], CPSR_M,
+                   CPSRWriteRaw);
+    }
+
+    if (info->target_el == 3) {
+        /* Processor is in secure mode */
+        target_cpu->env.cp15.scr_el3 &= ~SCR_NS;
+    } else {
+        /* Processor is not in secure mode */
+        target_cpu->env.cp15.scr_el3 |= SCR_NS;
+    }
+
+    /* We check if the started CPU is now at the correct level */
+    assert(info->target_el == arm_current_el(&target_cpu->env));
+
+    if (info->target_aa64) {
+        target_cpu->env.xregs[0] = info->context_id;
+        target_cpu->env.thumb = false;
+    } else {
+        target_cpu->env.regs[0] = info->context_id;
+        target_cpu->env.thumb = info->entry & 1;
+        info->entry &= 0xfffffffe;
+    }
+
+    /* Start the new CPU at the requested address */
+    cpu_set_pc(target_cpu_state, info->entry);
+    g_free(info);
+}
+
 int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
                    uint32_t target_el, bool target_aa64)
 {
     CPUState *target_cpu_state;
     ARMCPU *target_cpu;
+    struct cpu_on_info *info;
 
     DPRINTF("cpu %" PRId64 " (EL %d, %s) @ 0x%" PRIx64 " with R0 = 0x%" PRIx64
             "\n", cpuid, target_el, target_aa64 ? "aarch64" : "aarch32", entry,
@@ -109,64 +185,18 @@ int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
         return QEMU_ARM_POWERCTL_INVALID_PARAM;
     }
 
-    /* Initialize the cpu we are turning on */
-    cpu_reset(target_cpu_state);
-    target_cpu->powered_off = false;
-    target_cpu_state->halted = 0;
-
-    if (target_aa64) {
-        if ((target_el < 3) && arm_feature(&target_cpu->env, ARM_FEATURE_EL3)) {
-            /*
-             * As target mode is AArch64, we need to set lower
-             * exception level (the requested level 2) to AArch64
-             */
-            target_cpu->env.cp15.scr_el3 |= SCR_RW;
-        }
-
-        if ((target_el < 2) && arm_feature(&target_cpu->env, ARM_FEATURE_EL2)) {
-            /*
-             * As target mode is AArch64, we need to set lower
-             * exception level (the requested level 1) to AArch64
-             */
-            target_cpu->env.cp15.hcr_el2 |= HCR_RW;
-        }
-
-        target_cpu->env.pstate = aarch64_pstate_mode(target_el, true);
-    } else {
-        /* We are requested to boot in AArch32 mode */
-        static uint32_t mode_for_el[] = { 0,
-                                          ARM_CPU_MODE_SVC,
-                                          ARM_CPU_MODE_HYP,
-                                          ARM_CPU_MODE_SVC };
-
-        cpsr_write(&target_cpu->env, mode_for_el[target_el], CPSR_M,
-                   CPSRWriteRaw);
-    }
-
-    if (target_el == 3) {
-        /* Processor is in secure mode */
-        target_cpu->env.cp15.scr_el3 &= ~SCR_NS;
-    } else {
-        /* Processor is not in secure mode */
-        target_cpu->env.cp15.scr_el3 |= SCR_NS;
-    }
-
-    /* We check if the started CPU is now at the correct level */
-    assert(target_el == arm_current_el(&target_cpu->env));
-
-    if (target_aa64) {
-        target_cpu->env.xregs[0] = context_id;
-        target_cpu->env.thumb = false;
-    } else {
-        target_cpu->env.regs[0] = context_id;
-        target_cpu->env.thumb = entry & 1;
-        entry &= 0xfffffffe;
-    }
-
-    /* Start the new CPU at the requested address */
-    cpu_set_pc(target_cpu_state, entry);
-
-    qemu_cpu_kick(target_cpu_state);
+    /* To avoid racing with a CPU we are just kicking off we do the
+     * final bit of preparation for the work in the target CPUs
+     * context.
+     */
+    info = g_new(struct cpu_on_info, 1);
+    info->entry = entry;
+    info->context_id = context_id;
+    info->target_el = target_el;
+    info->target_aa64 = target_aa64;
+
+    async_run_on_cpu(target_cpu_state, arm_set_cpu_on_async_work,
+                     RUN_ON_CPU_HOST_PTR(info));
 
     /* We are good to go */
     return QEMU_ARM_POWERCTL_RET_SUCCESS;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 20/27] target-arm: ensure BQL taken for ARM_CP_IO register access
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (18 preceding siblings ...)
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 19/27] target-arm/powerctl: defer cpu reset work to CPU context Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 21/27] target-arm: helpers which may affect global state need the BQL Alex Bennée
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM cores

Most ARMCPRegInfo structures just allow updating of the CPU field.
However some have more complex operations that *may* have cross-vCPU
effects and therefore need to be serialised. The most obvious examples
at the moment are things that affect the GICv3 IRQ controller. To avoid
applying this requirement to all registers with custom access functions
we check whether the type is marked ARM_CP_IO.

By default all MMIO access to devices already takes the BQL to serialise
hardware emulation.
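
For illustration, a register definition opts into this serialisation
simply by carrying the ARM_CP_IO flag (a hypothetical entry; the names
are made up and other required ARMCPRegInfo fields are omitted):

    /* Hypothetical ARMCPRegInfo entry: the ARM_CP_IO type is what
     * routes its readfn/writefn through the BQL in the helpers below.
     */
    static const ARMCPRegInfo example_io_reg = {
        .name = "EXAMPLE_EL1", .type = ARM_CP_IO,
        .readfn = example_read, .writefn = example_write,
    };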

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 hw/intc/arm_gicv3_cpuif.c |  3 +++
 target/arm/op_helper.c    | 39 +++++++++++++++++++++++++++++++++++----
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 35e8eb30fc..897ae31607 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -13,6 +13,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "trace.h"
 #include "gicv3_internal.h"
 #include "cpu.h"
@@ -128,6 +129,8 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
     ARMCPU *cpu = ARM_CPU(cs->cpu);
     CPUARMState *env = &cpu->env;
 
+    g_assert(qemu_mutex_iothread_locked());
+
     trace_gicv3_cpuif_update(gicv3_redist_affid(cs), cs->hppi.irq,
                              cs->hppi.grp, cs->hppi.prio);
 
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index ba796d898e..1348789760 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -18,6 +18,7 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "internals.h"
@@ -735,28 +736,58 @@ void HELPER(set_cp_reg)(CPUARMState *env, void *rip, uint32_t value)
 {
     const ARMCPRegInfo *ri = rip;
 
-    ri->writefn(env, ri, value);
+    if (ri->type & ARM_CP_IO) {
+        qemu_mutex_lock_iothread();
+        ri->writefn(env, ri, value);
+        qemu_mutex_unlock_iothread();
+    } else {
+        ri->writefn(env, ri, value);
+    }
 }
 
 uint32_t HELPER(get_cp_reg)(CPUARMState *env, void *rip)
 {
     const ARMCPRegInfo *ri = rip;
+    uint32_t res;
 
-    return ri->readfn(env, ri);
+    if (ri->type & ARM_CP_IO) {
+        qemu_mutex_lock_iothread();
+        res = ri->readfn(env, ri);
+        qemu_mutex_unlock_iothread();
+    } else {
+        res = ri->readfn(env, ri);
+    }
+
+    return res;
 }
 
 void HELPER(set_cp_reg64)(CPUARMState *env, void *rip, uint64_t value)
 {
     const ARMCPRegInfo *ri = rip;
 
-    ri->writefn(env, ri, value);
+    if (ri->type & ARM_CP_IO) {
+        qemu_mutex_lock_iothread();
+        ri->writefn(env, ri, value);
+        qemu_mutex_unlock_iothread();
+    } else {
+        ri->writefn(env, ri, value);
+    }
 }
 
 uint64_t HELPER(get_cp_reg64)(CPUARMState *env, void *rip)
 {
     const ARMCPRegInfo *ri = rip;
+    uint64_t res;
+
+    if (ri->type & ARM_CP_IO) {
+        qemu_mutex_lock_iothread();
+        res = ri->readfn(env, ri);
+        qemu_mutex_unlock_iothread();
+    } else {
+        res = ri->readfn(env, ri);
+    }
 
-    return ri->readfn(env, ri);
+    return res;
 }
 
 void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, uint32_t imm)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 21/27] target-arm: helpers which may affect global state need the BQL
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (19 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 20/27] target-arm: ensure BQL taken for ARM_CP_IO register access Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 22/27] target-arm: don't generate WFE/YIELD calls for MTTCG Alex Bennée
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

As arm_call_el_change_hook may affect global state (for example by
updating the global GIC state) we need to assert/take the BQL.
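
Note the split visible in the diff below: arm_cpu_do_interrupt() already
runs with the BQL held so it only asserts, while the TCG helpers run
without it and must take and drop it around the hook; taking the
non-recursive BQL inside the hook itself would deadlock the interrupt
path. A sketch of a caller that may or may not already hold the lock
(illustrative only, not patch code):

    if (qemu_mutex_iothread_locked()) {
        arm_call_el_change_hook(cpu);      /* already serialised */
    } else {
        qemu_mutex_lock_iothread();
        arm_call_el_change_hook(cpu);
        qemu_mutex_unlock_iothread();
    }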

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/arm/helper.c    | 6 ++++++
 target/arm/op_helper.c | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b3875c7c6e..87809562b9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6672,6 +6672,12 @@ void arm_cpu_do_interrupt(CPUState *cs)
         arm_cpu_do_interrupt_aarch32(cs);
     }
 
+    /* Hooks may change global state so BQL should be held, also the
+     * BQL needs to be held for any modification of
+     * cs->interrupt_request.
+     */
+    g_assert(qemu_mutex_iothread_locked());
+
     arm_call_el_change_hook(cpu);
 
     if (!kvm_enabled()) {
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 1348789760..e1a883c595 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -488,7 +488,9 @@ void HELPER(cpsr_write_eret)(CPUARMState *env, uint32_t val)
      */
     env->regs[15] &= (env->thumb ? ~1 : ~3);
 
+    qemu_mutex_lock_iothread();
     arm_call_el_change_hook(arm_env_get_cpu(env));
+    qemu_mutex_unlock_iothread();
 }
 
 /* Access to user mode registers from privileged modes.  */
@@ -1020,7 +1022,9 @@ void HELPER(exception_return)(CPUARMState *env)
                       cur_el, new_el, env->pc);
     }
 
+    qemu_mutex_lock_iothread();
     arm_call_el_change_hook(arm_env_get_cpu(env));
+    qemu_mutex_unlock_iothread();
 
     return;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 22/27] target-arm: don't generate WFE/YIELD calls for MTTCG
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (20 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 21/27] target-arm: helpers which may affect global state need the BQL Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 23/27] target-arm/cpu.h: make ARM_CP defined consistent Alex Bennée
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

The WFE and YIELD instructions are really only hints and in TCG's case
they were useful to move the scheduling on from one vCPU to the next. In
the parallel context (MTTCG) this just causes an unnecessary cpu_exit
and contention on the BQL.
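
Two layers enforce this, both visible in the diff below: the translator
simply stops emitting the exit in MTTCG mode, and the helper asserts it
can no longer be reached in parallel mode (condensed sketch):

    /* Translation time: under MTTCG the hint becomes a no-op. */
    if (!parallel_cpus) {
        s->is_jmp = DISAS_YIELD;
    }

    /* Run time: the yield helper is now single-threaded only. */
    g_assert(!parallel_cpus);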

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/arm/op_helper.c     |  7 +++++++
 target/arm/translate-a64.c |  8 ++++++--
 target/arm/translate.c     | 20 ++++++++++++++++----
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index e1a883c595..abfa7cdd39 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -436,6 +436,13 @@ void HELPER(yield)(CPUARMState *env)
     ARMCPU *cpu = arm_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
 
+    /* When running in MTTCG we don't generate jumps to the yield and
+     * WFE helpers as it won't affect the scheduling of other vCPUs.
+     * If we wanted to more completely model WFE/SEV so we don't busy
+     * spin unnecessarily we would need to do something more involved.
+     */
+    g_assert(!parallel_cpus);
+
     /* This is a non-trappable hint instruction that generally indicates
      * that the guest is currently busy-looping. Yield control back to the
      * top level loop so that a more deserving VCPU has a chance to run.
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d0352e2045..7e7131fe2f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1342,10 +1342,14 @@ static void handle_hint(DisasContext *s, uint32_t insn,
         s->is_jmp = DISAS_WFI;
         return;
     case 1: /* YIELD */
-        s->is_jmp = DISAS_YIELD;
+        if (!parallel_cpus) {
+            s->is_jmp = DISAS_YIELD;
+        }
         return;
     case 2: /* WFE */
-        s->is_jmp = DISAS_WFE;
+        if (!parallel_cpus) {
+            s->is_jmp = DISAS_WFE;
+        }
         return;
     case 4: /* SEV */
     case 5: /* SEVL */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c9186b6195..4301562527 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4345,20 +4345,32 @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
     gen_rfe(s, pc, load_cpu_field(spsr));
 }
 
+/*
+ * For WFI we will halt the vCPU until an IRQ. For WFE and YIELD we
+ * only call the helper when running single threaded TCG code to ensure
+ * the next round-robin scheduled vCPU gets a crack. In MTTCG mode we
+ * just skip this instruction. Currently the SEV/SEVL instructions
+ * which are *one* of many ways to wake the CPU from WFE are not
+ * implemented so we can't sleep like WFI does.
+ */
 static void gen_nop_hint(DisasContext *s, int val)
 {
     switch (val) {
     case 1: /* yield */
-        gen_set_pc_im(s, s->pc);
-        s->is_jmp = DISAS_YIELD;
+        if (!parallel_cpus) {
+            gen_set_pc_im(s, s->pc);
+            s->is_jmp = DISAS_YIELD;
+        }
         break;
     case 3: /* wfi */
         gen_set_pc_im(s, s->pc);
         s->is_jmp = DISAS_WFI;
         break;
     case 2: /* wfe */
-        gen_set_pc_im(s, s->pc);
-        s->is_jmp = DISAS_WFE;
+        if (!parallel_cpus) {
+            gen_set_pc_im(s, s->pc);
+            s->is_jmp = DISAS_WFE;
+        }
         break;
     case 4: /* sev */
     case 5: /* sevl */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 23/27] target-arm/cpu.h: make ARM_CP defined consistent
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (21 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 22/27] target-arm: don't generate WFE/YIELD calls for MTTCG Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 24/27] target-arm: introduce ARM_CP_EXIT_PC Alex Bennée
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

This is a purely mechanical change to make the ARM_CP flags neatly
align and use a consistent format so it is easier to see which bit
each flag occupies.
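
With explicit shifts it is also easy to check the encoding, e.g. with
illustrative build-time assertions (not part of the patch):

    /* Flag bits live in bits 0-7; the ARM_CP_SPECIAL sub-codes are
     * enumerated in bits 8 and up.
     */
    QEMU_BUILD_BUG_ON((ARM_CP_NZCV & 0xff) != ARM_CP_SPECIAL);
    QEMU_BUILD_BUG_ON((ARM_CP_NZCV >> 8) != 3);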

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 target/arm/cpu.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7bd16eec18..366b619b8a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1389,20 +1389,20 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
  * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
  * registers which implement clocks or timers require this.
  */
-#define ARM_CP_SPECIAL 1
-#define ARM_CP_CONST 2
-#define ARM_CP_64BIT 4
-#define ARM_CP_SUPPRESS_TB_END 8
-#define ARM_CP_OVERRIDE 16
-#define ARM_CP_ALIAS 32
-#define ARM_CP_IO 64
-#define ARM_CP_NO_RAW 128
-#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
-#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
-#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
-#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
-#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+#define ARM_CP_SPECIAL         (1 << 0)
+#define ARM_CP_CONST           (1 << 1)
+#define ARM_CP_64BIT           (1 << 2)
+#define ARM_CP_SUPPRESS_TB_END (1 << 3)
+#define ARM_CP_OVERRIDE        (1 << 4)
+#define ARM_CP_ALIAS           (1 << 5)
+#define ARM_CP_IO              (1 << 6)
+#define ARM_CP_NO_RAW          (1 << 7)
+#define ARM_CP_NOP             (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI             (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV            (ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL       (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA          (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL       ARM_CP_DC_ZVA
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0xffff
 /* Mask of only the flag bits in a type field */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 24/27] target-arm: introduce ARM_CP_EXIT_PC
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (22 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 23/27] target-arm/cpu.h: make ARM_CP defined consistent Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 25/27] target-arm: ensure all cross vCPUs TLB flushes complete Alex Bennée
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

Some helpers may trigger an immediate exit of the cpu_loop. If this
happens the PC needs to be rectified to ensure the restart begins on
the next instruction.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 target/arm/cpu.h           | 3 ++-
 target/arm/translate-a64.c | 4 ++++
 target/arm/translate.c     | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 366b619b8a..29d15fc522 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1402,7 +1402,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
 #define ARM_CP_NZCV            (ARM_CP_SPECIAL | (3 << 8))
 #define ARM_CP_CURRENTEL       (ARM_CP_SPECIAL | (4 << 8))
 #define ARM_CP_DC_ZVA          (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL       ARM_CP_DC_ZVA
+#define ARM_CP_EXIT_PC         (ARM_CP_SPECIAL | (6 << 8))
+#define ARM_LAST_SPECIAL       ARM_CP_EXIT_PC
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0xffff
 /* Mask of only the flag bits in a type field */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7e7131fe2f..98d4fac070 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1561,6 +1561,10 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         tcg_rt = cpu_reg(s, rt);
         gen_helper_dc_zva(cpu_env, tcg_rt);
         return;
+    case ARM_CP_EXIT_PC:
+        /* The helper may exit the cpu_loop so ensure PC is correct */
+        gen_a64_set_pc_im(s->pc);
+        break;
     default:
         break;
     }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4301562527..e9f46eb757 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7510,6 +7510,10 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
             gen_set_pc_im(s, s->pc);
             s->is_jmp = DISAS_WFI;
             return 0;
+        case ARM_CP_EXIT_PC:
+            /* The helper may exit the cpu_loop so ensure PC is correct */
+            gen_set_pc_im(s, s->pc);
+            break;
         default:
             break;
         }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 25/27] target-arm: ensure all cross vCPUs TLB flushes complete
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (23 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 24/27] target-arm: introduce ARM_CP_EXIT_PC Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts Alex Bennée
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 27/27] target-ppc: take global mutex for set_irq Alex Bennée
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

Previously flushes on other vCPUs would only get serviced when they
exited their TranslationBlocks. While this isn't overly problematic it
violates the semantics of a TLB flush from the point of view of the
source vCPU.

To solve this we call the cputlb *_all_cpus() functions to do the
flushes and ask them to ensure all flushes have completed before we start
the next instruction. As this involves exiting the cpu_loop we need to
ensure the PC is saved before the tlb helper functions are called.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 target/arm/helper.c | 194 ++++++++++++++++++++++------------------------------
 1 file changed, 83 insertions(+), 111 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 87809562b9..2b696ef8ee 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -536,41 +536,33 @@ static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush(other_cs);
-    }
+    tlb_flush_all_cpus(cs, true);
 }
 
 static void tlbiasid_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush(other_cs);
-    }
+    tlb_flush_all_cpus(cs, true);
 }
 
 static void tlbimva_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
-    }
+    tlb_flush_page_all_cpus(cs, true, value & TARGET_PAGE_MASK);
 }
 
 static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
-    }
+    tlb_flush_page_all_cpus(cs, true, value & TARGET_PAGE_MASK);
 }
 
 static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -585,12 +577,10 @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                   uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
-                            ARMMMUIdx_S12NSE0, ARMMMUIdx_S2NS, -1);
-    }
+    tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S12NSE1,
+                                 ARMMMUIdx_S12NSE0, ARMMMUIdx_S2NS, -1);
 }
 
 static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -617,7 +607,7 @@ static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
     uint64_t pageaddr;
 
     if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
@@ -626,9 +616,7 @@ static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
     pageaddr = sextract64(value << 12, 0, 40);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S2NS, -1);
-    }
+    tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S2NS, -1);
 }
 
 static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -642,11 +630,9 @@ static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                  uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1E2, -1);
-    }
+    tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S1E2, -1);
 }
 
 static void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -661,12 +647,10 @@ static void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                  uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
     uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1E2, -1);
-    }
+    tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S1E2, -1);
 }
 
 static const ARMCPRegInfo cp_reginfo[] = {
@@ -1335,14 +1319,16 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
 static const ARMCPRegInfo v7mp_cp_reginfo[] = {
     /* 32 bit TLB invalidates, Inner Shareable */
     { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_is_write },
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL1_W,
+      .writefn = tlbiall_is_write },
     { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL1_W,
+      .writefn = tlbimva_is_write },
     { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL1_W,
       .writefn = tlbiasid_is_write },
     { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL1_W,
       .writefn = tlbimvaa_is_write },
     REGINFO_SENTINEL
 };
@@ -2855,8 +2841,7 @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
 static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                     uint64_t value)
 {
-    ARMCPU *cpu = arm_env_get_cpu(env);
-    CPUState *cs = CPU(cpu);
+    CPUState *cs = ENV_GET_CPU(env);
 
     if (arm_is_secure_below_el3(env)) {
         tlb_flush_by_mmuidx(cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
@@ -2868,16 +2853,15 @@ static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       uint64_t value)
 {
+    CPUState *cs = ENV_GET_CPU(env);
     bool sec = arm_is_secure_below_el3(env);
-    CPUState *other_cs;
 
-    CPU_FOREACH(other_cs) {
-        if (sec) {
-            tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
-        } else {
-            tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
-                                ARMMMUIdx_S12NSE0, -1);
-        }
+    if (sec) {
+        tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S1SE1,
+                                     ARMMMUIdx_S1SE0, -1);
+    } else {
+        tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S12NSE1,
+                                     ARMMMUIdx_S12NSE0, -1);
     }
 }
 
@@ -2930,39 +2914,34 @@ static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
      */
     bool sec = arm_is_secure_below_el3(env);
     bool has_el2 = arm_feature(env, ARM_FEATURE_EL2);
-    CPUState *other_cs;
-
-    CPU_FOREACH(other_cs) {
-        if (sec) {
-            tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);
-        } else if (has_el2) {
-            tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
-                                ARMMMUIdx_S12NSE0, ARMMMUIdx_S2NS, -1);
-        } else {
-            tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S12NSE1,
-                                ARMMMUIdx_S12NSE0, -1);
-        }
+    CPUState *cs = ENV_GET_CPU(env);
+
+    if (sec) {
+        tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S1SE1,
+                                     ARMMMUIdx_S1SE0, -1);
+    } else if (has_el2) {
+        tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S12NSE1,
+                                     ARMMMUIdx_S12NSE0, ARMMMUIdx_S2NS, -1);
+    } else {
+        tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S12NSE1,
+                                     ARMMMUIdx_S12NSE0, -1);
     }
 }
 
 static void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                     uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1E2, -1);
-    }
+    tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S1E2, -1);
 }
 
 static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                     uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_by_mmuidx(other_cs, ARMMMUIdx_S1E3, -1);
-    }
+    tlb_flush_by_mmuidx_all_cpus(cs, true, ARMMMUIdx_S1E3, -1);
 }
 
 static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -3017,41 +2996,36 @@ static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    CPUState *cs = CPU(cpu);
     bool sec = arm_is_secure_below_el3(env);
-    CPUState *other_cs;
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
 
-    CPU_FOREACH(other_cs) {
-        if (sec) {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1SE1,
-                                     ARMMMUIdx_S1SE0, -1);
-        } else {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S12NSE1,
-                                     ARMMMUIdx_S12NSE0, -1);
-        }
+    if (sec) {
+        tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S1SE1,
+                                          ARMMMUIdx_S1SE0, -1);
+    } else {
+        tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S12NSE1,
+                                          ARMMMUIdx_S12NSE0, -1);
     }
 }
 
 static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1E2, -1);
-    }
+    tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S1E2, -1);
 }
 
 static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1E3, -1);
-    }
+    tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S1E3, -1);
 }
 
 static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -3079,7 +3053,7 @@ static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       uint64_t value)
 {
-    CPUState *other_cs;
+    CPUState *cs = ENV_GET_CPU(env);
     uint64_t pageaddr;
 
     if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
@@ -3088,9 +3062,7 @@ static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
     pageaddr = sextract64(value << 12, 0, 48);
 
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S2NS, -1);
-    }
+    tlb_flush_page_by_mmuidx_all_cpus(cs, true, pageaddr, ARMMMUIdx_S2NS, -1);
 }
 
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -3247,27 +3219,27 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     /* TLBI operations */
     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
@@ -3295,19 +3267,19 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_ipas2e1is_write },
     { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_ipas2e1is_write },
     { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_alle1is_write },
     { .name = "TLBI_VMALLS12E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 6,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_alle1is_write },
     { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
@@ -3323,7 +3295,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_alle1_write },
     { .name = "TLBI_VMALLS12E1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 6,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_alle1is_write },
 #ifndef CONFIG_USER_ONLY
     /* 64 bit address translation operations */
@@ -3369,7 +3341,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
       .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
     { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL1_W,
       .writefn = tlbimvaa_is_write },
     { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
       .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
@@ -3380,7 +3352,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbimva_hyp_write },
     { .name = "TLBIMVALHIS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbimva_hyp_is_write },
     { .name = "TLBIIPAS2",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
@@ -3388,7 +3360,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbiipas2_write },
     { .name = "TLBIIPAS2IS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbiipas2_is_write },
     { .name = "TLBIIPAS2L",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
@@ -3396,7 +3368,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbiipas2_write },
     { .name = "TLBIIPAS2LIS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbiipas2_is_write },
     /* 32 bit cache operations */
     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
@@ -3736,7 +3708,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .writefn = tlbiall_nsnh_write },
     { .name = "TLBIALLNSNHIS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbiall_nsnh_is_write },
     { .name = "TLBIALLH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0,
       .type = ARM_CP_NO_RAW, .access = PL2_W,
@@ -3748,7 +3720,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .type = ARM_CP_NO_RAW, .access = PL2_W,
       .writefn = tlbimva_hyp_write },
     { .name = "TLBIMVAHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbimva_hyp_is_write },
     { .name = "TLBI_ALLE2", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0,
@@ -3764,15 +3736,15 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .writefn = tlbi_aa64_vae2_write },
     { .name = "TLBI_ALLE2IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_alle2is_write },
     { .name = "TLBI_VAE2IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
+      .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC, .access = PL2_W,
       .writefn = tlbi_aa64_vae2is_write },
     { .name = "TLBI_VALE2IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
+      .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae2is_write },
 #ifndef CONFIG_USER_ONLY
     /* Unlike the other EL2-related AT operations, these must
@@ -3959,15 +3931,15 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
       .resetvalue = 0 },
     { .name = "TLBI_ALLE3IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 0,
-      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .access = PL3_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_alle3is_write },
     { .name = "TLBI_VAE3IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 1,
-      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .access = PL3_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae3is_write },
     { .name = "TLBI_VALE3IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 5,
-      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .access = PL3_W, .type = ARM_CP_NO_RAW | ARM_CP_EXIT_PC,
       .writefn = tlbi_aa64_vae3is_write },
     { .name = "TLBI_ALLE3", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 0,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (24 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 25/27] target-arm: ensure all cross vCPUs TLB flushes complete Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  2017-01-20  0:08   ` Pranith Kumar
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 27/27] target-ppc: take global mutex for set_irq Alex Bennée
  26 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, open list:ARM

This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

  - The ARM translate.c/translate-a64.c have been converted to
    - use MTTCG safe atomic primitives
    - emit the appropriate barrier ops
  - The ARM machine has been updated to
    - hold the BQL when modifying shared cross-vCPU state
    - defer cpu_reset to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v7
  - drop configure check for backend
  - declare backend memory order for x86
  - declare guest memory order for ARM
  - add configure snippet to set TARGET_SUPPORTS_MTTCG
---
 configure             |  6 ++++++
 target/arm/cpu.h      |  3 +++
 tcg/i386/tcg-target.h | 16 ++++++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/configure b/configure
index 17d52cdd74..a23245fdf4 100755
--- a/configure
+++ b/configure
@@ -5881,6 +5881,7 @@ mkdir -p $target_dir
 echo "# Automatically generated by configure - do not modify" > $config_target_mak
 
 bflt="no"
+mttcg="no"
 interp_prefix1=$(echo "$interp_prefix" | sed "s/%M/$target_name/g")
 gdb_xml_files=""
 
@@ -5899,11 +5900,13 @@ case "$target_name" in
   arm|armeb)
     TARGET_ARCH=arm
     bflt="yes"
+    mttcg="yes"
     gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
   ;;
   aarch64)
     TARGET_BASE_ARCH=arm
     bflt="yes"
+    mttcg="yes"
     gdb_xml_files="aarch64-core.xml aarch64-fpu.xml arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
   ;;
   cris)
@@ -6055,6 +6058,9 @@ if test "$target_bigendian" = "yes" ; then
 fi
 if test "$target_softmmu" = "yes" ; then
   echo "CONFIG_SOFTMMU=y" >> $config_target_mak
+  if test "$mttcg" = "yes" ; then
+    echo "TARGET_SUPPORTS_MTTCG=y" >> $config_target_mak
+  fi
 fi
 if test "$target_user_only" = "yes" ; then
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 29d15fc522..659e246a54 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -29,6 +29,9 @@
 #  define TARGET_LONG_BITS 32
 #endif
 
+/* ARM processors have a weak memory model */
+#define TCG_DEFAULT_MO      (0)
+
 #define CPUArchState struct CPUARMState
 
 #include "qemu-common.h"
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 21d96ec35c..536190f647 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -165,4 +165,20 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
 
+/* This defines the natural memory order supported by this
+ * architecture before guarantees made by various barrier
+ * instructions.
+ *
+ * The x86 has a pretty strong memory ordering which only really
+ * allows for some stores to be re-ordered after loads.
+ */
+#include "tcg-mo.h"
+
+static inline int get_tcg_target_mo(void)
+{
+    return TCG_MO_ALL & ~TCG_MO_LD_ST;
+}
+
+#define TCG_TARGET_DEFAULT_MO get_tcg_target_mo()
+
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v7 27/27] target-ppc: take global mutex for set_irq
  2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
                   ` (25 preceding siblings ...)
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts Alex Bennée
@ 2017-01-19 17:05 ` Alex Bennée
  26 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-19 17:05 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Alex Bennée, David Gibson, Alexander Graf,
	open list:PowerPC

We have to do this conditionally as the reset paths can trigger IRQ
setting when the machine is first brought up.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 hw/ppc/ppc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 8945869009..59c3faa6c8 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -62,7 +62,16 @@ void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
 {
     CPUState *cs = CPU(cpu);
     CPUPPCState *env = &cpu->env;
-    unsigned int old_pending = env->pending_interrupts;
+    unsigned int old_pending;
+    bool locked = false;
+
+    /* We may already have the BQL if coming from the reset path */
+    if (!qemu_mutex_iothread_locked()) {
+        locked = true;
+        qemu_mutex_lock_iothread();
+    }
+
+    old_pending = env->pending_interrupts;
 
     if (level) {
         env->pending_interrupts |= 1 << n_IRQ;
@@ -80,9 +89,14 @@ void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
 #endif
     }
 
+
     LOG_IRQ("%s: %p n_IRQ %d level %d => pending %08" PRIx32
                 "req %08x\n", __func__, env, n_IRQ, level,
                 env->pending_interrupts, CPU(cpu)->interrupt_request);
+
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 /* PowerPC 6xx / 7xx internal IRQ controller */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts
  2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts Alex Bennée
@ 2017-01-20  0:08   ` Pranith Kumar
  2017-01-20 10:53     ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Pranith Kumar @ 2017-01-20  0:08 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, nikunj,
	mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian, open list:ARM


Alex Bennée writes:

> This enables the multi-threaded system emulation by default for ARMv7
> and ARMv8 guests using the x86_64 TCG backend. This is because on the
> guest side:
>
>   - The ARM translate.c/translate-a64.c have been converted to
>     - use MTTCG safe atomic primitives
>     - emit the appropriate barrier ops
>   - The ARM machine has been updated to
>     - hold the BQL when modifying shared cross-vCPU state
>     - defer cpu_reset to async safe work
>
> All the host backends support the barrier and atomic primitives but
> need to provide same-or-better support for normal load/store
> operations.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

<snip>

>
> +/* This defines the natural memory order supported by this
> + * architecture before guarantees made by various barrier
> + * instructions.
> + *
> + * The x86 has a pretty strong memory ordering which only really
> + * allows for some stores to be re-ordered after loads.
> + */
> +#include "tcg-mo.h"
> +
> +static inline int get_tcg_target_mo(void)
> +{
> +    return TCG_MO_ALL & ~TCG_MO_LD_ST;
> +}
> +

Shouldn't this be TCG_MO_ALL & ~TCG_MO_ST_LD?

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG Alex Bennée
@ 2017-01-20  1:28   ` Pranith Kumar
  2017-01-20 14:50     ` Alex Bennée
  2017-01-23 19:06   ` Richard Henderson
  1 sibling, 1 reply; 51+ messages in thread
From: Pranith Kumar @ 2017-01-20  1:28 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, nikunj,
	mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Peter Crosthwaite


Alex Bennée writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> We know there will be cases where MTTCG won't work until additional work
> is done in the front/back ends to support it. It will however be useful to
> be able to turn it on.
>
> As a result MTTCG will default to off unless the combination is
> supported. However the user can turn it on for the sake of testing.
>

<snip>

>  static TimersState timers_state;
> +bool mttcg_enabled;
> +
> +/*
> + * We default to false if we know other options have been enabled
> + * which are currently incompatible with MTTCG. Otherwise when each
> + * guest (target) has been updated to support:
> + *   - atomic instructions
> + *   - memory ordering primitives (barriers)
> + * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
> + *
> + * Once a guest architecture has been converted to the new primitives
> + * there are two remaining limitations to check.
> + *
> + * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host)
> + * - The host must have a stronger memory order than the guest
> + *
> + * It may be possible in future to support strong guests on weak hosts
> + * but that will require tagging all load/stores in a guest with their
> + * implicit memory order requirements which would likely slow things
> + * down a lot.
> + */
> +
> +static bool check_tcg_memory_orders_compatible(void)
> +{
> +#if defined(TCG_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO)
> +    return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;

I am not sure this is what we intended. If the guest is weaker than the host,
we can still run the guest if we translate the guest barriers. With the above,
> a strong host cannot run a weaker guest.

I think what we want is to disallow weak hosts from running stronger guests,
since we do not enforce the implicit ordering semantics of the guest as of
now.  In that case you can filter them out using the following:

TCG_DEFAULT_MO | (TCG_DEFAULT_MO ^ ~TCG_TARGET_DEFAULT_MO) == TCG_MO_ALL

We want our guest execution to prevent all possible re-ordering. The first
term above is the host memory order. If the host is SC, we do not need to
check anything else. If not, the second term tells us the difference in
ordering between the guest and the host. It gives the kind of barriers
which will be translated from guest to host. Both these together should cover
all the cases for the memory order to be compatible.

Thoughts?

Thanks,
--
Pranith

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts
  2017-01-20  0:08   ` Pranith Kumar
@ 2017-01-20 10:53     ` Alex Bennée
  2017-01-20 14:30       ` Pranith Kumar
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-20 10:53 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, nikunj,
	mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian, open list:ARM


Pranith Kumar <bobby.prani@gmail.com> writes:

> Alex Bennée writes:
>
>> This enables the multi-threaded system emulation by default for ARMv7
>> and ARMv8 guests using the x86_64 TCG backend. This is because on the
>> guest side:
>>
>>   - The ARM translate.c/translate-a64.c have been converted to
>>     - use MTTCG safe atomic primitives
>>     - emit the appropriate barrier ops
>>   - The ARM machine has been updated to
>>     - hold the BQL when modifying shared cross-vCPU state
>>     - defer cpu_reset to async safe work
>>
>> All the host backends support the barrier and atomic primitives but
>> need to provide same-or-better support for normal load/store
>> operations.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> <snip>
>
>>
>> +/* This defines the natural memory order supported by this
>> + * architecture before guarantees made by various barrier
>> + * instructions.
>> + *
>> + * The x86 has a pretty strong memory ordering which only really
>> + * allows for some stores to be re-ordered after loads.
>> + */
>> +#include "tcg-mo.h"
>> +
>> +static inline int get_tcg_target_mo(void)
>> +{
>> +    return TCG_MO_ALL & ~TCG_MO_LD_ST;
>> +}
>> +
>
> Shouldn't this be TCG_MO_ALL & ~TCG_MO_ST_LD?

The case that x86 doesn't handle normally is store-after-load which is
what I assumed TCG_MO_LD_ST was. Perhaps we need some better comments
for each of the enums?
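
To be concrete about which reordering this is, the standard
store-buffering litmus test (a standalone sketch, not code from the
series; compile without optimisation and expect to need many runs to
observe it):

    #include <pthread.h>
    #include <stdio.h>

    /* Dekker-style store buffering: the one reordering x86 permits */
    int x, y, r0, r1;

    static void *t0(void *arg) { x = 1; r0 = y; return NULL; }
    static void *t1(void *arg) { y = 1; r1 = x; return NULL; }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, t0, NULL);
        pthread_create(&b, NULL, t1, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* On x86, r0 == 0 && r1 == 0 is observable: each load may
         * complete before that thread's own earlier store has
         * drained from the store buffer. */
        printf("r0=%d r1=%d\n", r0, r1);
        return 0;
    }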

>
> Thanks,


--
Alex Bennée

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts
  2017-01-20 10:53     ` Alex Bennée
@ 2017-01-20 14:30       ` Pranith Kumar
  0 siblings, 0 replies; 51+ messages in thread
From: Pranith Kumar @ 2017-01-20 14:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, qemu-devel, KONRAD Frédéric, alvise rigo,
	Emilio G. Cota, nikunj, Mark Burton, Paolo Bonzini, Jan Kiszka,
	Sergey Fedorov, Richard Henderson, Peter Maydell,
	Claudio Fontana, Bamvor Zhang Jian, open list:ARM

On Fri, Jan 20, 2017 at 5:53 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> The case that x86 doesn't handle normally is store-after-load which is
> what I assumed TCG_MO_LD_ST was. Perhaps we need some better comments
> for each of the enums?
>

OK. The enum is of the form TCG_MO_A_B, where A and B are in program
order. So x86 will be TCG_MO_ST_LD, i.e., a load following a store is
re-ordered before the store.

I'll send a patch adding this comment.
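
Roughly along these lines (a sketch of the comment only, using the
values as I read them in tcg-mo.h — double-check before taking this as
authoritative):

    typedef enum {
        /* TCG_MO_A_B: an earlier A may not be reordered with a
         * later B (A and B in program order) when this bit is
         * enforced. */
        TCG_MO_LD_LD = 0x01,  /* load  ordered before later load  */
        TCG_MO_ST_LD = 0x02,  /* store ordered before later load;
                               * the pair x86 leaves unordered     */
        TCG_MO_LD_ST = 0x04,  /* load  ordered before later store */
        TCG_MO_ST_ST = 0x08,  /* store ordered before later store */
        TCG_MO_ALL   = 0x0f,  /* OR of all of the above           */
    } TCGBar;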

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-20  1:28   ` Pranith Kumar
@ 2017-01-20 14:50     ` Alex Bennée
  2017-01-20 15:03       ` Pranith Kumar
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-20 14:50 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, nikunj,
	mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, bamvor.zhangjian,
	Peter Crosthwaite


Pranith Kumar <bobby.prani@gmail.com> writes:

> Alex Bennée writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> We know there will be cases where MTTCG won't work until additional work
>> is done in the front/back ends to support it. It will however be useful to
>> be able to turn it on.
>>
>> As a result MTTCG will default to off unless the combination is
>> supported. However the user can turn it on for the sake of testing.
>>
>
> <snip>
>
>>  static TimersState timers_state;
>> +bool mttcg_enabled;
>> +
>> +/*
>> + * We default to false if we know other options have been enabled
>> + * which are currently incompatible with MTTCG. Otherwise when each
>> + * guest (target) has been updated to support:
>> + *   - atomic instructions
>> + *   - memory ordering primitives (barriers)
>> + * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
>> + *
>> + * Once a guest architecture has been converted to the new primitives
>> + * there are two remaining limitations to check.
>> + *
>> + * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host)
>> + * - The host must have a stronger memory order than the guest
>> + *
>> + * It may be possible in future to support strong guests on weak hosts
>> + * but that will require tagging all load/stores in a guest with their
>> + * implicit memory order requirements which would likely slow things
>> + * down a lot.
>> + */
>> +
>> +static bool check_tcg_memory_orders_compatible(void)
>> +{
>> +#if defined(TCG_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO)
>> +    return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;
>
> I am not sure this is what we intended. If the guest is weaker than the host,
> we can still run the guest if we translate the guest barriers. With the above,
> a strong host cannot run a weaker guest.

I think this is confusing because of QEMU's overload of the term target.
TCG_TARGET_DEFAULT_MO is the memory order of the host. So if there is
any expected memory order on the guest side (TCG_DEFAULT_MO) that isn't
met by the host then we fail. In ARMs case TCG_DEFAULT_MO is 0 so it
will always pass.

> I think what we want is to disallow weak hosts from running stronger guests,
> since we do not enforce the implicit ordering semantics of the guest as of
> now.  In that case you can filter them out using the following:
>
> TCG_DEFAULT_MO | (TCG_DEFAULT_MO ^ ~TCG_TARGET_DEFAULT_MO) == TCG_MO_ALL
>
> We want our guest execution to prevent all possible re-ordering. The first
> term above is the host memory order. If the host is SC, we do not need to
> check anything else. If not, the second term tells us the difference in
> ordering between the guest and the host. It gives the kind of barriers
> which will be translated from guest to host. Both these together should cover
> all the cases for the memory order to be compatible.
>
> Thoughts?

That's the wrong way round. TCG_DEFAULT_MO is the guest memory order.
Maybe I should rename them to be explicitly:

TCG_GUEST_DEFAULT_MO
TCG_HOST_DEFAULT_MO

But that introduces another terminology into the TCG code.

>
> Thanks,


--
Alex Bennée

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-20 14:50     ` Alex Bennée
@ 2017-01-20 15:03       ` Pranith Kumar
  0 siblings, 0 replies; 51+ messages in thread
From: Pranith Kumar @ 2017-01-20 15:03 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, qemu-devel, KONRAD Frédéric, alvise rigo,
	Emilio G. Cota, nikunj, Mark Burton, Paolo Bonzini, Jan Kiszka,
	Sergey Fedorov, Richard Henderson, Peter Maydell,
	Claudio Fontana, Bamvor Zhang Jian, Peter Crosthwaite

On Fri, Jan 20, 2017 at 9:50 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> That's the wrong way round. TCG_DEFAULT_MO is the guest memory order.
> Maybe I should rename them to be explicitly:
>
> TCG_GUEST_DEFAULT_MO
> TCG_HOST_DEFAULT_MO
>
> But that introduces another terminology into the TCG code.
>

Ah, that's not the first time I got confused. Maybe just add a comment?


-- 
Pranith

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build Alex Bennée
@ 2017-01-23 18:57   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 18:57 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> From: Pranith Kumar <bobby.prani@gmail.com>
> 
> Enable tcg lock debug asserts in a debug build by default instead of
> relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
> asserts, so this patch removes DEBUG_LOCKING and enables these asserts
> in a debug build.
> 
> CC: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> [AJB: tweak ifdefs so can be early in series]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  translate-all.c | 52 ++++++++++++++++------------------------------------
>  1 file changed, 16 insertions(+), 36 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step()
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step() Alex Bennée
@ 2017-01-23 18:57   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 18:57 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> From: Pranith Kumar <bobby.prani@gmail.com>
> 
> The recent patch enabling lock assertions uncovered missing lock
> acquisitions in cpu_exec_step(). This patch adds them.
> 
> CC: Richard Henderson <rth@twiddle.net>
> CC: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c | 4 ++++
>  1 file changed, 4 insertions(+)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file Alex Bennée
@ 2017-01-23 18:59   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 18:59 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> We'll be using the memory ordering definitions to define values for
> both the host and guest. To avoid fighting with circular header
> dependencies just move these types into their own minimal header.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  tcg/tcg-mo.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg.h    | 18 +-----------------
>  2 files changed, 46 insertions(+), 17 deletions(-)
>  create mode 100644 tcg/tcg-mo.h

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG Alex Bennée
  2017-01-20  1:28   ` Pranith Kumar
@ 2017-01-23 19:06   ` Richard Henderson
  2017-01-24 20:25     ` Alex Bennée
  1 sibling, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:06 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> +void qemu_tcg_configure(QemuOpts *opts, Error **errp)
> +{
> +    const char *t = qemu_opt_get(opts, "thread");
> +    if (t) {
> +        if (strcmp(t, "multi") == 0) {
> +            if (TCG_OVERSIZED_GUEST) {
> +                error_setg(errp, "No MTTCG when guest word size > hosts");

I agree with this, since I don't ever imagine it'll be fixed.  It's a silly use
case in the first place considering the ubiquity of 64-bit hosts.

> +            } else if (!check_tcg_memory_orders_compatible()) {
> +                error_setg(errp,
> +                           "No MTTCG when guest MO is stronger than host MO");

This, on the other hand, means that one cannot even force multi for testing.  A
warning perhaps?  And how shall we handle the guest translation adding
barriers as necessary to satisfy guest memory ordering on the host?


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
@ 2017-01-23 19:07   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:07 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> This moves the helper function closer to where it is called and updates
> the error message to report via error_report instead of the deprecated
> fprintf.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cputlb.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work.
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work Alex Bennée
@ 2017-01-23 19:10   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:10 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> +/* NOTE:
> + * If flush_global is true (the usual case), flush all tlb entries.
> + * If flush_global is false, flush (at least) all tlb entries not
> + * marked global.
> + *
> + * Since QEMU doesn't currently implement a global/not-global flag
> + * for tlb entries, at the moment tlb_flush() will also flush all
> + * tlb entries in the flush_global == false case. This is OK because
> + * CPU architectures generally permit an implementation to drop
> + * entries from the TLB at any time, so flushing more entries than
> + * required is only an efficiency issue, not a correctness issue.
> + */
> +void tlb_flush(CPUState *cpu)
> +{

Merge error re-adding the comment for flush_global?

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines Alex Bennée
@ 2017-01-23 19:11   ` Richard Henderson
  2017-01-24 20:31     ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:11 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> +/* Helper function to slurp va_args list into a bitmap
> + */
> +static inline unsigned long make_mmu_index_bitmap(va_list args)
> +{
> +    unsigned long bitmap = 0;
> +    int mmu_index = va_arg(args, int);
> +
> +    /* An empty va_list would be a bad call */
> +    g_assert(mmu_index > 0);
> +
> +    do {
> +        set_bit(mmu_index, &bitmap);
> +        mmu_index = va_arg(args, int);
> +    } while (mmu_index >= 0);
> +
> +    return bitmap;
> +}
> +

Why don't we just pass in this bitmap in the first place?  It's much better
than having to use varargs in tlb_flush_by_mmuidx...
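
I.e. something like this (a sketch only, not tested):

    /* The caller builds the map; no varargs, no -1 terminator. */
    void tlb_flush_by_mmuidx(CPUState *cpu, unsigned long idxmap);

    /* e.g. from tlbiall_nsnh_write: */
    tlb_flush_by_mmuidx(cs, (1 << ARMMMUIdx_S12NSE1) |
                            (1 << ARMMMUIdx_S12NSE0) |
                            (1 << ARMMMUIdx_S2NS));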


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty Alex Bennée
@ 2017-01-23 19:17   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:17 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
> in TLB entries to force the slow-path on writes. This is used to mark
> page ranges containing code which has been translated so it can be
> invalidated if written to. To do this safely we need to ensure the TLB
> entries in question for all vCPUs are updated before we attempt to run
> the code, otherwise a race could be introduced.
> 
> To achieve this we atomically set the flag in tlb_reset_dirty_range and
> take care when setting it when the TLB entry is filled.
> 
> On 32 bit systems attempting to emulate 64 bit guests we don't even
> bother as we might not have the atomic primitives available. MTTCG is
> disabled in this case and can't be forced on. The copy_tlb_helper
> function helps keep the atomic semantics in one place to avoid
> confusion.
> 
> The dirty helper function is made static as it isn't used outside of
> cputlb.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus
  2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus Alex Bennée
@ 2017-01-23 19:21   ` Richard Henderson
  2017-01-24 20:34     ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-01-23 19:21 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/19/2017 09:04 AM, Alex Bennée wrote:
> +/* flush_all_helper: run fn across all cpus
> + *
> + * If the wait flag is set then the src cpu's helper will be queued as
> + * "safe" work and the loop exited creating a synchronisation point
> + * where all queued work will be finished before execution starts
> + * again.
> + */
> +static void flush_all_helper(CPUState *src, bool wait,
> +                             run_on_cpu_func fn, run_on_cpu_data d)
> +{
> +    CPUState *cpu;
> +
> +    if (!wait) {
> +        CPU_FOREACH(cpu) {
> +            if (cpu != src) {
> +                async_run_on_cpu(cpu, fn, d);
> +            } else {
> +                g_assert(qemu_cpu_is_self(src));
> +                fn(src, d);
> +            }
> +        }
> +    } else {
> +        CPU_FOREACH(cpu) {
> +            if (cpu != src) {
> +                async_run_on_cpu(cpu, fn, d);
> +            } else {
> +                async_safe_run_on_cpu(cpu, fn, d);
> +            }
> +
> +        }
> +        cpu_loop_exit(src);
> +    }
> +}

What's the rationale for not having the target do the exit itself?  Surely it
can tell, and simply end the TB after the insn.
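
E.g. for the ARM front end, a sketch of what I mean (ARM_CP_FORCE_TB_END
is a made-up flag, purely for illustration):

    /* in disas_coproc_insn(), after the register write is emitted */
    if (ri->type & ARM_CP_FORCE_TB_END) {
        gen_set_pc_im(s, s->pc);
        s->is_jmp = DISAS_UPDATE;  /* end the TB; queued safe work
                                    * then runs before the next TB */
    }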


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-23 19:06   ` Richard Henderson
@ 2017-01-24 20:25     ` Alex Bennée
  2017-01-24 20:48       ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-24 20:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian


Richard Henderson <rth@twiddle.net> writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +void qemu_tcg_configure(QemuOpts *opts, Error **errp)
>> +{
>> +    const char *t = qemu_opt_get(opts, "thread");
>> +    if (t) {
>> +        if (strcmp(t, "multi") == 0) {
>> +            if (TCG_OVERSIZED_GUEST) {
>> +                error_setg(errp, "No MTTCG when guest word size > hosts");
>
> I agree with this, since I don't ever imagine it'll be fixed.  It's a silly use
> case in the first place considering the ubiquity of 64-bit hosts.

Funnily enough I know one kernel hacker who does use this to test
their arm64 kernels on their re-purposed armhf chromebooks. I've
already explained MTTCG won't be coming their way ;-)

>
>> +            } else if (!check_tcg_memory_orders_compatible()) {
>> +                error_setg(errp,
>> +                           "No MTTCG when guest MO is stronger than host MO");
>
> This, on the other hand, means that one cannot even force multi for testing.  A
> warning perhaps?

I did toy with making that a warning when I first wrote it. I'll make it so.
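
For illustration, something like this (an untested sketch -- the exact
wording is to be decided):

    } else if (!check_tcg_memory_orders_compatible()) {
        /* sketch: report rather than reject, so thread=multi can
         * still be forced on for testing */
        error_report("Guest expects a stronger memory ordering "
                     "than the host provides");
        error_printf("This may cause strange/hard to debug errors\n");
    }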

> And how shall we handle a guest translate adding barriers as
> necessary to satisfy host memory ordering?

We are talking about adding the necessary annotations to a given
target's loads and stores? I figured this code would morph and be
tweaked when (if?) we get there.
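
(For the record I'm assuming that means emitting explicit barriers
around a target's memory accesses when the host's model is weaker --
an untested sketch, not code from this series:)

    /* hypothetical: strengthen a guest load when the host memory
     * model is weaker than the guest's */
    tcg_gen_qemu_ld_i32(val, addr, mem_index, opc);
    tcg_gen_mb(TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_BAR_SC);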

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines
  2017-01-23 19:11   ` Richard Henderson
@ 2017-01-24 20:31     ` Alex Bennée
  2017-01-24 20:44       ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-24 20:31 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian


Richard Henderson <rth@twiddle.net> writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +/* Helper function to slurp va_args list into a bitmap
>> + */
>> +static inline unsigned long make_mmu_index_bitmap(va_list args)
>> +{
>> +    unsigned long bitmap = 0;
>> +    int mmu_index = va_arg(args, int);
>> +
>> +    /* An empty va_list would be a bad call */
>> +    g_assert(mmu_index >= 0);
>> +
>> +    do {
>> +        set_bit(mmu_index, &bitmap);
>> +        mmu_index = va_arg(args, int);
>> +    } while (mmu_index >= 0);
>> +
>> +    return bitmap;
>> +}
>> +
>
> Why don't we just pass in this bitmap in the first place?  It's much better
> than having to use varargs in tlb_flush_by_mmuidx...

We could. By not messing with the API it leaves the door open to
other non-MTTCG architectures that have lots of MMU indexes, versus a
hard limit based on page-size. That said, I think the number of indexes
also affects the size of the TLB, so I guess the current design is
limited for arbitrarily large sets of indexes?

Is ARM the current outlier for this functionality? Apart from SPARC's
two uses, are we likely to see more architectures using this?
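
For reference, the two shapes being discussed are roughly (a sketch;
the exact bitmap type is my assumption):

    /* current form: a varargs list of indexes, terminated by a
     * negative value */
    void tlb_flush_by_mmuidx(CPUState *cpu, ...);

    /* suggested form: callers build the bitmap themselves */
    void tlb_flush_by_mmuidx(CPUState *cpu, unsigned long idxmap);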

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus
  2017-01-23 19:21   ` Richard Henderson
@ 2017-01-24 20:34     ` Alex Bennée
  2017-01-24 20:47       ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-01-24 20:34 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian


Richard Henderson <rth@twiddle.net> writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +/* flush_all_helper: run fn across all cpus
>> + *
>> + * If the wait flag is set then the src cpu's helper will be queued as
>> + * "safe" work and the loop exited creating a synchronisation point
>> + * where all queued work will be finished before execution starts
>> + * again.
>> + */
>> +static void flush_all_helper(CPUState *src, bool wait,
>> +                             run_on_cpu_func fn, run_on_cpu_data d)
>> +{
>> +    CPUState *cpu;
>> +
>> +    if (!wait) {
>> +        CPU_FOREACH(cpu) {
>> +            if (cpu != src) {
>> +                async_run_on_cpu(cpu, fn, d);
>> +            } else {
>> +                g_assert(qemu_cpu_is_self(src));
>> +                fn(src, d);
>> +            }
>> +        }
>> +    } else {
>> +        CPU_FOREACH(cpu) {
>> +            if (cpu != src) {
>> +                async_run_on_cpu(cpu, fn, d);
>> +            } else {
>> +                async_safe_run_on_cpu(cpu, fn, d);
>> +            }
>> +
>> +        }
>> +        cpu_loop_exit(src);
>> +    }
>> +}
>
> What's the rationale for not having the target do the exit itself?  Surely it
> can tell, and simply end the TB after the insn.

It's more for the global sync functionality. I wanted to keep all the
guts of re-starting the loop with the correct async_safe_work in one
place, with a defined API for the guests, rather than have them all do
it themselves.

For the common case of not needing to sync across the cores, I agree
the guest is perfectly able to end the TB so its safe work completes
next. In fact the ARM helper calls do exactly that.
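
(For the curious, patch 24's ARM_CP_EXIT_PC makes the translator do
roughly the following -- a sketch from memory, not the exact hunk:)

    if (ri->type & ARM_CP_EXIT_PC) {
        /* the helper may queue safe work, so make sure the PC is
         * correct and end the TB after this insn */
        gen_set_pc_im(s, s->pc);
        s->is_jmp = DISAS_UPDATE;
    }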

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines
  2017-01-24 20:31     ` Alex Bennée
@ 2017-01-24 20:44       ` Richard Henderson
  2017-01-25 14:09         ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-01-24 20:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/24/2017 12:31 PM, Alex Bennée wrote:
>> Why don't we just pass in this bitmap in the first place?  It's much better
>> than having to use varargs in tlb_flush_by_mmuidx...
> 
> We could. By not messing with the API it leaves the door open to
> other non-MTTCG architectures that have lots of MMU indexes, versus a
> hard limit based on page-size. That said, I think the number of indexes
> also affects the size of the TLB, so I guess the current design is
> limited for arbitrarily large sets of indexes?

We hard-limit at 12 indices, though even that is arguably too high.
I hope we never see more than PPC's current 8.

> Is ARM is the current outlier for this functionality? Apart from SPARC's
> two uses are we likely to see more architectures using this?

In theory, Alpha could use it to avoid ever flushing MMU_PHYS_IDX.  It appears
that there are a few others that could also avoid flushing an "mmu-disabled" index.

I suspect that PPC could make good use of it as well.  That one's complicated
enough that it probably needs a good going over -- especially for the non-local
flushes.


r~


* Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus
  2017-01-24 20:34     ` Alex Bennée
@ 2017-01-24 20:47       ` Richard Henderson
  2017-01-25 14:21         ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-01-24 20:47 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/24/2017 12:34 PM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>>> +/* flush_all_helper: run fn across all cpus
>>> + *
>>> + * If the wait flag is set then the src cpu's helper will be queued as
>>> + * "safe" work and the loop exited creating a synchronisation point
>>> + * where all queued work will be finished before execution starts
>>> + * again.
>>> + */
>>> +static void flush_all_helper(CPUState *src, bool wait,
>>> +                             run_on_cpu_func fn, run_on_cpu_data d)
>>> +{
>>> +    CPUState *cpu;
>>> +
>>> +    if (!wait) {
>>> +        CPU_FOREACH(cpu) {
>>> +            if (cpu != src) {
>>> +                async_run_on_cpu(cpu, fn, d);
>>> +            } else {
>>> +                g_assert(qemu_cpu_is_self(src));
>>> +                fn(src, d);
>>> +            }
>>> +        }
>>> +    } else {
>>> +        CPU_FOREACH(cpu) {
>>> +            if (cpu != src) {
>>> +                async_run_on_cpu(cpu, fn, d);
>>> +            } else {
>>> +                async_safe_run_on_cpu(cpu, fn, d);
>>> +            }
>>> +
>>> +        }
>>> +        cpu_loop_exit(src);
>>> +    }
>>> +}
>>
>> What's the rationale for not having the target do the exit itself?  Surely it
>> can tell, and simply end the TB after the insn.
> 
> It's more for the global sync functionality. I wanted to keep all the
> guts of re-starting the loop with the correct async_safe_work all in one
> place with a defined API for the guests rather than have them all do it
> themselves.
> 
> For the common case of not needing to sync across the cores I agree the
> guest is perfectly able to end the TB so its safe work completes next.
> In fact the ARM helper calls do exactly that.

Hmm.  Would it make more sense to have two functions then, for wait and !wait?
That would allow the wait function to be QEMU_NORETURN, which might make the
interface contract a bit more obvious.
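
Something like this, perhaps (a hypothetical sketch of the split; the
non-waiting caller would then run fn(src, d) directly itself):

    /* queue fn on every cpu except the calling one */
    static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
                                 run_on_cpu_data d)
    {
        CPUState *cpu;

        CPU_FOREACH(cpu) {
            if (cpu != src) {
                async_run_on_cpu(cpu, fn, d);
            }
        }
    }

    /* as above, but queue the src cpu's flush as "safe" work and
     * exit the loop to create the sync point -- never returns */
    static void QEMU_NORETURN flush_all_helper_synced(CPUState *src,
                                                      run_on_cpu_func fn,
                                                      run_on_cpu_data d)
    {
        flush_all_helper(src, fn, d);
        async_safe_run_on_cpu(src, fn, d);
        cpu_loop_exit(src);
    }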


r~


* Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG
  2017-01-24 20:25     ` Alex Bennée
@ 2017-01-24 20:48       ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-01-24 20:48 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian

On 01/24/2017 12:25 PM, Alex Bennée wrote:
> We are talking about adding the necessary annotations to a given
> target's loads and stores? I figured this code would morph and be
> tweaked when (if?) we get there.

Fair enough.


r~


* Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines
  2017-01-24 20:44       ` Richard Henderson
@ 2017-01-25 14:09         ` Alex Bennée
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-25 14:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian


Richard Henderson <rth@twiddle.net> writes:

> On 01/24/2017 12:31 PM, Alex Bennée wrote:
>>> Why don't we just pass in this bitmap in the first place?  It's much better
>>> than having to use varargs in tlb_flush_by_mmuidx...
>>
>> We could. By not messing with the API it leaves the door open to
>> other non-MTTCG architectures that have lots of MMU indexes, versus a
>> hard limit based on page-size. That said, I think the number of indexes
>> also affects the size of the TLB, so I guess the current design is
>> limited for arbitrarily large sets of indexes?
>
> We hard-limit at 12 indices, though even that is arguably too high.
> I hope we never see more than PPC's current 8.

Hmm, there is quite a lot of churn in the ARM code to move from an index
to a bitmap. It should be mostly mechanical, but we'll see.
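
(e.g. each call site changes from a -1-terminated list to an explicit
mask -- illustrative only, the actual index names vary per site:)

    - tlb_flush_by_mmuidx(cs, ARMMMUIdx_S12NSE1, ARMMMUIdx_S12NSE0, -1);
    + tlb_flush_by_mmuidx(cs, (1 << ARMMMUIdx_S12NSE1) |
    +                         (1 << ARMMMUIdx_S12NSE0));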

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus
  2017-01-24 20:47       ` Richard Henderson
@ 2017-01-25 14:21         ` Alex Bennée
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-01-25 14:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini, bamvor.zhangjian


Richard Henderson <rth@twiddle.net> writes:

> On 01/24/2017 12:34 PM, Alex Bennée wrote:
>>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>>>> +/* flush_all_helper: run fn across all cpus
>>>> + *
>>>> + * If the wait flag is set then the src cpu's helper will be queued as
>>>> + * "safe" work and the loop exited creating a synchronisation point
>>>> + * where all queued work will be finished before execution starts
>>>> + * again.
>>>> + */
>>>> +static void flush_all_helper(CPUState *src, bool wait,
>>>> +                             run_on_cpu_func fn, run_on_cpu_data d)
>>>> +{
>>>> +    CPUState *cpu;
>>>> +
>>>> +    if (!wait) {
>>>> +        CPU_FOREACH(cpu) {
>>>> +            if (cpu != src) {
>>>> +                async_run_on_cpu(cpu, fn, d);
>>>> +            } else {
>>>> +                g_assert(qemu_cpu_is_self(src));
>>>> +                fn(src, d);
>>>> +            }
>>>> +        }
>>>> +    } else {
>>>> +        CPU_FOREACH(cpu) {
>>>> +            if (cpu != src) {
>>>> +                async_run_on_cpu(cpu, fn, d);
>>>> +            } else {
>>>> +                async_safe_run_on_cpu(cpu, fn, d);
>>>> +            }
>>>> +
>>>> +        }
>>>> +        cpu_loop_exit(src);
>>>> +    }
>>>> +}
>>>
>>> What's the rationale for not having the target do the exit itself?  Surely it
>>> can tell, and simply end the TB after the insn.
>>
>> It's more for the global sync functionality. I wanted to keep all the
>> guts of re-starting the loop with the correct async_safe_work all in one
>> place with a defined API for the guests rather than have them all do it
>> themselves.
>>
>> For the common case of not needing to sync across the cores I agree the
>> guest is perfectly able to end the TB so its safe work completes next.
>> In fact the ARM helper calls do exactly that.
>
> Hmm.  Would it make more sense to have two functions then, for wait and !wait?
> That would allow the wait function be QEMU_NORETURN, which might make it a bit
> more obvious about the interface contract.

Seems fair. I was worried about multiplying out too many variants in the
API, but this seems a good reason to do so.

--
Alex Bennée


end of thread

Thread overview: 51+ messages
2017-01-19 17:04 [Qemu-devel] [PATCH v7 00/27] Remaining MTTCG Base patches and ARM enablement Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 01/27] docs: new design document multi-thread-tcg.txt Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 02/27] mttcg: translate-all: Enable locking debug in a debug build Alex Bennée
2017-01-23 18:57   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 03/27] mttcg: Add missing tb_lock/unlock() in cpu_exec_step() Alex Bennée
2017-01-23 18:57   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 04/27] tcg: move TCG_MO/BAR types into own file Alex Bennée
2017-01-23 18:59   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG Alex Bennée
2017-01-20  1:28   ` Pranith Kumar
2017-01-20 14:50     ` Alex Bennée
2017-01-20 15:03       ` Pranith Kumar
2017-01-23 19:06   ` Richard Henderson
2017-01-24 20:25     ` Alex Bennée
2017-01-24 20:48       ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 06/27] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 07/27] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 08/27] tcg: drop global lock during TCG code execution Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 09/27] tcg: remove global exit_request Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 10/27] tcg: enable tb_lock() for SoftMMU Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 11/27] tcg: enable thread-per-vCPU Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 12/27] tcg: handle EXCP_ATOMIC exception for system emulation Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 13/27] cputlb: add assert_cpu_is_self checks Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 14/27] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
2017-01-23 19:07   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 15/27] cputlb: introduce tlb_flush_* async work Alex Bennée
2017-01-23 19:10   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines Alex Bennée
2017-01-23 19:11   ` Richard Henderson
2017-01-24 20:31     ` Alex Bennée
2017-01-24 20:44       ` Richard Henderson
2017-01-25 14:09         ` Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 17/27] cputlb: atomically update tlb fields used by tlb_reset_dirty Alex Bennée
2017-01-23 19:17   ` Richard Henderson
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus Alex Bennée
2017-01-23 19:21   ` Richard Henderson
2017-01-24 20:34     ` Alex Bennée
2017-01-24 20:47       ` Richard Henderson
2017-01-25 14:21         ` Alex Bennée
2017-01-19 17:04 ` [Qemu-devel] [PATCH v7 19/27] target-arm/powerctl: defer cpu reset work to CPU context Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 20/27] target-arm: ensure BQL taken for ARM_CP_IO register access Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 21/27] target-arm: helpers which may affect global state need the BQL Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 22/27] target-arm: don't generate WFE/YIELD calls for MTTCG Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 23/27] target-arm/cpu.h: make ARM_CP defined consistent Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 24/27] target-arm: introduce ARM_CP_EXIT_PC Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 25/27] target-arm: ensure all cross vCPUs TLB flushes complete Alex Bennée
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 26/27] tcg: enable MTTCG by default for ARM on x86 hosts Alex Bennée
2017-01-20  0:08   ` Pranith Kumar
2017-01-20 10:53     ` Alex Bennée
2017-01-20 14:30       ` Pranith Kumar
2017-01-19 17:05 ` [Qemu-devel] [PATCH v7 27/27] target-ppc: take global mutex for set_irq Alex Bennée
