[PATCH for-8.0 00/29] tcg: Improve atomicity support

* [PATCH for-8.0 00/29] tcg: Improve atomicity support
@ 2022-11-18  9:47 Richard Henderson
  2022-11-18  9:47 ` [PATCH for-8.0 01/29] include/qemu/cpuid: Introduce xgetbv_low Richard Henderson
                   ` (28 more replies)
  0 siblings, 29 replies; 48+ messages in thread
From: Richard Henderson @ 2022-11-18  9:47 UTC (permalink / raw)
  To: qemu-devel

The main objective here is to support Arm FEAT_LSE2, which says that any
single memory access that does not cross a 16-byte boundary is atomic.
This is the MO_ATOM_WITHIN16 control.

While I'm touching all of this, a secondary objective is to handle the
atomicity of the IBM machines.  Both Power and s390x treat misaligned
accesses as atomic on the lsb of the pointer.  For instance, an 8-byte
access at ptr % 8 == 4 will appear as two atomic 4-byte accesses, and
ptr % 4 == 2 will appear as four 3-byte accesses.
This is the MO_ATOM_SUBALIGN control.

By default, acceses are atomic only if aligned, which is the current
behaviour of the tcg code generator (mostly, anyway, there were bugs).
This is the MO_ATOM_IFALIGN control.

Further, one can say that a large memory access is really a set of
contiguous smaller accesses, and we need not provide more atomicity
than that (modulo MO_ATOM_WITHIN16).  This is the MO_ATMAX_* control.

While I've had a go at documenting all of this, I'm certain it could
be improved -- soliciting suggestions.

r~

Based-on: 20221118091858.242569-1-richard.henderson@linaro.org
("main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD")
which itself depends on "tcg: Support for Int128 with helpers".

Richard Henderson (29):
  include/qemu/cpuid: Introduce xgetbv_low
  include/exec/memop: Add bits describing atomicity
  accel/tcg: Add cpu_in_serial_context
  accel/tcg: Introduce tlb_read_idx
  accel/tcg: Reorg system mode load helpers
  accel/tcg: Reorg system mode store helpers
  accel/tcg: Honor atomicity of loads
  accel/tcg: Honor atomicity of stores
  tcg/tci: Use cpu_{ld,st}_mmu
  tcg: Unify helper_{be,le}_{ld,st}*
  accel/tcg: Implement helper_{ld,st}*_mmu for user-only
  tcg: Add 128-bit guest memory primitives
  meson: Detect atomic128 support with optimization
  tcg/i386: Add have_atomic16
  include/qemu/int128: Add vector type to Int128Alias
  accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
  tcg/aarch64: Add have_lse, have_lse2
  accel/tcg: Add aarch64 specific support in ldst_atomicity
  tcg: Introduce TCG_OPF_TYPE_MASK
  tcg: Add INDEX_op_qemu_{ld,st}_i128
  tcg/i386: Introduce tcg_out_mov2
  tcg/i386: Introduce tcg_out_testi
  tcg/i386: Use full load/store helpers in user-only mode
  tcg/i386: Replace is64 with type in qemu_ld/st routines
  tcg/i386: Mark Win64 call-saved vector regs as reserved
  tcg/i386: Examine MemOp for atomicity and alignment
  tcg/i386: Support 128-bit load/store with have_atomic16
  tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
  tcg/i386: Honor 64-bit atomicity in 32-bit mode

 accel/tcg/internal.h             |    5 +
 accel/tcg/tcg-runtime.h          |    3 +
 include/exec/cpu-defs.h          |    7 +-
 include/exec/cpu_ldst.h          |   26 +-
 include/exec/memop.h             |   36 +
 include/qemu/cpuid.h             |   25 +
 include/qemu/int128.h            |   10 +-
 include/tcg/tcg-ldst.h           |   70 +-
 include/tcg/tcg-opc.h            |    8 +
 include/tcg/tcg.h                |   22 +-
 tcg/aarch64/tcg-target.h         |    5 +
 tcg/arm/tcg-target.h             |    2 +
 tcg/i386/tcg-target.h            |    4 +
 tcg/loongarch64/tcg-target.h     |    2 +
 tcg/mips/tcg-target.h            |    2 +
 tcg/ppc/tcg-target.h             |    2 +
 tcg/riscv/tcg-target.h           |    2 +
 tcg/s390x/tcg-target.h           |    2 +
 tcg/sparc64/tcg-target.h         |    2 +
 tcg/tci/tcg-target.h             |    2 +
 accel/tcg/cpu-exec-common.c      |    3 +
 accel/tcg/cputlb.c               | 1884 +++++++++++++++++++-----------
 accel/tcg/tb-maint.c             |    2 +-
 accel/tcg/user-exec.c            |  478 +++++---
 tcg/optimize.c                   |   15 +-
 tcg/tcg-op.c                     |  246 ++--
 tcg/tcg.c                        |    8 +-
 tcg/tci.c                        |  127 +-
 util/bufferiszero.c              |    3 +-
 accel/tcg/ldst_atomicity.c.inc   | 1170 +++++++++++++++++++
 docs/devel/loads-stores.rst      |   36 +-
 meson.build                      |   52 +-
 tcg/README                       |   10 +-
 tcg/aarch64/tcg-target.c.inc     |   57 +-
 tcg/arm/tcg-target.c.inc         |   45 +-
 tcg/i386/tcg-target.c.inc        | 1228 +++++++++++++------
 tcg/loongarch64/tcg-target.c.inc |   25 +-
 tcg/mips/tcg-target.c.inc        |   40 +-
 tcg/ppc/tcg-target.c.inc         |   30 +-
 tcg/riscv/tcg-target.c.inc       |   51 +-
 tcg/s390x/tcg-target.c.inc       |   38 +-
 tcg/sparc64/tcg-target.c.inc     |   37 +-
 tcg/tci/tcg-target.c.inc         |    3 +-
 43 files changed, 4145 insertions(+), 1680 deletions(-)
 create mode 100644 accel/tcg/ldst_atomicity.c.inc

-- 
2.34.1

^ permalink raw reply	[flat|nested] 48+ messages in thread