* [PATCH v2 00/30] tcg: Improve atomicity support
@ 2023-02-16  2:57 Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 01/30] include/qemu/cpuid: Introduce xgetbv_low Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Version 1 was back in November:
https://lore.kernel.org/qemu-devel/20221118094754.242910-1-richard.henderson@linaro.org/

Prerequisites, and there were many, are now upstream.
Changes are too many to mention.  But at least I've fixed
the clang and darwin build problems Phil reported.

The main objective here is to support Arm FEAT_LSE2, which says that any
single memory access that does not cross a 16-byte boundary is atomic.
This is the MO_ATOM_WITHIN16 control.

While I'm touching all of this, a secondary objective is to handle the
atomicity of the IBM machines.  Both Power and s390x treat misaligned
accesses as atomic in pieces determined by the low bits of the pointer.
For instance, an 8-byte access at ptr % 8 == 4 will appear as two atomic
4-byte accesses, and ptr % 4 == 2 will appear as four atomic 2-byte
accesses.
This is the MO_ATOM_SUBALIGN control.
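
For concreteness, the atomic unit implied by this rule follows from the
pointer's low bits; the helper below is only an illustration of the rule,
not code from this series:

    /* Illustration only: size of the atomic pieces under MO_ATOM_SUBALIGN. */
    static int subalign_atomic_unit(uint64_t addr, int size)
    {
        uint64_t align = addr & -addr;   /* lowest set bit of addr */
        return (align && align < (uint64_t)size) ? align : size;
    }

E.g. subalign_atomic_unit(4, 8) == 4 (two 4-byte pieces for the 8-byte
access) and subalign_atomic_unit(6, 8) == 2 (four 2-byte pieces).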

By default, accesses are atomic only if aligned, which is the current
behaviour of the tcg code generator (mostly, anyway, there were bugs).
This is the MO_ATOM_IFALIGN control.

Further, one can say that a large memory access is really a set of
contiguous smaller accesses, and we need not provide more atomicity
than that (modulo MO_ATOM_WITHIN16).  This is the MO_ATMAX_* control.
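
As a further hypothetical example using the names introduced in patch 2,
an Arm LDP of two 64-bit registers without FEAT_LSE2 could be described
as

  MO_128 | MO_ATMAX_8 | MO_ATOM_IFALIGN

i.e. a 16-byte access in which only each aligned 8-byte half needs to be
single-copy atomic.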

While I've had a go at documenting all of this, I'm certain it could
be improved -- soliciting suggestions.


r~


Richard Henderson (30):
  include/qemu/cpuid: Introduce xgetbv_low
  include/exec/memop: Add bits describing atomicity
  accel/tcg: Add cpu_in_serial_context
  accel/tcg: Introduce tlb_read_idx
  accel/tcg: Reorg system mode load helpers
  accel/tcg: Reorg system mode store helpers
  accel/tcg: Honor atomicity of loads
  accel/tcg: Honor atomicity of stores
  tcg/tci: Use cpu_{ld,st}_mmu
  tcg: Unify helper_{be,le}_{ld,st}*
  accel/tcg: Implement helper_{ld,st}*_mmu for user-only
  tcg: Add 128-bit guest memory primitives
  meson: Detect atomic128 support with optimization
  tcg/i386: Add have_atomic16
  accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
  accel/tcg: Add aarch64 specific support in ldst_atomicity
  tcg/aarch64: Detect have_lse, have_lse2 for linux
  tcg/aarch64: Detect have_lse, have_lse2 for darwin
  accel/tcg: Add have_lse2 support in ldst_atomicity
  tcg: Introduce TCG_OPF_TYPE_MASK
  tcg: Add INDEX_op_qemu_{ld,st}_i128
  tcg/i386: Introduce tcg_out_mov2
  tcg/i386: Introduce tcg_out_testi
  tcg/i386: Use full load/store helpers in user-only mode
  tcg/i386: Replace is64 with type in qemu_ld/st routines
  tcg/i386: Mark Win64 call-saved vector regs as reserved
  tcg/i386: Examine MemOp for atomicity and alignment
  tcg/i386: Support 128-bit load/store with have_atomic16
  tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
  tcg/i386: Honor 64-bit atomicity in 32-bit mode

 docs/devel/loads-stores.rst      |   36 +-
 docs/devel/tcg-ops.rst           |   11 +-
 meson.build                      |   52 +-
 accel/tcg/internal.h             |    5 +
 accel/tcg/tcg-runtime.h          |    3 +
 include/exec/cpu-defs.h          |    7 +-
 include/exec/cpu_ldst.h          |   26 +-
 include/exec/memop.h             |   36 +
 include/qemu/cpuid.h             |   25 +
 include/tcg/tcg-ldst.h           |   70 +-
 include/tcg/tcg-opc.h            |    8 +
 include/tcg/tcg.h                |   22 +-
 tcg/aarch64/tcg-target.h         |    5 +
 tcg/arm/tcg-target.h             |    2 +
 tcg/i386/tcg-target.h            |    4 +
 tcg/loongarch64/tcg-target.h     |    1 +
 tcg/mips/tcg-target.h            |    2 +
 tcg/ppc/tcg-target.h             |    2 +
 tcg/riscv/tcg-target.h           |    2 +
 tcg/s390x/tcg-target.h           |    2 +
 tcg/sparc64/tcg-target.h         |    2 +
 tcg/tci/tcg-target.h             |    2 +
 accel/tcg/cpu-exec-common.c      |    3 +
 accel/tcg/cputlb.c               | 1767 ++++++++++++++++++------------
 accel/tcg/tb-maint.c             |    2 +-
 accel/tcg/user-exec.c            |  478 +++++---
 tcg/optimize.c                   |   15 +-
 tcg/tcg-op.c                     |  246 +++--
 tcg/tcg.c                        |    8 +-
 tcg/tci.c                        |  127 +--
 util/bufferiszero.c              |    3 +-
 accel/tcg/ldst_atomicity.c.inc   | 1370 +++++++++++++++++++++++
 tcg/aarch64/tcg-target.c.inc     |   87 +-
 tcg/arm/tcg-target.c.inc         |   45 +-
 tcg/i386/tcg-target.c.inc        | 1228 ++++++++++++++-------
 tcg/loongarch64/tcg-target.c.inc |   25 +-
 tcg/mips/tcg-target.c.inc        |   40 +-
 tcg/ppc/tcg-target.c.inc         |   30 +-
 tcg/riscv/tcg-target.c.inc       |   51 +-
 tcg/s390x/tcg-target.c.inc       |   38 +-
 tcg/sparc64/tcg-target.c.inc     |   37 +-
 tcg/tci/tcg-target.c.inc         |    3 +-
 42 files changed, 4236 insertions(+), 1692 deletions(-)
 create mode 100644 accel/tcg/ldst_atomicity.c.inc

-- 
2.34.1




* [PATCH v2 01/30] include/qemu/cpuid: Introduce xgetbv_low
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Replace the two uses of asm to expand xgetbv with an inline function.
Since one of the two call sites has been using the mnemonic for 4 years
now, assume that the comment about "older versions of the assembler"
is obsolete.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/cpuid.h      |  7 +++++++
 util/bufferiszero.c       |  3 +--
 tcg/i386/tcg-target.c.inc | 11 ++++-------
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 7adb12d320..1451e8ef2f 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -71,4 +71,11 @@
 #define bit_LZCNT       (1 << 5)
 #endif
 
+static inline unsigned xgetbv_low(unsigned c)
+{
+    unsigned a, d;
+    asm("xgetbv" : "=a"(a), "=d"(d) : "c"(c));
+    return a;
+}
+
 #endif /* QEMU_CPUID_H */
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 1790ded7d4..1886bc5ba4 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -258,8 +258,7 @@ static void __attribute__((constructor)) init_cpuid_cache(void)
 
         /* We must check that AVX is not just available, but usable.  */
         if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
-            int bv;
-            __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+            unsigned bv = xgetbv_low(0);
             __cpuid_count(7, 0, a, b, c, d);
             if ((bv & 0x6) == 0x6 && (b & bit_AVX2)) {
                 cache |= CACHE_AVX2;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 883ced8168..028ece62a0 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -4156,12 +4156,9 @@ static void tcg_target_init(TCGContext *s)
         /* There are a number of things we must check before we can be
            sure of not hitting invalid opcode.  */
         if (c & bit_OSXSAVE) {
-            unsigned xcrl, xcrh;
-            /* The xgetbv instruction is not available to older versions of
-             * the assembler, so we encode the instruction manually.
-             */
-            asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0));
-            if ((xcrl & 6) == 6) {
+            unsigned bv = xgetbv_low(0);
+
+            if ((bv & 6) == 6) {
                 have_avx1 = (c & bit_AVX) != 0;
                 have_avx2 = (b7 & bit_AVX2) != 0;
 
@@ -4172,7 +4169,7 @@ static void tcg_target_init(TCGContext *s)
                  * check that OPMASK and all extended ZMM state are enabled
                  * even if we're not using them -- the insns will fault.
                  */
-                if ((xcrl & 0xe0) == 0xe0
+                if ((bv & 0xe0) == 0xe0
                     && (b7 & bit_AVX512F)
                     && (b7 & bit_AVX512VL)) {
                     have_avx512vl = true;
-- 
2.34.1




* [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 01/30] include/qemu/cpuid: Introduce xgetbv_low Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-28 17:56   ` Alex Bennée
  2023-03-15 17:13   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

These bits may be used to describe the precise atomicity
requirements of the guest, which may then be used to
constrain the methods by which it may be emulated by the host.

For instance, the AArch64 LDP (32-bit) instruction changes
semantics with ARMv8.4 LSE2, from

  MO_64 | MO_ATMAX_4 | MO_ATOM_IFALIGN
  (64-bits, single-copy atomic only on 4 byte units,
   nonatomic if not aligned by 4),

to

  MO_64 | MO_ATMAX_SIZE | MO_ATOM_WITHIN16
  (64-bits, single-copy atomic within a 16 byte block)

The former may be implemented with two 4 byte loads, or
a single 8 byte load if that happens to be efficient on
the host.  The latter may not, and may also require a
helper when misaligned.
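
A minimal sketch of how a front end might choose between the two
encodings above (the function name is invented for illustration; the
MO_ATOM_*/MO_ATMAX_* bits are the ones added by this patch):

    static MemOp ldp32_memop(bool feat_lse2)
    {
        if (feat_lse2) {
            /* Whole 8 bytes atomic, provided it stays within 16 bytes. */
            return MO_64 | MO_ATMAX_SIZE | MO_ATOM_WITHIN16;
        }
        /* Only each 4-byte register slot is atomic, and only if aligned. */
        return MO_64 | MO_ATMAX_4 | MO_ATOM_IFALIGN;
    }

(Endianness bits omitted for brevity.)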

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/memop.h | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index 25d027434a..04e4048f0b 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -81,6 +81,42 @@ typedef enum MemOp {
     MO_ALIGN_32 = 5 << MO_ASHIFT,
     MO_ALIGN_64 = 6 << MO_ASHIFT,
 
+    /*
+     * MO_ATOM_* describes the atomicity requirements of the operation:
+     * MO_ATOM_IFALIGN: the operation must be single-copy atomic if and
+     *    only if it is aligned; if unaligned there is no atomicity.
+     * MO_ATOM_NONE: the operation has no atomicity requirements.
+     * MO_ATOM_SUBALIGN: the operation is single-copy atomic in parts
+     *    determined by the alignment.  E.g. if the address is 0 mod 4,
+     *    then each
+     *    4-byte subobject is single-copy atomic.
+     *    This is the atomicity of IBM Power and S390X processors.
+     * MO_ATOM_WITHIN16: the operation is single-copy atomic, even if it
+     *    is unaligned, so long as it does not cross a 16-byte boundary;
+     *    if it crosses a 16-byte boundary there is no atomicity.
+     *    This is the atomicity of Arm FEAT_LSE2.
+     *
+     * MO_ATMAX_* describes the maximum atomicity unit required:
+     * MO_ATMAX_SIZE: the entire operation, i.e. MO_SIZE.
+     * MO_ATMAX_[248]: units of N bytes.
+     *
+     * Note the default (i.e. 0) values are single-copy atomic to the
+     * size of the operation, if aligned.  This retains the behaviour
+     * from before these were introduced.
+     */
+    MO_ATOM_SHIFT    = 8,
+    MO_ATOM_MASK     = 0x3 << MO_ATOM_SHIFT,
+    MO_ATOM_IFALIGN  = 0 << MO_ATOM_SHIFT,
+    MO_ATOM_NONE     = 1 << MO_ATOM_SHIFT,
+    MO_ATOM_SUBALIGN = 2 << MO_ATOM_SHIFT,
+    MO_ATOM_WITHIN16 = 3 << MO_ATOM_SHIFT,
+
+    MO_ATMAX_SHIFT = 10,
+    MO_ATMAX_MASK  = 0x3 << MO_ATMAX_SHIFT,
+    MO_ATMAX_SIZE  = 0 << MO_ATMAX_SHIFT,
+    MO_ATMAX_2     = 1 << MO_ATMAX_SHIFT,
+    MO_ATMAX_4     = 2 << MO_ATMAX_SHIFT,
+    MO_ATMAX_8     = 3 << MO_ATMAX_SHIFT,
+
     /* Combinations of the above, for ease of use.  */
     MO_UB    = MO_8,
     MO_UW    = MO_16,
-- 
2.34.1




* [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 01/30] include/qemu/cpuid: Introduce xgetbv_low Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-28 17:57   ` Alex Bennée
  2023-03-15 16:07   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Like cpu_in_exclusive_context, but also true if
there is no other cpu against which we could race.

Use it in tb_flush as a direct replacement.
Use it in cpu_loop_exit_atomic to ensure that there
is no loop against cpu_exec_step_atomic.
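
A simplified, hypothetical sketch of the caller pattern this enables
(HAVE_ATOMIC128 and atomic16_read come from qemu/atomic128.h; the
function itself is invented for illustration): a helper that cannot
honour the guest's atomicity with host primitives punts to the
serialized slow path, and the assertion added here guarantees that
request cannot recurse once the cpu is already executing serially.

    static Int128 ld16_atomic_or_exit(CPUArchState *env, Int128 *ptr,
                                      uintptr_t ra)
    {
        if (HAVE_ATOMIC128) {
            /* The host has a true 16-byte single-copy atomic load. */
            return atomic16_read(ptr);
        }
        /* Re-execute under cpu_exec_step_atomic(); cannot loop, per the
         * new g_assert(!cpu_in_serial_context(cpu)). */
        cpu_loop_exit_atomic(env_cpu(env), ra);
    }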

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h        | 5 +++++
 accel/tcg/cpu-exec-common.c | 3 +++
 accel/tcg/tb-maint.c        | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 6edff16fb0..e181872a93 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -64,4 +64,9 @@ static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
 #endif
 }
 
+static inline bool cpu_in_serial_context(CPUState *cs)
+{
+    return !(cs->tcg_cflags & CF_PARALLEL) || cpu_in_exclusive_context(cs);
+}
+
 #endif /* ACCEL_TCG_INTERNAL_H */
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index c7bc8c6efa..2fb4454c7a 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -21,6 +21,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/tcg.h"
 #include "exec/exec-all.h"
+#include "internal.h"
 
 bool tcg_allowed;
 
@@ -78,6 +79,8 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
 
 void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
 {
+    /* Prevent looping if already executing in a serial context. */
+    g_assert(!cpu_in_serial_context(cpu));
     cpu->exception_index = EXCP_ATOMIC;
     cpu_loop_exit_restore(cpu, pc);
 }
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index b3d6529ae2..4f6b447149 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -758,7 +758,7 @@ void tb_flush(CPUState *cpu)
     if (tcg_enabled()) {
         unsigned tb_flush_count = qatomic_mb_read(&tb_ctx.tb_flush_count);
 
-        if (cpu_in_exclusive_context(cpu)) {
+        if (cpu_in_serial_context(cpu)) {
             do_tb_flush(cpu, RUN_ON_CPU_HOST_INT(tb_flush_count));
         } else {
             async_safe_run_on_cpu(cpu, do_tb_flush,
-- 
2.34.1




* [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-28 17:59   ` Alex Bennée
  2023-02-16  2:57 ` [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Instead of playing with offsetof in various places, use
MMUAccessType to index an array.  The array is easily defined
in place of the previous dummy padding array in the union.
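
The correspondence relied upon (MMUAccessType values from
include/hw/core/cpu.h) is:

    MMU_DATA_LOAD  == 0  ->  addr_read
    MMU_DATA_STORE == 1  ->  addr_write
    MMU_INST_FETCH == 2  ->  addr_code

so, for example, the old "entry->addr_code" reads become
tlb_read_idx(entry, MMU_INST_FETCH).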

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h |   7 ++-
 include/exec/cpu_ldst.h |  26 ++++++++--
 accel/tcg/cputlb.c      | 104 +++++++++++++---------------------------
 3 files changed, 59 insertions(+), 78 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 21309cf567..7ce3bcb06b 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -128,8 +128,11 @@ typedef struct CPUTLBEntry {
                use the corresponding iotlb value.  */
             uintptr_t addend;
         };
-        /* padding to get a power of two size */
-        uint8_t dummy[1 << CPU_TLB_ENTRY_BITS];
+        /*
+         * Padding to get a power of two size, as well as index
+         * access to addr_{read,write,code}.
+         */
+        target_ulong addr_idx[(1 << CPU_TLB_ENTRY_BITS) / TARGET_LONG_SIZE];
     };
 } CPUTLBEntry;
 
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 09b55cc0ee..fad6efc0ad 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -360,13 +360,29 @@ static inline void clear_helper_retaddr(void)
 /* Needed for TCG_OVERSIZED_GUEST */
 #include "tcg/tcg.h"
 
+static inline target_ulong tlb_read_idx(const CPUTLBEntry *entry,
+                                        MMUAccessType access_type)
+{
+    /* Do not rearrange the CPUTLBEntry structure members. */
+    QEMU_BUILD_BUG_ON(offsetof(CPUTLBEntry, addr_read) !=
+                      MMU_DATA_LOAD * TARGET_LONG_SIZE);
+    QEMU_BUILD_BUG_ON(offsetof(CPUTLBEntry, addr_write) !=
+                      MMU_DATA_STORE * TARGET_LONG_SIZE);
+    QEMU_BUILD_BUG_ON(offsetof(CPUTLBEntry, addr_code) !=
+                      MMU_INST_FETCH * TARGET_LONG_SIZE);
+
+    const target_ulong *ptr = &entry->addr_idx[access_type];
+#if TCG_OVERSIZED_GUEST
+    return *ptr;
+#else
+    /* ofs might correspond to .addr_write, so use qatomic_read */
+    return qatomic_read(ptr);
+#endif
+}
+
 static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
 {
-#if TCG_OVERSIZED_GUEST
-    return entry->addr_write;
-#else
-    return qatomic_read(&entry->addr_write);
-#endif
+    return tlb_read_idx(entry, MMU_DATA_STORE);
 }
 
 /* Find the TLB index corresponding to the mmu_idx + address pair.  */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4812d83961..3cadd35f5d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1442,34 +1442,17 @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
     }
 }
 
-static inline target_ulong tlb_read_ofs(CPUTLBEntry *entry, size_t ofs)
-{
-#if TCG_OVERSIZED_GUEST
-    return *(target_ulong *)((uintptr_t)entry + ofs);
-#else
-    /* ofs might correspond to .addr_write, so use qatomic_read */
-    return qatomic_read((target_ulong *)((uintptr_t)entry + ofs));
-#endif
-}
-
 /* Return true if ADDR is present in the victim tlb, and has been copied
    back to the main tlb.  */
 static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
-                           size_t elt_ofs, target_ulong page)
+                           MMUAccessType access_type, target_ulong page)
 {
     size_t vidx;
 
     assert_cpu_is_self(env_cpu(env));
     for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
         CPUTLBEntry *vtlb = &env_tlb(env)->d[mmu_idx].vtable[vidx];
-        target_ulong cmp;
-
-        /* elt_ofs might correspond to .addr_write, so use qatomic_read */
-#if TCG_OVERSIZED_GUEST
-        cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
-#else
-        cmp = qatomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs));
-#endif
+        target_ulong cmp = tlb_read_idx(vtlb, access_type);
 
         if (cmp == page) {
             /* Found entry in victim tlb, swap tlb and iotlb.  */
@@ -1491,11 +1474,6 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
     return false;
 }
 
-/* Macro to call the above, with local variables from the use context.  */
-#define VICTIM_TLB_HIT(TY, ADDR) \
-  victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
-                 (ADDR) & TARGET_PAGE_MASK)
-
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
                            CPUTLBEntryFull *full, uintptr_t retaddr)
 {
@@ -1528,29 +1506,12 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
 {
     uintptr_t index = tlb_index(env, mmu_idx, addr);
     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-    target_ulong tlb_addr, page_addr;
-    size_t elt_ofs;
-    int flags;
+    target_ulong tlb_addr = tlb_read_idx(entry, access_type);
+    target_ulong page_addr = addr & TARGET_PAGE_MASK;
+    int flags = TLB_FLAGS_MASK;
 
-    switch (access_type) {
-    case MMU_DATA_LOAD:
-        elt_ofs = offsetof(CPUTLBEntry, addr_read);
-        break;
-    case MMU_DATA_STORE:
-        elt_ofs = offsetof(CPUTLBEntry, addr_write);
-        break;
-    case MMU_INST_FETCH:
-        elt_ofs = offsetof(CPUTLBEntry, addr_code);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-    tlb_addr = tlb_read_ofs(entry, elt_ofs);
-
-    flags = TLB_FLAGS_MASK;
-    page_addr = addr & TARGET_PAGE_MASK;
     if (!tlb_hit_page(tlb_addr, page_addr)) {
-        if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page_addr)) {
+        if (!victim_tlb_hit(env, mmu_idx, index, access_type, page_addr)) {
             CPUState *cs = env_cpu(env);
 
             if (!cs->cc->tcg_ops->tlb_fill(cs, addr, fault_size, access_type,
@@ -1572,7 +1533,7 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
              */
             flags &= ~TLB_INVALID_MASK;
         }
-        tlb_addr = tlb_read_ofs(entry, elt_ofs);
+        tlb_addr = tlb_read_idx(entry, access_type);
     }
     flags &= tlb_addr;
 
@@ -1786,7 +1747,8 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     if (prot & PAGE_WRITE) {
         tlb_addr = tlb_addr_write(tlbe);
         if (!tlb_hit(tlb_addr, addr)) {
-            if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            if (!victim_tlb_hit(env, mmu_idx, index, MMU_DATA_STORE,
+                                addr & TARGET_PAGE_MASK)) {
                 tlb_fill(env_cpu(env), addr, size,
                          MMU_DATA_STORE, mmu_idx, retaddr);
                 index = tlb_index(env, mmu_idx, addr);
@@ -1810,7 +1772,8 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     } else /* if (prot & PAGE_READ) */ {
         tlb_addr = tlbe->addr_read;
         if (!tlb_hit(tlb_addr, addr)) {
-            if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            if (!victim_tlb_hit(env, mmu_idx, index, MMU_DATA_LOAD,
+                                addr & TARGET_PAGE_MASK)) {
                 tlb_fill(env_cpu(env), addr, size,
                          MMU_DATA_LOAD, mmu_idx, retaddr);
                 index = tlb_index(env, mmu_idx, addr);
@@ -1896,13 +1859,9 @@ load_memop(const void *haddr, MemOp op)
 
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
-            uintptr_t retaddr, MemOp op, bool code_read,
+            uintptr_t retaddr, MemOp op, MMUAccessType access_type,
             FullLoadHelper *full_load)
 {
-    const size_t tlb_off = code_read ?
-        offsetof(CPUTLBEntry, addr_code) : offsetof(CPUTLBEntry, addr_read);
-    const MMUAccessType access_type =
-        code_read ? MMU_INST_FETCH : MMU_DATA_LOAD;
     const unsigned a_bits = get_alignment_bits(get_memop(oi));
     const size_t size = memop_size(op);
     uintptr_t mmu_idx = get_mmuidx(oi);
@@ -1922,18 +1881,18 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
 
     index = tlb_index(env, mmu_idx, addr);
     entry = tlb_entry(env, mmu_idx, addr);
-    tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+    tlb_addr = tlb_read_idx(entry, access_type);
 
     /* If the TLB entry is for a different page, reload and try again.  */
     if (!tlb_hit(tlb_addr, addr)) {
-        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
+        if (!victim_tlb_hit(env, mmu_idx, index, access_type,
                             addr & TARGET_PAGE_MASK)) {
             tlb_fill(env_cpu(env), addr, size,
                      access_type, mmu_idx, retaddr);
             index = tlb_index(env, mmu_idx, addr);
             entry = tlb_entry(env, mmu_idx, addr);
         }
-        tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+        tlb_addr = tlb_read_idx(entry, access_type);
         tlb_addr &= ~TLB_INVALID_MASK;
     }
 
@@ -2019,7 +1978,8 @@ static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
                               MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_UB);
-    return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
+    return load_helper(env, addr, oi, retaddr, MO_UB, MMU_DATA_LOAD,
+                       full_ldub_mmu);
 }
 
 tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
@@ -2032,7 +1992,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_LEUW);
-    return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
+    return load_helper(env, addr, oi, retaddr, MO_LEUW, MMU_DATA_LOAD,
                        full_le_lduw_mmu);
 }
 
@@ -2046,7 +2006,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_BEUW);
-    return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
+    return load_helper(env, addr, oi, retaddr, MO_BEUW, MMU_DATA_LOAD,
                        full_be_lduw_mmu);
 }
 
@@ -2060,7 +2020,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_LEUL);
-    return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
+    return load_helper(env, addr, oi, retaddr, MO_LEUL, MMU_DATA_LOAD,
                        full_le_ldul_mmu);
 }
 
@@ -2074,7 +2034,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_BEUL);
-    return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
+    return load_helper(env, addr, oi, retaddr, MO_BEUL, MMU_DATA_LOAD,
                        full_be_ldul_mmu);
 }
 
@@ -2088,7 +2048,7 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
                            MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_LEUQ);
-    return load_helper(env, addr, oi, retaddr, MO_LEUQ, false,
+    return load_helper(env, addr, oi, retaddr, MO_LEUQ, MMU_DATA_LOAD,
                        helper_le_ldq_mmu);
 }
 
@@ -2096,7 +2056,7 @@ uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_BEUQ);
-    return load_helper(env, addr, oi, retaddr, MO_BEUQ, false,
+    return load_helper(env, addr, oi, retaddr, MO_BEUQ, MMU_DATA_LOAD,
                        helper_be_ldq_mmu);
 }
 
@@ -2292,7 +2252,6 @@ store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
                        uintptr_t retaddr, size_t size, uintptr_t mmu_idx,
                        bool big_endian)
 {
-    const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
     uintptr_t index, index2;
     CPUTLBEntry *entry, *entry2;
     target_ulong page1, page2, tlb_addr, tlb_addr2;
@@ -2314,7 +2273,7 @@ store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
 
     tlb_addr2 = tlb_addr_write(entry2);
     if (page1 != page2 && !tlb_hit_page(tlb_addr2, page2)) {
-        if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) {
+        if (!victim_tlb_hit(env, mmu_idx, index2, MMU_DATA_STORE, page2)) {
             tlb_fill(env_cpu(env), page2, size2, MMU_DATA_STORE,
                      mmu_idx, retaddr);
             index2 = tlb_index(env, mmu_idx, page2);
@@ -2367,7 +2326,6 @@ static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
              MemOpIdx oi, uintptr_t retaddr, MemOp op)
 {
-    const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
     const unsigned a_bits = get_alignment_bits(get_memop(oi));
     const size_t size = memop_size(op);
     uintptr_t mmu_idx = get_mmuidx(oi);
@@ -2390,7 +2348,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
     /* If the TLB entry is for a different page, reload and try again.  */
     if (!tlb_hit(tlb_addr, addr)) {
-        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
+        if (!victim_tlb_hit(env, mmu_idx, index, MMU_DATA_STORE,
             addr & TARGET_PAGE_MASK)) {
             tlb_fill(env_cpu(env), addr, size, MMU_DATA_STORE,
                      mmu_idx, retaddr);
@@ -2696,7 +2654,8 @@ void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
 static uint64_t full_ldub_code(CPUArchState *env, target_ulong addr,
                                MemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_code);
+    return load_helper(env, addr, oi, retaddr, MO_8,
+                       MMU_INST_FETCH, full_ldub_code);
 }
 
 uint32_t cpu_ldub_code(CPUArchState *env, abi_ptr addr)
@@ -2708,7 +2667,8 @@ uint32_t cpu_ldub_code(CPUArchState *env, abi_ptr addr)
 static uint64_t full_lduw_code(CPUArchState *env, target_ulong addr,
                                MemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_TEUW, true, full_lduw_code);
+    return load_helper(env, addr, oi, retaddr, MO_TEUW,
+                       MMU_INST_FETCH, full_lduw_code);
 }
 
 uint32_t cpu_lduw_code(CPUArchState *env, abi_ptr addr)
@@ -2720,7 +2680,8 @@ uint32_t cpu_lduw_code(CPUArchState *env, abi_ptr addr)
 static uint64_t full_ldl_code(CPUArchState *env, target_ulong addr,
                               MemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_TEUL, true, full_ldl_code);
+    return load_helper(env, addr, oi, retaddr, MO_TEUL,
+                       MMU_INST_FETCH, full_ldl_code);
 }
 
 uint32_t cpu_ldl_code(CPUArchState *env, abi_ptr addr)
@@ -2732,7 +2693,8 @@ uint32_t cpu_ldl_code(CPUArchState *env, abi_ptr addr)
 static uint64_t full_ldq_code(CPUArchState *env, target_ulong addr,
                               MemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_TEUQ, true, full_ldq_code);
+    return load_helper(env, addr, oi, retaddr, MO_TEUQ,
+                       MMU_INST_FETCH, full_ldq_code);
 }
 
 uint64_t cpu_ldq_code(CPUArchState *env, abi_ptr addr)
-- 
2.34.1




* [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-28 18:08   ` Alex Bennée
  2023-02-16  2:57 ` [PATCH v2 06/30] accel/tcg: Reorg system mode store helpers Richard Henderson
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Instead of trying to unify all operations on uint64_t, pull out
mmu_lookup() to perform the basic tlb hit and resolution.
Create individual functions to handle access by size.
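
The resulting shape, roughly, taking the 4-byte case as an example:

    do_ld4_mmu()
      -> mmu_lookup()     resolve 1 or 2 pages, alignment, watchpoints, dirty
      -> do_ld_4()        single page: host-endian load, bswap if required
      -> do_ld_beN() x2   cross-page: accumulate bytes big-endian from each
                          page (RAM or MMIO), then one bswap if the guest
                          access is little-endian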

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 612 +++++++++++++++++++++++++++++++--------------
 1 file changed, 419 insertions(+), 193 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 3cadd35f5d..1fba836790 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1701,6 +1701,178 @@ bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
 
 #endif
 
+/*
+ * Probe for a load/store operation.
+ * Return the host address and flags.
+ */
+
+typedef struct MMULookupPageData {
+    CPUTLBEntryFull *full;
+    void *haddr;
+    target_ulong addr;
+    int flags;
+    int size;
+} MMULookupPageData;
+
+typedef struct MMULookupLocals {
+    MMULookupPageData page[2];
+    MemOp memop;
+    int mmu_idx;
+} MMULookupLocals;
+
+/**
+ * mmu_lookup1: translate one page
+ * @env: cpu context
+ * @data: lookup parameters
+ * @mmu_idx: virtual address context
+ * @access_type: load/store/code
+ * @ra: return address into tcg generated code, or 0
+ *
+ * Resolve the translation for the one page at @data.addr, filling in
+ * the rest of @data with the results.  If the translation fails,
+ * tlb_fill will longjmp out.  Return true if the softmmu tlb for
+ * @mmu_idx may have resized.
+ */
+static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data,
+                        int mmu_idx, MMUAccessType access_type, uintptr_t ra)
+{
+    target_ulong addr = data->addr;
+    uintptr_t index = tlb_index(env, mmu_idx, addr);
+    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+    target_ulong tlb_addr = tlb_read_idx(entry, access_type);
+    bool maybe_resized = false;
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if (!tlb_hit(tlb_addr, addr)) {
+        if (!victim_tlb_hit(env, mmu_idx, index, access_type,
+                            addr & TARGET_PAGE_MASK)) {
+            tlb_fill(env_cpu(env), addr, data->size, access_type, mmu_idx, ra);
+            maybe_resized = true;
+            index = tlb_index(env, mmu_idx, addr);
+            entry = tlb_entry(env, mmu_idx, addr);
+        }
+        tlb_addr = tlb_read_idx(entry, access_type) & ~TLB_INVALID_MASK;
+    }
+
+    data->flags = tlb_addr & TLB_FLAGS_MASK;
+    data->full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+    /* Compute haddr speculatively; depending on flags it might be invalid. */
+    data->haddr = (void *)((uintptr_t)addr + entry->addend);
+
+    return maybe_resized;
+}
+
+/**
+ * mmu_watch_or_dirty
+ * @env: cpu context
+ * @data: lookup parameters
+ * @access_type: load/store/code
+ * @ra: return address into tcg generated code, or 0
+ *
+ * Trigger watchpoints for @data.addr:@data.size;
+ * record writes to protected clean pages.
+ */
+static void mmu_watch_or_dirty(CPUArchState *env, MMULookupPageData *data,
+                               MMUAccessType access_type, uintptr_t ra)
+{
+    CPUTLBEntryFull *full = data->full;
+    target_ulong addr = data->addr;
+    int flags = data->flags;
+    int size = data->size;
+
+    /* On watchpoint hit, this will longjmp out.  */
+    if (flags & TLB_WATCHPOINT) {
+        int wp = access_type == MMU_DATA_STORE ? BP_MEM_WRITE : BP_MEM_READ;
+        cpu_check_watchpoint(env_cpu(env), addr, size, full->attrs, wp, ra);
+        flags &= ~TLB_WATCHPOINT;
+    }
+
+    if (flags & TLB_NOTDIRTY) {
+        notdirty_write(env_cpu(env), addr, size, full, ra);
+        flags &= ~TLB_NOTDIRTY;
+    }
+    data->flags = flags;
+}
+
+/**
+ * mmu_lookup: translate page(s)
+ * @env: cpu context
+ * @addr: virtual address
+ * @oi: combined mmu_idx and MemOp
+ * @ra: return address into tcg generated code, or 0
+ * @access_type: load/store/code
+ * @l: output result
+ *
+ * Resolve the translation for the page(s) beginning at @addr, for MemOp.size
+ * bytes.  Return true if the lookup crosses a page boundary.
+ */
+static bool mmu_lookup(CPUArchState *env, target_ulong addr, MemOpIdx oi,
+                       uintptr_t ra, MMUAccessType type, MMULookupLocals *l)
+{
+    unsigned a_bits;
+    bool crosspage;
+    int flags;
+
+    l->memop = get_memop(oi);
+    l->mmu_idx = get_mmuidx(oi);
+
+    tcg_debug_assert(l->mmu_idx < NB_MMU_MODES);
+
+    /* Handle CPU specific unaligned behaviour */
+    a_bits = get_alignment_bits(l->memop);
+    if (addr & ((1 << a_bits) - 1)) {
+        cpu_unaligned_access(env_cpu(env), addr, type, l->mmu_idx, ra);
+    }
+
+    l->page[0].addr = addr;
+    l->page[0].size = memop_size(l->memop);
+    l->page[1].addr = (addr + l->page[0].size - 1) & TARGET_PAGE_MASK;
+    l->page[1].size = 0;
+    crosspage = (addr ^ l->page[1].addr) & TARGET_PAGE_MASK;
+
+    if (likely(!crosspage)) {
+        mmu_lookup1(env, &l->page[0], l->mmu_idx, type, ra);
+
+        flags = l->page[0].flags;
+        if (unlikely(flags & (TLB_WATCHPOINT | TLB_NOTDIRTY))) {
+            mmu_watch_or_dirty(env, &l->page[0], type, ra);
+        }
+        if (unlikely(flags & TLB_BSWAP)) {
+            l->memop ^= MO_BSWAP;
+        }
+    } else {
+        /* Finish compute of page crossing. */
+        int size1 = l->page[1].addr - addr;
+        l->page[1].size = l->page[0].size - size1;
+        l->page[0].size = size1;
+
+        /*
+         * Lookup both pages, recognizing exceptions from either.  If the
+         * second lookup potentially resized, refresh first CPUTLBEntryFull.
+         */
+        mmu_lookup1(env, &l->page[0], l->mmu_idx, type, ra);
+        if (mmu_lookup1(env, &l->page[1], l->mmu_idx, type, ra)) {
+            uintptr_t index = tlb_index(env, l->mmu_idx, addr);
+            l->page[0].full = &env_tlb(env)->d[l->mmu_idx].fulltlb[index];
+        }
+
+        flags = l->page[0].flags | l->page[1].flags;
+        if (unlikely(flags & (TLB_WATCHPOINT | TLB_NOTDIRTY))) {
+            mmu_watch_or_dirty(env, &l->page[0], type, ra);
+            mmu_watch_or_dirty(env, &l->page[1], type, ra);
+        }
+
+        /*
+         * Since target/sparc is the only user of TLB_BSWAP, and all
+         * Sparc accesses are aligned, any treatment across two pages
+         * would be arbitrary.  Refuse it until there's a use.
+         */
+        tcg_debug_assert((flags & TLB_BSWAP) == 0);
+    }
+
+    return crosspage;
+}
+
 /*
  * Probe for an atomic operation.  Do not allow unaligned operations,
  * or io operations to proceed.  Return the host address.
@@ -1857,113 +2029,6 @@ load_memop(const void *haddr, MemOp op)
     }
 }
 
-static inline uint64_t QEMU_ALWAYS_INLINE
-load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
-            uintptr_t retaddr, MemOp op, MMUAccessType access_type,
-            FullLoadHelper *full_load)
-{
-    const unsigned a_bits = get_alignment_bits(get_memop(oi));
-    const size_t size = memop_size(op);
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index;
-    CPUTLBEntry *entry;
-    target_ulong tlb_addr;
-    void *haddr;
-    uint64_t res;
-
-    tcg_debug_assert(mmu_idx < NB_MMU_MODES);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, access_type,
-                             mmu_idx, retaddr);
-    }
-
-    index = tlb_index(env, mmu_idx, addr);
-    entry = tlb_entry(env, mmu_idx, addr);
-    tlb_addr = tlb_read_idx(entry, access_type);
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!victim_tlb_hit(env, mmu_idx, index, access_type,
-                            addr & TARGET_PAGE_MASK)) {
-            tlb_fill(env_cpu(env), addr, size,
-                     access_type, mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = tlb_read_idx(entry, access_type);
-        tlb_addr &= ~TLB_INVALID_MASK;
-    }
-
-    /* Handle anything that isn't just a straight memory access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUTLBEntryFull *full;
-        bool need_swap;
-
-        /* For anything that is unaligned, recurse through full_load.  */
-        if ((addr & (size - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
-        /* Handle watchpoints.  */
-        if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
-            /* On watchpoint hit, this will longjmp out.  */
-            cpu_check_watchpoint(env_cpu(env), addr, size,
-                                 full->attrs, BP_MEM_READ, retaddr);
-        }
-
-        need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
-
-        /* Handle I/O access.  */
-        if (likely(tlb_addr & TLB_MMIO)) {
-            return io_readx(env, full, mmu_idx, addr, retaddr,
-                            access_type, op ^ (need_swap * MO_BSWAP));
-        }
-
-        haddr = (void *)((uintptr_t)addr + entry->addend);
-
-        /*
-         * Keep these two load_memop separate to ensure that the compiler
-         * is able to fold the entire function to a single instruction.
-         * There is a build-time assert inside to remind you of this.  ;-)
-         */
-        if (unlikely(need_swap)) {
-            return load_memop(haddr, op ^ MO_BSWAP);
-        }
-        return load_memop(haddr, op);
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (size > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
-                    >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        uint64_t r1, r2;
-        unsigned shift;
-    do_unaligned_access:
-        addr1 = addr & ~((target_ulong)size - 1);
-        addr2 = addr1 + size;
-        r1 = full_load(env, addr1, oi, retaddr);
-        r2 = full_load(env, addr2, oi, retaddr);
-        shift = (addr & (size - 1)) * 8;
-
-        if (memop_big_endian(op)) {
-            /* Big-endian combine.  */
-            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
-        } else {
-            /* Little-endian combine.  */
-            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
-        }
-        return res & MAKE_64BIT_MASK(0, size * 8);
-    }
-
-    haddr = (void *)((uintptr_t)addr + entry->addend);
-    return load_memop(haddr, op);
-}
-
 /*
  * For the benefit of TCG generated code, we want to avoid the
  * complication of ABI-specific return type promotion and always
@@ -1974,90 +2039,250 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
  * We don't bother with this widened value for SOFTMMU_CODE_ACCESS.
  */
 
-static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
-                              MemOpIdx oi, uintptr_t retaddr)
+/**
+ * do_ld_mmio_beN:
+ * @env: cpu context
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ * @mmu_idx: virtual address context
+ * @ra: return address into tcg generated code, or 0
+ *
+ * Load @p->size bytes from @p->addr, which is memory-mapped i/o.
+ * The bytes are concatenated in big-endian order with @ret_be.
+ */
+static uint64_t do_ld_mmio_beN(CPUArchState *env, MMULookupPageData *p,
+                               uint64_t ret_be, int mmu_idx,
+                               MMUAccessType type, uintptr_t ra)
 {
-    validate_memop(oi, MO_UB);
-    return load_helper(env, addr, oi, retaddr, MO_UB, MMU_DATA_LOAD,
-                       full_ldub_mmu);
+    CPUTLBEntryFull *full = p->full;
+    target_ulong addr = p->addr;
+    int i, size = p->size;
+
+    QEMU_IOTHREAD_LOCK_GUARD();
+    for (i = 0; i < size; i++) {
+        uint8_t x = io_readx(env, full, mmu_idx, addr + i, ra, type, MO_UB);
+        ret_be = (ret_be << 8) | x;
+    }
+    return ret_be;
+}
+
+/**
+ * do_ld_bytes_beN
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ *
+ * Load @p->size bytes from @p->haddr, which is RAM.
+ * The bytes are concatenated in big-endian order with @ret_be.
+ */
+static uint64_t do_ld_bytes_beN(MMULookupPageData *p, uint64_t ret_be)
+{
+    uint8_t *haddr = p->haddr;
+    int i, size = p->size;
+
+    for (i = 0; i < size; i++) {
+        ret_be = (ret_be << 8) | haddr[i];
+    }
+    return ret_be;
+}
+
+/*
+ * Wrapper for the above.
+ */
+static uint64_t do_ld_beN(CPUArchState *env, MMULookupPageData *p,
+                          uint64_t ret_be, int mmu_idx,
+                          MMUAccessType type, uintptr_t ra)
+{
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return do_ld_mmio_beN(env, p, ret_be, mmu_idx, type, ra);
+    } else {
+        return do_ld_bytes_beN(p, ret_be);
+    }
+}
+
+static uint8_t do_ld_1(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
+                       MMUAccessType type, uintptr_t ra)
+{
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return io_readx(env, p->full, mmu_idx, p->addr, ra, type, MO_UB);
+    } else {
+        return *(uint8_t *)p->haddr;
+    }
+}
+
+static uint16_t do_ld_2(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
+                        MMUAccessType type, MemOp memop, uintptr_t ra)
+{
+    uint64_t ret;
+
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return io_readx(env, p->full, mmu_idx, p->addr, ra, type, memop);
+    }
+
+    /* Perform the load host endian, then swap if necessary. */
+    ret = load_memop(p->haddr, MO_UW);
+    if (memop & MO_BSWAP) {
+        ret = bswap16(ret);
+    }
+    return ret;
+}
+
+static uint32_t do_ld_4(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
+                        MMUAccessType type, MemOp memop, uintptr_t ra)
+{
+    uint32_t ret;
+
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return io_readx(env, p->full, mmu_idx, p->addr, ra, type, memop);
+    }
+
+    /* Perform the load host endian. */
+    ret = load_memop(p->haddr, MO_UL);
+    if (memop & MO_BSWAP) {
+        ret = bswap32(ret);
+    }
+    return ret;
+}
+
+static uint64_t do_ld_8(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
+                        MMUAccessType type, MemOp memop, uintptr_t ra)
+{
+    uint64_t ret;
+
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return io_readx(env, p->full, mmu_idx, p->addr, ra, type, memop);
+    }
+
+    /* Perform the load host endian. */
+    ret = load_memop(p->haddr, MO_UQ);
+    if (memop & MO_BSWAP) {
+        ret = bswap64(ret);
+    }
+    return ret;
+}
+
+static uint8_t do_ld1_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
+                          uintptr_t ra, MMUAccessType access_type)
+{
+    MMULookupLocals l;
+    bool crosspage;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l);
+    tcg_debug_assert(!crosspage);
+
+    return do_ld_1(env, &l.page[0], l.mmu_idx, access_type, ra);
 }
 
 tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
                                      MemOpIdx oi, uintptr_t retaddr)
 {
-    return full_ldub_mmu(env, addr, oi, retaddr);
+    validate_memop(oi, MO_UB);
+    return do_ld1_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
-static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                 MemOpIdx oi, uintptr_t retaddr)
+static uint16_t do_ld2_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
+                           uintptr_t ra, MMUAccessType access_type)
 {
-    validate_memop(oi, MO_LEUW);
-    return load_helper(env, addr, oi, retaddr, MO_LEUW, MMU_DATA_LOAD,
-                       full_le_lduw_mmu);
+    MMULookupLocals l;
+    bool crosspage;
+    uint16_t ret;
+    uint8_t a, b;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l);
+    if (likely(!crosspage)) {
+        return do_ld_2(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra);
+    }
+
+    a = do_ld_1(env, &l.page[0], l.mmu_idx, access_type, ra);
+    b = do_ld_1(env, &l.page[1], l.mmu_idx, access_type, ra);
+
+    if ((l.memop & MO_BSWAP) == MO_LE) {
+        ret = a | (b << 8);
+    } else {
+        ret = b | (a << 8);
+    }
+    return ret;
 }
 
 tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
                                     MemOpIdx oi, uintptr_t retaddr)
 {
-    return full_le_lduw_mmu(env, addr, oi, retaddr);
-}
-
-static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                 MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUW);
-    return load_helper(env, addr, oi, retaddr, MO_BEUW, MMU_DATA_LOAD,
-                       full_be_lduw_mmu);
+    validate_memop(oi, MO_LEUW);
+    return do_ld2_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
 tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
                                     MemOpIdx oi, uintptr_t retaddr)
 {
-    return full_be_lduw_mmu(env, addr, oi, retaddr);
+    validate_memop(oi, MO_BEUW);
+    return do_ld2_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
-static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                 MemOpIdx oi, uintptr_t retaddr)
+static uint32_t do_ld4_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
+                           uintptr_t ra, MMUAccessType access_type)
 {
-    validate_memop(oi, MO_LEUL);
-    return load_helper(env, addr, oi, retaddr, MO_LEUL, MMU_DATA_LOAD,
-                       full_le_ldul_mmu);
+    MMULookupLocals l;
+    bool crosspage;
+    uint32_t ret;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l);
+    if (likely(!crosspage)) {
+        return do_ld_4(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra);
+    }
+
+    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, ra);
+    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, ra);
+    if ((l.memop & MO_BSWAP) == MO_LE) {
+        ret = bswap32(ret);
+    }
+    return ret;
 }
 
 tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                                     MemOpIdx oi, uintptr_t retaddr)
 {
-    return full_le_ldul_mmu(env, addr, oi, retaddr);
-}
-
-static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                 MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUL);
-    return load_helper(env, addr, oi, retaddr, MO_BEUL, MMU_DATA_LOAD,
-                       full_be_ldul_mmu);
+    validate_memop(oi, MO_LEUL);
+    return do_ld4_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
 tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                     MemOpIdx oi, uintptr_t retaddr)
 {
-    return full_be_ldul_mmu(env, addr, oi, retaddr);
+    validate_memop(oi, MO_BEUL);
+    return do_ld4_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
+}
+
+static uint64_t do_ld8_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
+                           uintptr_t ra, MMUAccessType access_type)
+{
+    MMULookupLocals l;
+    bool crosspage;
+    uint64_t ret;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l);
+    if (likely(!crosspage)) {
+        return do_ld_8(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra);
+    }
+
+    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, ra);
+    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, ra);
+    if ((l.memop & MO_BSWAP) == MO_LE) {
+        ret = bswap64(ret);
+    }
+    return ret;
 }
 
 uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
                            MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_LEUQ);
-    return load_helper(env, addr, oi, retaddr, MO_LEUQ, MMU_DATA_LOAD,
-                       helper_le_ldq_mmu);
+    return do_ld8_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
 uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_BEUQ);
-    return load_helper(env, addr, oi, retaddr, MO_BEUQ, MMU_DATA_LOAD,
-                       helper_be_ldq_mmu);
+    return do_ld8_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
 /*
@@ -2100,56 +2325,85 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
  * Load helpers for cpu_ldst.h.
  */
 
-static inline uint64_t cpu_load_helper(CPUArchState *env, abi_ptr addr,
-                                       MemOpIdx oi, uintptr_t retaddr,
-                                       FullLoadHelper *full_load)
+static void plugin_load_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
 {
-    uint64_t ret;
-
-    ret = full_load(env, addr, oi, retaddr);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
 }
 
 uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr, MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, full_ldub_mmu);
+    uint8_t ret;
+
+    validate_memop(oi, MO_UB);
+    ret = do_ld1_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint16_t cpu_ldw_be_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, full_be_lduw_mmu);
+    uint16_t ret;
+
+    validate_memop(oi, MO_BEUW);
+    ret = do_ld2_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, full_be_ldul_mmu);
+    uint32_t ret;
+
+    validate_memop(oi, MO_BEUL);
+    ret = do_ld4_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, helper_be_ldq_mmu);
+    uint64_t ret;
+
+    validate_memop(oi, MO_BEUQ);
+    ret = do_ld8_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint16_t cpu_ldw_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, full_le_lduw_mmu);
+    uint16_t ret;
+
+    validate_memop(oi, MO_LEUW);
+    ret = do_ld2_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, full_le_ldul_mmu);
+    uint32_t ret;
+
+    validate_memop(oi, MO_LEUL);
+    ret = do_ld4_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    return cpu_load_helper(env, addr, oi, ra, helper_le_ldq_mmu);
+    uint64_t ret;
+
+    validate_memop(oi, MO_LEUQ);
+    ret = do_ld8_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
@@ -2651,54 +2905,26 @@ void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
 
 /* Code access functions.  */
 
-static uint64_t full_ldub_code(CPUArchState *env, target_ulong addr,
-                               MemOpIdx oi, uintptr_t retaddr)
-{
-    return load_helper(env, addr, oi, retaddr, MO_8,
-                       MMU_INST_FETCH, full_ldub_code);
-}
-
 uint32_t cpu_ldub_code(CPUArchState *env, abi_ptr addr)
 {
     MemOpIdx oi = make_memop_idx(MO_UB, cpu_mmu_index(env, true));
-    return full_ldub_code(env, addr, oi, 0);
-}
-
-static uint64_t full_lduw_code(CPUArchState *env, target_ulong addr,
-                               MemOpIdx oi, uintptr_t retaddr)
-{
-    return load_helper(env, addr, oi, retaddr, MO_TEUW,
-                       MMU_INST_FETCH, full_lduw_code);
+    return do_ld1_mmu(env, addr, oi, 0, MMU_INST_FETCH);
 }
 
 uint32_t cpu_lduw_code(CPUArchState *env, abi_ptr addr)
 {
     MemOpIdx oi = make_memop_idx(MO_TEUW, cpu_mmu_index(env, true));
-    return full_lduw_code(env, addr, oi, 0);
-}
-
-static uint64_t full_ldl_code(CPUArchState *env, target_ulong addr,
-                              MemOpIdx oi, uintptr_t retaddr)
-{
-    return load_helper(env, addr, oi, retaddr, MO_TEUL,
-                       MMU_INST_FETCH, full_ldl_code);
+    return do_ld2_mmu(env, addr, oi, 0, MMU_INST_FETCH);
 }
 
 uint32_t cpu_ldl_code(CPUArchState *env, abi_ptr addr)
 {
     MemOpIdx oi = make_memop_idx(MO_TEUL, cpu_mmu_index(env, true));
-    return full_ldl_code(env, addr, oi, 0);
-}
-
-static uint64_t full_ldq_code(CPUArchState *env, target_ulong addr,
-                              MemOpIdx oi, uintptr_t retaddr)
-{
-    return load_helper(env, addr, oi, retaddr, MO_TEUQ,
-                       MMU_INST_FETCH, full_ldq_code);
+    return do_ld4_mmu(env, addr, oi, 0, MMU_INST_FETCH);
 }
 
 uint64_t cpu_ldq_code(CPUArchState *env, abi_ptr addr)
 {
     MemOpIdx oi = make_memop_idx(MO_TEUQ, cpu_mmu_index(env, true));
-    return full_ldq_code(env, addr, oi, 0);
+    return do_ld8_mmu(env, addr, oi, 0, MMU_INST_FETCH);
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 06/30] accel/tcg: Reorg system mode store helpers
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (4 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 07/30] accel/tcg: Honor atomicity of loads Richard Henderson
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Instead of trying to unify all operations on uint64_t, use
mmu_lookup() to perform the basic tlb hit and resolution.
Create individual functions to handle access by size.
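
As a rough standalone sketch of the crosspage case (illustration only, not
code from this patch; store_byte, page0 and page1 are invented names): the
value is split into bytes and the requested endianness decides which byte
lands on which page, mirroring the a/b split in do_st2_mmu below.

    #include <stdint.h>
    #include <stdio.h>

    /* Write one byte into a page-sized buffer. */
    static void store_byte(uint8_t *page, unsigned off, uint8_t b)
    {
        page[off] = b;
    }

    int main(void)
    {
        uint8_t page0[4096] = { 0 }, page1[4096] = { 0 };
        uint16_t val = 0x1234;
        int little_endian = 1;                  /* assumed guest byte order */

        /* First byte goes to the end of page0, second to the start of page1. */
        uint8_t a = little_endian ? (val & 0xff) : (val >> 8);
        uint8_t b = little_endian ? (val >> 8) : (val & 0xff);

        store_byte(page0, 4095, a);
        store_byte(page1, 0, b);
        printf("%02x %02x\n", page0[4095], page1[0]);   /* prints: 34 12 */
        return 0;
    }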

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 408 +++++++++++++++++++++------------------------
 1 file changed, 193 insertions(+), 215 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1fba836790..186a7f9510 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2498,322 +2498,300 @@ store_memop(void *haddr, uint64_t val, MemOp op)
     }
 }
 
-static void full_stb_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                         MemOpIdx oi, uintptr_t retaddr);
-
-static void __attribute__((noinline))
-store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
-                       uintptr_t retaddr, size_t size, uintptr_t mmu_idx,
-                       bool big_endian)
+/**
+ * do_st_mmio_leN:
+ * @env: cpu context
+ * @p: translation parameters
+ * @val_le: data to store
+ * @mmu_idx: virtual address context
+ * @ra: return address into tcg generated code, or 0
+ *
+ * Store @p->size bytes at @p->addr, which is memory-mapped i/o.
+ * The bytes to store are extracted in little-endian order from @val_le;
+ * return the bytes of @val_le beyond @p->size that have not been stored.
+ */
+static uint64_t do_st_mmio_leN(CPUArchState *env, MMULookupPageData *p,
+                               uint64_t val_le, int mmu_idx, uintptr_t ra)
 {
-    uintptr_t index, index2;
-    CPUTLBEntry *entry, *entry2;
-    target_ulong page1, page2, tlb_addr, tlb_addr2;
-    MemOpIdx oi;
-    size_t size2;
-    int i;
+    CPUTLBEntryFull *full = p->full;
+    target_ulong addr = p->addr;
+    int i, size = p->size;
 
-    /*
-     * Ensure the second page is in the TLB.  Note that the first page
-     * is already guaranteed to be filled, and that the second page
-     * cannot evict the first.  An exception to this rule is PAGE_WRITE_INV
-     * handling: the first page could have evicted itself.
-     */
-    page1 = addr & TARGET_PAGE_MASK;
-    page2 = (addr + size) & TARGET_PAGE_MASK;
-    size2 = (addr + size) & ~TARGET_PAGE_MASK;
-    index2 = tlb_index(env, mmu_idx, page2);
-    entry2 = tlb_entry(env, mmu_idx, page2);
-
-    tlb_addr2 = tlb_addr_write(entry2);
-    if (page1 != page2 && !tlb_hit_page(tlb_addr2, page2)) {
-        if (!victim_tlb_hit(env, mmu_idx, index2, MMU_DATA_STORE, page2)) {
-            tlb_fill(env_cpu(env), page2, size2, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index2 = tlb_index(env, mmu_idx, page2);
-            entry2 = tlb_entry(env, mmu_idx, page2);
-        }
-        tlb_addr2 = tlb_addr_write(entry2);
+    QEMU_IOTHREAD_LOCK_GUARD();
+    for (i = 0; i < size; i++, val_le >>= 8) {
+        io_writex(env, full, mmu_idx, val_le, addr + i, ra, MO_UB);
     }
+    return val_le;
+}
 
-    index = tlb_index(env, mmu_idx, addr);
-    entry = tlb_entry(env, mmu_idx, addr);
-    tlb_addr = tlb_addr_write(entry);
+/**
+ * do_st_bytes_leN:
+ * @p: translation parameters
+ * @val_le: data to store
+ *
+ * Store @p->size bytes at @p->haddr, which is RAM.
+ * The bytes to store are extracted in little-endian order from @val_le;
+ * return the bytes of @val_le beyond @p->size that have not been stored.
+ */
+static uint64_t do_st_bytes_leN(MMULookupPageData *p, uint64_t val_le)
+{
+    uint8_t *haddr = p->haddr;
+    int i, size = p->size;
 
-    /*
-     * Handle watchpoints.  Since this may trap, all checks
-     * must happen before any store.
-     */
-    if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
-        cpu_check_watchpoint(env_cpu(env), addr, size - size2,
-                             env_tlb(env)->d[mmu_idx].fulltlb[index].attrs,
-                             BP_MEM_WRITE, retaddr);
-    }
-    if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) {
-        cpu_check_watchpoint(env_cpu(env), page2, size2,
-                             env_tlb(env)->d[mmu_idx].fulltlb[index2].attrs,
-                             BP_MEM_WRITE, retaddr);
+    for (i = 0; i < size; i++, val_le >>= 8) {
+        haddr[i] = val_le;
     }
+    return val_le;
+}
 
-    /*
-     * XXX: not efficient, but simple.
-     * This loop must go in the forward direction to avoid issues
-     * with self-modifying code in Windows 64-bit.
-     */
-    oi = make_memop_idx(MO_UB, mmu_idx);
-    if (big_endian) {
-        for (i = 0; i < size; ++i) {
-            /* Big-endian extract.  */
-            uint8_t val8 = val >> (((size - 1) * 8) - (i * 8));
-            full_stb_mmu(env, addr + i, val8, oi, retaddr);
-        }
+/*
+ * Wrapper for the above.
+ */
+static uint64_t do_st_leN(CPUArchState *env, MMULookupPageData *p,
+                          uint64_t val_le, int mmu_idx, uintptr_t ra)
+{
+    if (unlikely(p->flags & TLB_MMIO)) {
+        return do_st_mmio_leN(env, p, val_le, mmu_idx, ra);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        return val_le >> (p->size * 8);
     } else {
-        for (i = 0; i < size; ++i) {
-            /* Little-endian extract.  */
-            uint8_t val8 = val >> (i * 8);
-            full_stb_mmu(env, addr + i, val8, oi, retaddr);
-        }
+        return do_st_bytes_leN(p, val_le);
     }
 }
 
-static inline void QEMU_ALWAYS_INLINE
-store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
-             MemOpIdx oi, uintptr_t retaddr, MemOp op)
+static void do_st_1(CPUArchState *env, MMULookupPageData *p, uint8_t val,
+                    int mmu_idx, uintptr_t ra)
 {
-    const unsigned a_bits = get_alignment_bits(get_memop(oi));
-    const size_t size = memop_size(op);
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index;
-    CPUTLBEntry *entry;
-    target_ulong tlb_addr;
-    void *haddr;
-
-    tcg_debug_assert(mmu_idx < NB_MMU_MODES);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_STORE,
-                             mmu_idx, retaddr);
+    if (unlikely(p->flags & TLB_MMIO)) {
+        io_writex(env, p->full, mmu_idx, val, p->addr, ra, MO_UB);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        /* nothing */
+    } else {
+        *(uint8_t *)p->haddr = val;
     }
-
-    index = tlb_index(env, mmu_idx, addr);
-    entry = tlb_entry(env, mmu_idx, addr);
-    tlb_addr = tlb_addr_write(entry);
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!victim_tlb_hit(env, mmu_idx, index, MMU_DATA_STORE,
-            addr & TARGET_PAGE_MASK)) {
-            tlb_fill(env_cpu(env), addr, size, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
-    }
-
-    /* Handle anything that isn't just a straight memory access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUTLBEntryFull *full;
-        bool need_swap;
-
-        /* For anything that is unaligned, recurse through byte stores.  */
-        if ((addr & (size - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
-        /* Handle watchpoints.  */
-        if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
-            /* On watchpoint hit, this will longjmp out.  */
-            cpu_check_watchpoint(env_cpu(env), addr, size,
-                                 full->attrs, BP_MEM_WRITE, retaddr);
-        }
-
-        need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
-
-        /* Handle I/O access.  */
-        if (tlb_addr & TLB_MMIO) {
-            io_writex(env, full, mmu_idx, val, addr, retaddr,
-                      op ^ (need_swap * MO_BSWAP));
-            return;
-        }
-
-        /* Ignore writes to ROM.  */
-        if (unlikely(tlb_addr & TLB_DISCARD_WRITE)) {
-            return;
-        }
-
-        /* Handle clean RAM pages.  */
-        if (tlb_addr & TLB_NOTDIRTY) {
-            notdirty_write(env_cpu(env), addr, size, full, retaddr);
-        }
-
-        haddr = (void *)((uintptr_t)addr + entry->addend);
-
-        /*
-         * Keep these two store_memop separate to ensure that the compiler
-         * is able to fold the entire function to a single instruction.
-         * There is a build-time assert inside to remind you of this.  ;-)
-         */
-        if (unlikely(need_swap)) {
-            store_memop(haddr, val, op ^ MO_BSWAP);
-        } else {
-            store_memop(haddr, val, op);
-        }
-        return;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (size > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
-                     >= TARGET_PAGE_SIZE)) {
-    do_unaligned_access:
-        store_helper_unaligned(env, addr, val, retaddr, size,
-                               mmu_idx, memop_big_endian(op));
-        return;
-    }
-
-    haddr = (void *)((uintptr_t)addr + entry->addend);
-    store_memop(haddr, val, op);
 }
 
-static void __attribute__((noinline))
-full_stb_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-             MemOpIdx oi, uintptr_t retaddr)
+static void do_st_2(CPUArchState *env, MMULookupPageData *p, uint16_t val,
+                    int mmu_idx, MemOp memop, uintptr_t ra)
 {
-    validate_memop(oi, MO_UB);
-    store_helper(env, addr, val, oi, retaddr, MO_UB);
+    if (unlikely(p->flags & TLB_MMIO)) {
+        io_writex(env, p->full, mmu_idx, val, p->addr, ra, memop);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        /* nothing */
+    } else {
+        /* Swap to host endian if necessary, then store. */
+        if (memop & MO_BSWAP) {
+            val = bswap16(val);
+        }
+        store_memop(p->haddr, val, MO_UW);
+    }
+}
+
+static void do_st_4(CPUArchState *env, MMULookupPageData *p, uint32_t val,
+                    int mmu_idx, MemOp memop, uintptr_t ra)
+{
+    if (unlikely(p->flags & TLB_MMIO)) {
+        io_writex(env, p->full, mmu_idx, val, p->addr, ra, memop);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        /* nothing */
+    } else {
+        /* Swap to host endian if necessary, then store. */
+        if (memop & MO_BSWAP) {
+            val = bswap32(val);
+        }
+        store_memop(p->haddr, val, MO_UL);
+    }
+}
+
+static void do_st_8(CPUArchState *env, MMULookupPageData *p, uint64_t val,
+                    int mmu_idx, MemOp memop, uintptr_t ra)
+{
+    if (unlikely(p->flags & TLB_MMIO)) {
+        io_writex(env, p->full, mmu_idx, val, p->addr, ra, memop);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        /* nothing */
+    } else {
+        /* Swap to host endian if necessary, then store. */
+        if (memop & MO_BSWAP) {
+            val = bswap64(val);
+        }
+        store_memop(p->haddr, val, MO_UQ);
+    }
 }
 
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                        MemOpIdx oi, uintptr_t retaddr)
+                        MemOpIdx oi, uintptr_t ra)
 {
-    full_stb_mmu(env, addr, val, oi, retaddr);
+    MMULookupLocals l;
+    bool crosspage;
+
+    validate_memop(oi, MO_UB);
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
+    tcg_debug_assert(!crosspage);
+
+    do_st_1(env, &l.page[0], val, l.mmu_idx, ra);
 }
 
-static void full_le_stw_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                            MemOpIdx oi, uintptr_t retaddr)
+static void do_st2_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                       MemOpIdx oi, uintptr_t ra)
 {
-    validate_memop(oi, MO_LEUW);
-    store_helper(env, addr, val, oi, retaddr, MO_LEUW);
+    MMULookupLocals l;
+    bool crosspage;
+    uint8_t a, b;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
+    if (likely(!crosspage)) {
+        do_st_2(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+        return;
+    }
+
+    if ((l.memop & MO_BSWAP) == MO_LE) {
+        a = val, b = val >> 8;
+    } else {
+        b = val, a = val >> 8;
+    }
+    do_st_1(env, &l.page[0], a, l.mmu_idx, ra);
+    do_st_1(env, &l.page[1], b, l.mmu_idx, ra);
 }
 
 void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
-    full_le_stw_mmu(env, addr, val, oi, retaddr);
-}
-
-static void full_be_stw_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                            MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUW);
-    store_helper(env, addr, val, oi, retaddr, MO_BEUW);
+    validate_memop(oi, MO_LEUW);
+    do_st2_mmu(env, addr, val, oi, retaddr);
 }
 
 void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
-    full_be_stw_mmu(env, addr, val, oi, retaddr);
+    validate_memop(oi, MO_BEUW);
+    do_st2_mmu(env, addr, val, oi, retaddr);
 }
 
-static void full_le_stl_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                            MemOpIdx oi, uintptr_t retaddr)
+static void do_st4_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                       MemOpIdx oi, uintptr_t ra)
 {
-    validate_memop(oi, MO_LEUL);
-    store_helper(env, addr, val, oi, retaddr, MO_LEUL);
+    MMULookupLocals l;
+    bool crosspage;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
+    if (likely(!crosspage)) {
+        do_st_4(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+        return;
+    }
+
+    /* Swap to little endian for simplicity, then store by bytes. */
+    if ((l.memop & MO_BSWAP) != MO_LE) {
+        val = bswap32(val);
+    }
+    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, ra);
+    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, ra);
 }
 
 void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
-    full_le_stl_mmu(env, addr, val, oi, retaddr);
-}
-
-static void full_be_stl_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                            MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUL);
-    store_helper(env, addr, val, oi, retaddr, MO_BEUL);
+    validate_memop(oi, MO_LEUL);
+    do_st4_mmu(env, addr, val, oi, retaddr);
 }
 
 void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
-    full_be_stl_mmu(env, addr, val, oi, retaddr);
+    validate_memop(oi, MO_BEUL);
+    do_st4_mmu(env, addr, val, oi, retaddr);
+}
+
+static void do_st8_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                       MemOpIdx oi, uintptr_t ra)
+{
+    MMULookupLocals l;
+    bool crosspage;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
+    if (likely(!crosspage)) {
+        do_st_8(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+        return;
+    }
+
+    /* Swap to little endian for simplicity, then store by bytes. */
+    if ((l.memop & MO_BSWAP) != MO_LE) {
+        val = bswap64(val);
+    }
+    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, ra);
+    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, ra);
 }
 
 void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_LEUQ);
-    store_helper(env, addr, val, oi, retaddr, MO_LEUQ);
+    do_st8_mmu(env, addr, val, oi, retaddr);
 }
 
 void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        MemOpIdx oi, uintptr_t retaddr)
 {
     validate_memop(oi, MO_BEUQ);
-    store_helper(env, addr, val, oi, retaddr, MO_BEUQ);
+    do_st8_mmu(env, addr, val, oi, retaddr);
 }
 
 /*
  * Store Helpers for cpu_ldst.h
  */
 
-typedef void FullStoreHelper(CPUArchState *env, target_ulong addr,
-                             uint64_t val, MemOpIdx oi, uintptr_t retaddr);
-
-static inline void cpu_store_helper(CPUArchState *env, target_ulong addr,
-                                    uint64_t val, MemOpIdx oi, uintptr_t ra,
-                                    FullStoreHelper *full_store)
+static void plugin_store_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
 {
-    full_store(env, addr, val, oi, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 void cpu_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                  MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, full_stb_mmu);
+    helper_ret_stb_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stw_be_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, full_be_stw_mmu);
+    helper_be_stw_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stl_be_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, full_be_stl_mmu);
+    helper_be_stl_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stq_be_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, helper_be_stq_mmu);
+    helper_be_stq_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stw_le_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, full_le_stw_mmu);
+    helper_le_stw_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stl_le_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, full_le_stl_mmu);
+    helper_le_stl_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    cpu_store_helper(env, addr, val, oi, retaddr, helper_le_stq_mmu);
+    helper_le_stq_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 07/30] accel/tcg: Honor atomicity of loads
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (5 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 06/30] accel/tcg: Reorg system mode store helpers Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-28 18:19   ` Alex Bennée
  2023-02-16  2:57 ` [PATCH v2 08/30] accel/tcg: Honor atomicity of stores Richard Henderson
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Create ldst_atomicity.c.inc.

These routines are not required for user-only code loads, because we have
already ensured that the page is read-only before beginning to translate code.
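
As a standalone sketch of the policy this enables (illustration only, not
the helpers added here; atom_ifalign is an invented name): under "atomic
only if aligned", a misaligned access needs only byte atomicity, while an
aligned access must be single-copy atomic at its full size.

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Required atomicity, in lg2 bytes, for an access of 1 << lg2_size
     * bytes at addr, when atomicity is promised only for aligned accesses.
     */
    static int atom_ifalign(uintptr_t addr, int lg2_size)
    {
        return (addr & ((1u << lg2_size) - 1)) ? 0 : lg2_size;
    }

    int main(void)
    {
        printf("%d\n", atom_ifalign(0x1000, 3));  /* 3: aligned 8-byte access */
        printf("%d\n", atom_ifalign(0x1001, 3));  /* 0: byte atomicity only */
        return 0;
    }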

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c             | 170 +++++++---
 accel/tcg/user-exec.c          |  26 +-
 accel/tcg/ldst_atomicity.c.inc | 547 +++++++++++++++++++++++++++++++++
 3 files changed, 692 insertions(+), 51 deletions(-)
 create mode 100644 accel/tcg/ldst_atomicity.c.inc

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 186a7f9510..8e2fe4a271 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1653,6 +1653,9 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
     return qemu_ram_addr_from_host_nofail(p);
 }
 
+/* Load/store with atomicity primitives. */
+#include "ldst_atomicity.c.inc"
+
 #ifdef CONFIG_PLUGIN
 /*
  * Perform a TLB lookup and populate the qemu_plugin_hwaddr structure.
@@ -2001,35 +2004,7 @@ static void validate_memop(MemOpIdx oi, MemOp expected)
  * specifically for reading instructions from system memory. It is
  * called by the translation loop and in some helpers where the code
  * is disassembled. It shouldn't be called directly by guest code.
- */
-
-typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
-                                MemOpIdx oi, uintptr_t retaddr);
-
-static inline uint64_t QEMU_ALWAYS_INLINE
-load_memop(const void *haddr, MemOp op)
-{
-    switch (op) {
-    case MO_UB:
-        return ldub_p(haddr);
-    case MO_BEUW:
-        return lduw_be_p(haddr);
-    case MO_LEUW:
-        return lduw_le_p(haddr);
-    case MO_BEUL:
-        return (uint32_t)ldl_be_p(haddr);
-    case MO_LEUL:
-        return (uint32_t)ldl_le_p(haddr);
-    case MO_BEUQ:
-        return ldq_be_p(haddr);
-    case MO_LEUQ:
-        return ldq_le_p(haddr);
-    default:
-        qemu_build_not_reached();
-    }
-}
-
-/*
+ *
  * For the benefit of TCG generated code, we want to avoid the
  * complication of ABI-specific return type promotion and always
  * return a value extended to the register size of the host. This is
@@ -2085,17 +2060,134 @@ static uint64_t do_ld_bytes_beN(MMULookupPageData *p, uint64_t ret_be)
     return ret_be;
 }
 
+/**
+ * do_ld_parts_beN
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ *
+ * As do_ld_bytes_beN, but atomically on each aligned part.
+ */
+static uint64_t do_ld_parts_beN(MMULookupPageData *p, uint64_t ret_be)
+{
+    void *haddr = p->haddr;
+    int size = p->size;
+
+    do {
+        uint64_t x;
+        int n;
+
+        /*
+         * Find minimum of alignment and size.
+         * This is slightly stronger than required by MO_ATOM_SUBALIGN, which
+         * would have only checked the low bits of addr|size once at the start,
+         * but is just as easy.
+         */
+        switch (((uintptr_t)haddr | size) & 7) {
+        case 4:
+            x = cpu_to_be32(load_atomic4(haddr));
+            ret_be = (ret_be << 32) | x;
+            n = 4;
+            break;
+        case 2:
+        case 6:
+            x = cpu_to_be16(load_atomic2(haddr));
+            ret_be = (ret_be << 16) | x;
+            n = 2;
+            break;
+        default:
+            x = *(uint8_t *)haddr;
+            ret_be = (ret_be << 8) | x;
+            n = 1;
+            break;
+        case 0:
+            g_assert_not_reached();
+        }
+        haddr += n;
+        size -= n;
+    } while (size != 0);
+    return ret_be;
+}
+
+/**
+ * do_ld_whole_be4
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ *
+ * As do_ld_bytes_beN, but with one atomic load.
+ * Four aligned bytes are guaranteed to cover the load.
+ */
+static uint64_t do_ld_whole_be4(MMULookupPageData *p, uint64_t ret_be)
+{
+    int o = p->addr & 3;
+    uint32_t x = load_atomic4(p->haddr - o);
+
+    x = cpu_to_be32(x);
+    x <<= o * 8;
+    x >>= (4 - p->size) * 8;
+    return (ret_be << (p->size * 8)) | x;
+}
+
+/**
+ * do_ld_whole_be8
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ *
+ * As do_ld_bytes_beN, but with one atomic load.
+ * Eight aligned bytes are guaranteed to cover the load.
+ */
+static uint64_t do_ld_whole_be8(CPUArchState *env, uintptr_t ra,
+                                MMULookupPageData *p, uint64_t ret_be)
+{
+    int o = p->addr & 7;
+    uint64_t x = load_atomic8_or_exit(env, ra, p->haddr - o);
+
+    x = cpu_to_be64(x);
+    x <<= o * 8;
+    x >>= (8 - p->size) * 8;
+    return (ret_be << (p->size * 8)) | x;
+}
+
 /*
  * Wrapper for the above.
  */
 static uint64_t do_ld_beN(CPUArchState *env, MMULookupPageData *p,
-                          uint64_t ret_be, int mmu_idx,
-                          MMUAccessType type, uintptr_t ra)
+                          uint64_t ret_be, int mmu_idx, MMUAccessType type,
+                          MemOp mop, uintptr_t ra)
 {
+    MemOp atmax;
+
     if (unlikely(p->flags & TLB_MMIO)) {
         return do_ld_mmio_beN(env, p, ret_be, mmu_idx, type, ra);
-    } else {
+    }
+
+    switch (mop & MO_ATOM_MASK) {
+    case MO_ATOM_WITHIN16:
+        /*
+         * It is a given that we cross a page and therefore there is no
+         * atomicity for the load as a whole, but there may be a subobject
+         * as defined by ATMAX which does not cross a 16-byte boundary.
+         */
+        atmax = mop & MO_ATMAX_MASK;
+        if (atmax == MO_ATMAX_SIZE) {
+            atmax = mop & MO_SIZE;
+        } else {
+            atmax >>= MO_ATMAX_SHIFT;
+        }
+        if (unlikely(p->size >= (1 << atmax))) {
+            if (!HAVE_al8_fast && p->size < 4) {
+                return do_ld_whole_be4(p, ret_be);
+            } else {
+                return do_ld_whole_be8(env, ra, p, ret_be);
+            }
+        }
+        /* fall through */
+    case MO_ATOM_IFALIGN:
+    case MO_ATOM_NONE:
         return do_ld_bytes_beN(p, ret_be);
+    case MO_ATOM_SUBALIGN:
+        return do_ld_parts_beN(p, ret_be);
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -2119,7 +2211,7 @@ static uint16_t do_ld_2(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
     }
 
     /* Perform the load host endian, then swap if necessary. */
-    ret = load_memop(p->haddr, MO_UW);
+    ret = load_atom_2(env, ra, p->haddr, memop);
     if (memop & MO_BSWAP) {
         ret = bswap16(ret);
     }
@@ -2136,7 +2228,7 @@ static uint32_t do_ld_4(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
     }
 
     /* Perform the load host endian. */
-    ret = load_memop(p->haddr, MO_UL);
+    ret = load_atom_4(env, ra, p->haddr, memop);
     if (memop & MO_BSWAP) {
         ret = bswap32(ret);
     }
@@ -2153,7 +2245,7 @@ static uint64_t do_ld_8(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
     }
 
     /* Perform the load host endian. */
-    ret = load_memop(p->haddr, MO_UQ);
+    ret = load_atom_8(env, ra, p->haddr, memop);
     if (memop & MO_BSWAP) {
         ret = bswap64(ret);
     }
@@ -2229,8 +2321,8 @@ static uint32_t do_ld4_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
         return do_ld_4(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra);
     }
 
-    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, ra);
-    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, ra);
+    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, l.memop, ra);
+    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, l.memop, ra);
     if ((l.memop & MO_BSWAP) == MO_LE) {
         ret = bswap32(ret);
     }
@@ -2263,8 +2355,8 @@ static uint64_t do_ld8_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
         return do_ld_8(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra);
     }
 
-    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, ra);
-    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, ra);
+    ret = do_ld_beN(env, &l.page[0], 0, l.mmu_idx, access_type, l.memop, ra);
+    ret = do_ld_beN(env, &l.page[1], ret, l.mmu_idx, access_type, l.memop, ra);
     if ((l.memop & MO_BSWAP) == MO_LE) {
         ret = bswap64(ret);
     }
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index ae67d84638..25e55a40fb 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -933,6 +933,8 @@ static void *cpu_mmu_lookup(CPUArchState *env, target_ulong addr,
     return ret;
 }
 
+#include "ldst_atomicity.c.inc"
+
 uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr,
                     MemOpIdx oi, uintptr_t ra)
 {
@@ -955,10 +957,10 @@ uint16_t cpu_ldw_be_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_BEUW);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = lduw_be_p(haddr);
+    ret = load_atom_2(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_be16(ret);
 }
 
 uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
@@ -969,10 +971,10 @@ uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_BEUL);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = ldl_be_p(haddr);
+    ret = load_atom_4(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_be32(ret);
 }
 
 uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
@@ -983,10 +985,10 @@ uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_BEUQ);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = ldq_be_p(haddr);
+    ret = load_atom_8(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_be64(ret);
 }
 
 uint16_t cpu_ldw_le_mmu(CPUArchState *env, abi_ptr addr,
@@ -997,10 +999,10 @@ uint16_t cpu_ldw_le_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_LEUW);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = lduw_le_p(haddr);
+    ret = load_atom_2(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_le16(ret);
 }
 
 uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr addr,
@@ -1011,10 +1013,10 @@ uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_LEUL);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = ldl_le_p(haddr);
+    ret = load_atom_4(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_le32(ret);
 }
 
 uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
@@ -1025,10 +1027,10 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
 
     validate_memop(oi, MO_LEUQ);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = ldq_le_p(haddr);
+    ret = load_atom_8(env, ra, haddr, get_memop(oi));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return ret;
+    return cpu_to_le64(ret);
 }
 
 Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
new file mode 100644
index 0000000000..c93328fbaa
--- /dev/null
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -0,0 +1,547 @@
+/*
+ * Routines common to user and system emulation of load/store.
+ *
+ *  Copyright (c) 2022 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifdef CONFIG_ATOMIC64
+# define HAVE_al8          true
+#else
+# define HAVE_al8          false
+#endif
+#define HAVE_al8_fast      (ATOMIC_REG_SIZE >= 8)
+
+#if defined(CONFIG_ATOMIC128)
+# define HAVE_al16_fast    true
+#else
+# define HAVE_al16_fast    false
+#endif
+
+/**
+ * required_atomicity:
+ *
+ * Return the lg2 bytes of atomicity required by @memop for @p.
+ * If the operation must be split into two operations to be
+ * examined separately for atomicity, return -lg2.
+ */
+static int required_atomicity(CPUArchState *env, uintptr_t p, MemOp memop)
+{
+    int atmax = memop & MO_ATMAX_MASK;
+    int size = memop & MO_SIZE;
+    unsigned tmp;
+
+    if (atmax == MO_ATMAX_SIZE) {
+        atmax = size;
+    } else {
+        atmax >>= MO_ATMAX_SHIFT;
+    }
+
+    switch (memop & MO_ATOM_MASK) {
+    case MO_ATOM_IFALIGN:
+        tmp = (1 << atmax) - 1;
+        if (p & tmp) {
+            return MO_8;
+        }
+        break;
+    case MO_ATOM_NONE:
+        return MO_8;
+    case MO_ATOM_SUBALIGN:
+        tmp = p & -p;
+        if (tmp != 0 && tmp < atmax) {
+            atmax = tmp;
+        }
+        break;
+    case MO_ATOM_WITHIN16:
+        tmp = p & 15;
+        if (tmp + (1 << size) <= 16) {
+            atmax = size;
+        } else if (atmax == size) {
+            return MO_8;
+        } else if (tmp + (1 << atmax) != 16) {
+            /*
+             * Paired load/store, where the pairs aren't aligned.
+             * One of the two must still be handled atomically.
+             */
+            atmax = -atmax;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    /*
+     * Here we have the architectural atomicity of the operation.
+     * However, when executing in a serial context, we need no extra
+     * host atomicity in order to avoid racing.  This reduction
+     * avoids looping with cpu_loop_exit_atomic.
+     */
+    if (cpu_in_serial_context(env_cpu(env))) {
+        return MO_8;
+    }
+    return atmax;
+}
+
+/**
+ * load_atomic2:
+ * @pv: host address
+ *
+ * Atomically load 2 aligned bytes from @pv.
+ */
+static inline uint16_t load_atomic2(void *pv)
+{
+    uint16_t *p = __builtin_assume_aligned(pv, 2);
+    return qatomic_read(p);
+}
+
+/**
+ * load_atomic4:
+ * @pv: host address
+ *
+ * Atomically load 4 aligned bytes from @pv.
+ */
+static inline uint32_t load_atomic4(void *pv)
+{
+    uint32_t *p = __builtin_assume_aligned(pv, 4);
+    return qatomic_read(p);
+}
+
+/**
+ * load_atomic8:
+ * @pv: host address
+ *
+ * Atomically load 8 aligned bytes from @pv.
+ */
+static inline uint64_t load_atomic8(void *pv)
+{
+    uint64_t *p = __builtin_assume_aligned(pv, 8);
+
+    qemu_build_assert(HAVE_al8);
+    return qatomic_read__nocheck(p);
+}
+
+/**
+ * load_atomic16:
+ * @pv: host address
+ *
+ * Atomically load 16 aligned bytes from @pv.
+ */
+static inline Int128 load_atomic16(void *pv)
+{
+#ifdef CONFIG_ATOMIC128
+    __uint128_t *p = __builtin_assume_aligned(pv, 16);
+    Int128Alias r;
+
+    r.u = qatomic_read__nocheck(p);
+    return r.s;
+#else
+    qemu_build_not_reached();
+#endif
+}
+
+/**
+ * load_atomic8_or_exit:
+ * @env: cpu context
+ * @ra: host unwind address
+ * @pv: host address
+ *
+ * Atomically load 8 aligned bytes from @pv.
+ * If this is not possible, longjmp out to restart serially.
+ */
+static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
+{
+    if (HAVE_al8) {
+        return load_atomic8(pv);
+    }
+
+#ifdef CONFIG_USER_ONLY
+    /*
+     * If the page is not writable, then assume the value is immutable
+     * and requires no locking.  This ignores the case of MAP_SHARED with
+     * another process, because the fallback start_exclusive solution
+     * provides no protection across processes.
+     */
+    if (!page_check_range(h2g(pv), 8, PAGE_WRITE)) {
+        uint64_t *p = __builtin_assume_aligned(pv, 8);
+        return *p;
+    }
+#endif
+
+    /* Ultimate fallback: re-execute in serial context. */
+    cpu_loop_exit_atomic(env_cpu(env), ra);
+}
+
+/**
+ * load_atomic16_or_exit:
+ * @env: cpu context
+ * @ra: host unwind address
+ * @pv: host address
+ *
+ * Atomically load 16 aligned bytes from @pv.
+ * If this is not possible, longjmp out to restart serially.
+ */
+static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
+{
+    Int128 *p = __builtin_assume_aligned(pv, 16);
+
+    if (HAVE_al16_fast) {
+        return load_atomic16(p);
+    }
+
+#ifdef CONFIG_USER_ONLY
+    /*
+     * We can only use cmpxchg to emulate a load if the page is writable.
+     * If the page is not writable, then assume the value is immutable
+     * and requires no locking.  This ignores the case of MAP_SHARED with
+     * another process, because the fallback start_exclusive solution
+     * provides no protection across processes.
+     */
+    if (!page_check_range(h2g(p), 16, PAGE_WRITE)) {
+        return *p;
+    }
+#endif
+
+    /*
+     * In system mode all guest pages are writable, and for user-only
+     * we have just checked writability.  Try cmpxchg.
+     */
+#if defined(CONFIG_CMPXCHG128)
+    /* Swap 0 with 0, with the side-effect of returning the old value. */
+    {
+        Int128Alias r;
+        r.u = __sync_val_compare_and_swap_16((__uint128_t *)p, 0, 0);
+        return r.s;
+    }
+#endif
+
+    /* Ultimate fallback: re-execute in serial context. */
+    cpu_loop_exit_atomic(env_cpu(env), ra);
+}
+
+/**
+ * load_atom_extract_al4x2:
+ * @pv: host address
+ *
+ * Load 4 bytes from @p, from two sequential atomic 4-byte loads.
+ */
+static uint32_t load_atom_extract_al4x2(void *pv)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int sh = (pi & 3) * 8;
+    uint32_t a, b;
+
+    pv = (void *)(pi & ~3);
+    a = load_atomic4(pv);
+    b = load_atomic4(pv + 4);
+
+    if (HOST_BIG_ENDIAN) {
+        return (a << sh) | (b >> (-sh & 31));
+    } else {
+        return (a >> sh) | (b << (-sh & 31));
+    }
+}
+
+/**
+ * load_atom_extract_al8x2:
+ * @pv: host address
+ *
+ * Load 8 bytes from @p, from two sequential atomic 8-byte loads.
+ */
+static uint64_t load_atom_extract_al8x2(void *pv)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int sh = (pi & 7) * 8;
+    uint64_t a, b;
+
+    pv = (void *)(pi & ~7);
+    a = load_atomic8(pv);
+    b = load_atomic8(pv + 8);
+
+    if (HOST_BIG_ENDIAN) {
+        return (a << sh) | (b >> (-sh & 63));
+    } else {
+        return (a >> sh) | (b << (-sh & 63));
+    }
+}
+
+/**
+ * load_atom_extract_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 4.
+ *
+ * Atomically load @s bytes from @p, when p % s != 0, and [p, p+s-1] does
+ * not cross an 8-byte boundary.  This means that we can perform an atomic
+ * 8-byte load and extract.
+ * The value is returned in the low bits of a uint32_t.
+ */
+static uint32_t load_atom_extract_al8(void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int o = pi & 7;
+    int shr = (HOST_BIG_ENDIAN ? 8 - s - o : o) * 8;
+
+    pv = (void *)(pi & ~7);
+    return load_atomic8(pv) >> shr;
+}
+
+/**
+ * load_atom_extract_al16_or_exit:
+ * @env: cpu context
+ * @ra: host unwind address
+ * @p: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Atomically load @s bytes from @p, when p % 16 < 8
+ * and p % 16 + s > 8.  I.e. does not cross a 16-byte
+ * boundary, but *does* cross an 8-byte boundary.
+ * This is the slow version, so we must have eliminated
+ * any faster load_atom_extract_al8 case.
+ *
+ * If this is not possible, longjmp out to restart serially.
+ */
+static uint64_t load_atom_extract_al16_or_exit(CPUArchState *env, uintptr_t ra,
+                                               void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int o = pi & 7;
+    int shr = (HOST_BIG_ENDIAN ? 16 - s - o : o) * 8;
+    Int128 r;
+
+    /*
+     * Note constraints above: p & 8 must be clear.
+     * Provoke SIGBUS if possible otherwise.
+     */
+    pv = (void *)(pi & ~7);
+    r = load_atomic16_or_exit(env, ra, pv);
+
+    r = int128_urshift(r, shr);
+    return int128_getlo(r);
+}
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @p: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @p, when p % s != 0.  If [p, p+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t load_atom_extract_al16_or_al8(void *pv, int s)
+{
+#if defined(CONFIG_ATOMIC128)
+    uintptr_t pi = (uintptr_t)pv;
+    int o = pi & 7;
+    int shr = (HOST_BIG_ENDIAN ? 16 - s - o : o) * 8;
+    __uint128_t r;
+
+    pv = (void *)(pi & ~7);
+    if (pi & 8) {
+        uint64_t *p8 = __builtin_assume_aligned(pv, 16, 8);
+        uint64_t a = qatomic_read__nocheck(p8);
+        uint64_t b = qatomic_read__nocheck(p8 + 1);
+
+        if (HOST_BIG_ENDIAN) {
+            r = ((__uint128_t)a << 64) | b;
+        } else {
+            r = ((__uint128_t)b << 64) | a;
+        }
+    } else {
+        __uint128_t *p16 = __builtin_assume_aligned(pv, 16, 0);
+        r = qatomic_read__nocheck(p16);
+    }
+    return r >> shr;
+#else
+    qemu_build_not_reached();
+#endif
+}
+
+/**
+ * load_atom_4_by_2:
+ * @pv: host address
+ *
+ * Load 4 bytes from @pv, with two 2-byte atomic loads.
+ */
+static inline uint32_t load_atom_4_by_2(void *pv)
+{
+    uint32_t a = load_atomic2(pv);
+    uint32_t b = load_atomic2(pv + 2);
+
+    if (HOST_BIG_ENDIAN) {
+        return (a << 16) | b;
+    } else {
+        return (b << 16) | a;
+    }
+}
+
+/**
+ * load_atom_8_by_2:
+ * @pv: host address
+ *
+ * Load 8 bytes from @pv, with four 2-byte atomic loads.
+ */
+static inline uint64_t load_atom_8_by_2(void *pv)
+{
+    uint32_t a = load_atom_4_by_2(pv);
+    uint32_t b = load_atom_4_by_2(pv + 4);
+
+    if (HOST_BIG_ENDIAN) {
+        return ((uint64_t)a << 32) | b;
+    } else {
+        return ((uint64_t)b << 32) | a;
+    }
+}
+
+/**
+ * load_atom_8_by_4:
+ * @pv: host address
+ *
+ * Load 8 bytes from @pv, with two 4-byte atomic loads.
+ */
+static inline uint64_t load_atom_8_by_4(void *pv)
+{
+    uint32_t a = load_atomic4(pv);
+    uint32_t b = load_atomic4(pv + 4);
+
+    if (HOST_BIG_ENDIAN) {
+        return ((uint64_t)a << 32) | b;
+    } else {
+        return ((uint64_t)b << 32) | a;
+    }
+}
+
+/**
+ * load_atom_2:
+ * @p: host address
+ * @memop: the full memory op
+ *
+ * Load 2 bytes from @p, honoring the atomicity of @memop.
+ */
+static uint16_t load_atom_2(CPUArchState *env, uintptr_t ra,
+                            void *pv, MemOp memop)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    if (likely((pi & 1) == 0)) {
+        return load_atomic2(pv);
+    }
+    if (HAVE_al16_fast) {
+        return load_atom_extract_al16_or_al8(pv, 2);
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    switch (atmax) {
+    case MO_8:
+        return lduw_he_p(pv);
+    case MO_16:
+        /* The only case remaining is MO_ATOM_WITHIN16. */
+        if (!HAVE_al8_fast && (pi & 3) == 1) {
+            /* Big or little endian, we want the middle two bytes. */
+            return load_atomic4(pv - 1) >> 8;
+        }
+        if (unlikely((pi & 15) != 7)) {
+            return load_atom_extract_al8(pv, 2);
+        }
+        return load_atom_extract_al16_or_exit(env, ra, pv, 2);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/**
+ * load_atom_4:
+ * @p: host address
+ * @memop: the full memory op
+ *
+ * Load 4 bytes from @p, honoring the atomicity of @memop.
+ */
+static uint32_t load_atom_4(CPUArchState *env, uintptr_t ra,
+                            void *pv, MemOp memop)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    if (likely((pi & 3) == 0)) {
+        return load_atomic4(pv);
+    }
+    if (HAVE_al16_fast) {
+        return load_atom_extract_al16_or_al8(pv, 4);
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    switch (atmax) {
+    case MO_8:
+    case MO_16:
+    case -MO_16:
+        /*
+         * For MO_ATOM_IFALIGN, this is more atomicity than required,
+         * but it's trivially supported on all hosts, better than 4
+         * individual byte loads (when the host requires alignment),
+         * and overlaps with the MO_ATOM_SUBALIGN case of p % 2 == 0.
+         */
+        return load_atom_extract_al4x2(pv);
+    case MO_32:
+        if (!(pi & 4)) {
+            return load_atom_extract_al8(pv, 4);
+        }
+        return load_atom_extract_al16_or_exit(env, ra, pv, 4);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/**
+ * load_atom_8:
+ * @p: host address
+ * @memop: the full memory op
+ *
+ * Load 8 bytes from @p, honoring the atomicity of @memop.
+ */
+static uint64_t load_atom_8(CPUArchState *env, uintptr_t ra,
+                            void *pv, MemOp memop)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    /*
+     * If the host does not support 8-byte atomics, wait until we have
+     * examined the atomicity parameters below.
+     */
+    if (HAVE_al8 && likely((pi & 7) == 0)) {
+        return load_atomic8(pv);
+    }
+    if (HAVE_al16_fast) {
+        return load_atom_extract_al16_or_al8(pv, 8);
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    if (atmax == MO_64) {
+        if (!HAVE_al8 && (pi & 7) == 0) {
+            load_atomic8_or_exit(env, ra, pv);
+        }
+        return load_atom_extract_al16_or_exit(env, ra, pv, 8);
+    }
+    if (HAVE_al8_fast) {
+        return load_atom_extract_al8x2(pv);
+    }
+    switch (atmax) {
+    case MO_8:
+        return ldq_he_p(pv);
+    case MO_16:
+        return load_atom_8_by_2(pv);
+    case MO_32:
+        return load_atom_8_by_4(pv);
+    case -MO_32:
+        if (HAVE_al8) {
+            return load_atom_extract_al8x2(pv);
+        }
+        cpu_loop_exit_atomic(env_cpu(env), ra);
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 08/30] accel/tcg: Honor atomicity of stores
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (6 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 07/30] accel/tcg: Honor atomicity of loads Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 09/30] tcg/tci: Use cpu_{ld,st}_mmu Richard Henderson
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c             | 103 +++----
 accel/tcg/user-exec.c          |  12 +-
 accel/tcg/ldst_atomicity.c.inc | 491 +++++++++++++++++++++++++++++++++
 3 files changed, 540 insertions(+), 66 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8e2fe4a271..6c93558a1c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2560,36 +2560,6 @@ Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
  * Store Helpers
  */
 
-static inline void QEMU_ALWAYS_INLINE
-store_memop(void *haddr, uint64_t val, MemOp op)
-{
-    switch (op) {
-    case MO_UB:
-        stb_p(haddr, val);
-        break;
-    case MO_BEUW:
-        stw_be_p(haddr, val);
-        break;
-    case MO_LEUW:
-        stw_le_p(haddr, val);
-        break;
-    case MO_BEUL:
-        stl_be_p(haddr, val);
-        break;
-    case MO_LEUL:
-        stl_le_p(haddr, val);
-        break;
-    case MO_BEUQ:
-        stq_be_p(haddr, val);
-        break;
-    case MO_LEUQ:
-        stq_le_p(haddr, val);
-        break;
-    default:
-        qemu_build_not_reached();
-    }
-}
-
 /**
  * do_st_mmio_leN:
  * @env: cpu context
@@ -2616,38 +2586,51 @@ static uint64_t do_st_mmio_leN(CPUArchState *env, MMULookupPageData *p,
     return val_le;
 }
 
-/**
- * do_st_bytes_leN:
- * @p: translation parameters
- * @val_le: data to store
- *
- * Store @p->size bytes at @p->haddr, which is RAM.
- * The bytes to store are extracted in little-endian order from @val_le;
- * return the bytes of @val_le beyond @p->size that have not been stored.
- */
-static uint64_t do_st_bytes_leN(MMULookupPageData *p, uint64_t val_le)
-{
-    uint8_t *haddr = p->haddr;
-    int i, size = p->size;
-
-    for (i = 0; i < size; i++, val_le >>= 8) {
-        haddr[i] = val_le;
-    }
-    return val_le;
-}
-
 /*
  * Wrapper for the above.
  */
 static uint64_t do_st_leN(CPUArchState *env, MMULookupPageData *p,
-                          uint64_t val_le, int mmu_idx, uintptr_t ra)
+                          uint64_t val_le, int mmu_idx,
+                          MemOp mop, uintptr_t ra)
 {
+    MemOp atmax;
+
     if (unlikely(p->flags & TLB_MMIO)) {
         return do_st_mmio_leN(env, p, val_le, mmu_idx, ra);
     } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
         return val_le >> (p->size * 8);
-    } else {
-        return do_st_bytes_leN(p, val_le);
+    }
+
+    switch (mop & MO_ATOM_MASK) {
+    case MO_ATOM_WITHIN16:
+        /*
+         * It is a given that we cross a page and therefore there is no
+         * atomicity for the store as a whole, but there may be a subobject
+         * as defined by ATMAX which does not cross a 16-byte boundary.
+         */
+        atmax = mop & MO_ATMAX_MASK;
+        if (atmax == MO_ATMAX_SIZE) {
+            atmax = mop & MO_SIZE;
+        } else {
+            atmax >>= MO_ATMAX_SHIFT;
+        }
+        if (unlikely(p->size >= (1 << atmax))) {
+            if (!HAVE_al8_fast && p->size <= 4) {
+                return store_whole_le4(p->haddr, p->size, val_le);
+            } else if (HAVE_al8) {
+                return store_whole_le8(p->haddr, p->size, val_le);
+            } else {
+                cpu_loop_exit_atomic(env_cpu(env), ra);
+            }
+        }
+        /* fall through */
+    case MO_ATOM_IFALIGN:
+    case MO_ATOM_NONE:
+        return store_bytes_leN(p->haddr, p->size, val_le);
+    case MO_ATOM_SUBALIGN:
+        return store_parts_leN(p->haddr, p->size, val_le);
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -2675,7 +2658,7 @@ static void do_st_2(CPUArchState *env, MMULookupPageData *p, uint16_t val,
         if (memop & MO_BSWAP) {
             val = bswap16(val);
         }
-        store_memop(p->haddr, val, MO_UW);
+        store_atom_2(env, ra, p->haddr, memop, val);
     }
 }
 
@@ -2691,7 +2674,7 @@ static void do_st_4(CPUArchState *env, MMULookupPageData *p, uint32_t val,
         if (memop & MO_BSWAP) {
             val = bswap32(val);
         }
-        store_memop(p->haddr, val, MO_UL);
+        store_atom_4(env, ra, p->haddr, memop, val);
     }
 }
 
@@ -2707,7 +2690,7 @@ static void do_st_8(CPUArchState *env, MMULookupPageData *p, uint64_t val,
         if (memop & MO_BSWAP) {
             val = bswap64(val);
         }
-        store_memop(p->haddr, val, MO_UQ);
+        store_atom_8(env, ra, p->haddr, memop, val);
     }
 }
 
@@ -2776,8 +2759,8 @@ static void do_st4_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
     if ((l.memop & MO_BSWAP) != MO_LE) {
         val = bswap32(val);
     }
-    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, ra);
-    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, ra);
+    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, l.memop, ra);
 }
 
 void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
@@ -2810,8 +2793,8 @@ static void do_st8_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
     if ((l.memop & MO_BSWAP) != MO_LE) {
         val = bswap64(val);
     }
-    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, ra);
-    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, ra);
+    val = do_st_leN(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+    (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, l.memop, ra);
 }
 
 void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 25e55a40fb..a4acf705f4 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1088,7 +1088,7 @@ void cpu_stw_be_mmu(CPUArchState *env, abi_ptr addr, uint16_t val,
 
     validate_memop(oi, MO_BEUW);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stw_be_p(haddr, val);
+    store_atom_2(env, ra, haddr, get_memop(oi), be16_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -1100,7 +1100,7 @@ void cpu_stl_be_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
 
     validate_memop(oi, MO_BEUL);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stl_be_p(haddr, val);
+    store_atom_4(env, ra, haddr, get_memop(oi), be32_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -1112,7 +1112,7 @@ void cpu_stq_be_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
 
     validate_memop(oi, MO_BEUQ);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stq_be_p(haddr, val);
+    store_atom_8(env, ra, haddr, get_memop(oi), be64_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -1124,7 +1124,7 @@ void cpu_stw_le_mmu(CPUArchState *env, abi_ptr addr, uint16_t val,
 
     validate_memop(oi, MO_LEUW);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stw_le_p(haddr, val);
+    store_atom_2(env, ra, haddr, get_memop(oi), le16_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -1136,7 +1136,7 @@ void cpu_stl_le_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
 
     validate_memop(oi, MO_LEUL);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stl_le_p(haddr, val);
+    store_atom_4(env, ra, haddr, get_memop(oi), le32_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -1148,7 +1148,7 @@ void cpu_stq_le_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
 
     validate_memop(oi, MO_LEUQ);
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    stq_le_p(haddr, val);
+    store_atom_8(env, ra, haddr, get_memop(oi), le64_to_cpu(val));
     clear_helper_retaddr();
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index c93328fbaa..0e4292ec66 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -21,6 +21,12 @@
 #else
 # define HAVE_al16_fast    false
 #endif
+#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+# define HAVE_al16         true
+#else
+# define HAVE_al16         false
+#endif
+
 
 /**
  * required_atomicity:
@@ -545,3 +551,488 @@ static uint64_t load_atom_8(CPUArchState *env, uintptr_t ra,
         g_assert_not_reached();
     }
 }
+
+/**
+ * store_atomic2:
+ * @pv: host address
+ * @val: value to store
+ *
+ * Atomically store 2 aligned bytes to @pv.
+ */
+static inline void store_atomic2(void *pv, uint16_t val)
+{
+    uint16_t *p = __builtin_assume_aligned(pv, 2);
+    qatomic_set(p, val);
+}
+
+/**
+ * store_atomic4:
+ * @pv: host address
+ * @val: value to store
+ *
+ * Atomically store 4 aligned bytes to @pv.
+ */
+static inline void store_atomic4(void *pv, uint32_t val)
+{
+    uint32_t *p = __builtin_assume_aligned(pv, 4);
+    qatomic_set(p, val);
+}
+
+/**
+ * store_atomic8:
+ * @pv: host address
+ * @val: value to store
+ *
+ * Atomically store 8 aligned bytes to @pv.
+ */
+static inline void store_atomic8(void *pv, uint64_t val)
+{
+    uint64_t *p = __builtin_assume_aligned(pv, 8);
+
+    qemu_build_assert(HAVE_al8);
+    qatomic_set__nocheck(p, val);
+}
+
+/**
+ * store_atom_4_by_2
+ */
+static inline void store_atom_4_by_2(void *pv, uint32_t val)
+{
+    store_atomic2(pv, val >> (HOST_BIG_ENDIAN ? 16 : 0));
+    store_atomic2(pv + 2, val >> (HOST_BIG_ENDIAN ? 0 : 16));
+}
+
+/**
+ * store_atom_8_by_2
+ */
+static inline void store_atom_8_by_2(void *pv, uint64_t val)
+{
+    store_atom_4_by_2(pv, val >> (HOST_BIG_ENDIAN ? 32 : 0));
+    store_atom_4_by_2(pv + 4, val >> (HOST_BIG_ENDIAN ? 0 : 32));
+}
+
+/**
+ * store_atom_8_by_4
+ */
+static inline void store_atom_8_by_4(void *pv, uint64_t val)
+{
+    store_atomic4(pv, val >> (HOST_BIG_ENDIAN ? 32 : 0));
+    store_atomic4(pv + 4, val >> (HOST_BIG_ENDIAN ? 0 : 32));
+}
+
+/**
+ * store_atom_insert_al4:
+ * @p: host address
+ * @val: shifted value to store
+ * @msk: mask for value to store
+ *
+ * Atomically store @val to @p, masked by @msk.
+ */
+static void store_atom_insert_al4(uint32_t *p, uint32_t val, uint32_t msk)
+{
+    uint32_t old, new;
+
+    p = __builtin_assume_aligned(p, 4);
+    old = qatomic_read(p);
+    do {
+        new = (old & ~msk) | val;
+    } while (!__atomic_compare_exchange_n(p, &old, new, true,
+                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
+}
+
+/**
+ * store_atom_insert_al8:
+ * @p: host address
+ * @val: shifted value to store
+ * @msk: mask for value to store
+ *
+ * Atomically store @val to @p masked by @msk.
+ */
+static void store_atom_insert_al8(uint64_t *p, uint64_t val, uint64_t msk)
+{
+    uint64_t old, new;
+
+    qemu_build_assert(HAVE_al8);
+    p = __builtin_assume_aligned(p, 8);
+    old = qatomic_read__nocheck(p);
+    do {
+        new = (old & ~msk) | val;
+    } while (!__atomic_compare_exchange_n(p, &old, new, true,
+                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
+}
+
+/**
+ * store_atom_insert_al16:
+ * @ps: host address
+ * @val: shifted value to store
+ * @msk: mask for value to store
+ *
+ * Atomically store @val to @p masked by @msk.
+ */
+static void store_atom_insert_al16(Int128 *ps, Int128Alias val, Int128Alias msk)
+{
+#if defined(CONFIG_ATOMIC128)
+    __uint128_t *pu, old, new;
+
+    /* With CONFIG_ATOMIC128, we can avoid the memory barriers. */
+    pu = __builtin_assume_aligned(ps, 16);
+    old = *pu;
+    do {
+        new = (old & ~msk.u) | val.u;
+    } while (!__atomic_compare_exchange_n(pu, &old, new, true,
+                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
+#elif defined(CONFIG_CMPXCHG128)
+    __uint128_t *pu, old, new;
+
+    /*
+     * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
+     * defer to libatomic, so we must use __sync_val_compare_and_swap_16
+     * and accept the sequential consistency that comes with it.
+     */
+    pu = __builtin_assume_aligned(ps, 16);
+    do {
+        old = *pu;
+        new = (old & ~msk.u) | val.u;
+    } while (!__sync_bool_compare_and_swap_16(pu, old, new));
+#else
+    qemu_build_not_reached();
+#endif
+}
+
+/**
+ * store_bytes_leN:
+ * @pv: host address
+ * @size: number of bytes to store
+ * @val_le: data to store
+ *
+ * Store @size bytes at @pv.  The bytes to store are extracted in little-endian order
+ * from @val_le; return the bytes of @val_le beyond @size that have not been stored.
+ */
+static uint64_t store_bytes_leN(void *pv, int size, uint64_t val_le)
+{
+    uint8_t *p = pv;
+    for (int i = 0; i < size; i++, val_le >>= 8) {
+        p[i] = val_le;
+    }
+    return val_le;
+}
+
+/**
+ * store_parts_leN
+ * @pv: host address
+ * @size: number of bytes to store
+ * @val_le: data to store
+ *
+ * As store_bytes_leN, but atomically on each aligned part.
+ */
+G_GNUC_UNUSED
+static uint64_t store_parts_leN(void *pv, int size, uint64_t val_le)
+{
+    do {
+        int n;
+
+        /* Find minimum of alignment and size */
+        switch (((uintptr_t)pv | size) & 7) {
+        case 4:
+            store_atomic4(pv, le32_to_cpu(val_le));
+            val_le >>= 32;
+            n = 4;
+            break;
+        case 2:
+        case 6:
+            store_atomic2(pv, le16_to_cpu(val_le));
+            val_le >>= 16;
+            n = 2;
+            break;
+        default:
+            *(uint8_t *)pv = val_le;
+            val_le >>= 8;
+            n = 1;
+            break;
+        case 0:
+            g_assert_not_reached();
+        }
+        pv += n;
+        size -= n;
+    } while (size != 0);
+
+    return val_le;
+}
+
+/**
+ * store_whole_le4
+ * @pv: host address
+ * @size: number of bytes to store
+ * @val_le: data to store
+ *
+ * As store_bytes_leN, but atomically as a whole.
+ * Four aligned bytes are guaranteed to cover the store.
+ */
+static uint64_t store_whole_le4(void *pv, int size, uint64_t val_le)
+{
+    int sz = size * 8;
+    int o = (uintptr_t)pv & 3;
+    int sh = o * 8;
+    uint32_t m = MAKE_64BIT_MASK(0, sz);
+    uint32_t v;
+
+    if (HOST_BIG_ENDIAN) {
+        v = bswap32(val_le) >> sh;
+        m = bswap32(m) >> sh;
+    } else {
+        v = val_le << sh;
+        m <<= sh;
+    }
+    store_atom_insert_al4(pv - o, v, m);
+    return val_le >> sz;
+}
+
+/**
+ * store_whole_le8
+ * @pv: host address
+ * @size: number of bytes to store
+ * @val_le: data to store
+ *
+ * As store_bytes_leN, but atomically as a whole.
+ * Eight aligned bytes are guaranteed to cover the store.
+ */
+static uint64_t store_whole_le8(void *pv, int size, uint64_t val_le)
+{
+    int sz = size * 8;
+    int o = (uintptr_t)pv & 7;
+    int sh = o * 8;
+    uint64_t m = MAKE_64BIT_MASK(0, sz);
+    uint64_t v;
+
+    qemu_build_assert(HAVE_al8);
+    if (HOST_BIG_ENDIAN) {
+        v = bswap64(val_le) >> sh;
+        m = bswap64(m) >> sh;
+    } else {
+        v = val_le << sh;
+        m <<= sh;
+    }
+    store_atom_insert_al8(pv - o, v, m);
+    return val_le >> sz;
+}
+
+/**
+ * store_whole_le16
+ * @pv: host address
+ * @size: number of bytes to store
+ * @val_le: data to store
+ *
+ * As store_bytes_leN, but atomically as a whole.
+ * 16 aligned bytes are guaranteed to cover the store.
+ */
+static uint64_t store_whole_le16(void *pv, int size, Int128 val_le)
+{
+    int sz = size * 8;
+    int o = (uintptr_t)pv & 15;
+    int sh = o * 8;
+    Int128 m, v;
+
+    qemu_build_assert(HAVE_al16);
+
+    /* Like MAKE_64BIT_MASK(0, sz), but larger. */
+    if (sz <= 64) {
+        m = int128_make64(MAKE_64BIT_MASK(0, sz));
+    } else {
+        m = int128_make128(-1, MAKE_64BIT_MASK(0, sz - 64));
+    }
+
+    if (HOST_BIG_ENDIAN) {
+        v = int128_urshift(bswap128(val_le), sh);
+        m = int128_urshift(bswap128(m), sh);
+    } else {
+        v = int128_lshift(val_le, sh);
+        m = int128_lshift(m, sh);
+    }
+    store_atom_insert_al16(pv - o, v, m);
+
+    /* Unused if sz <= 64. */
+    return int128_gethi(val_le) >> (sz - 64);
+}
+
+/**
+ * store_atom_2:
+ * @p: host address
+ * @val: the value to store
+ * @memop: the full memory op
+ *
+ * Store 2 bytes to @p, honoring the atomicity of @memop.
+ */
+static void store_atom_2(CPUArchState *env, uintptr_t ra,
+                         void *pv, MemOp memop, uint16_t val)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    if (likely((pi & 1) == 0)) {
+        store_atomic2(pv, val);
+        return;
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    if (atmax == MO_8) {
+        stw_he_p(pv, val);
+        return;
+    }
+
+    /*
+     * The only case remaining is MO_ATOM_WITHIN16.
+     * Big or little endian, we want the middle two bytes in each test.
+     */
+    if ((pi & 3) == 1) {
+        store_atom_insert_al4(pv - 1, (uint32_t)val << 8, MAKE_64BIT_MASK(8, 16));
+        return;
+    } else if ((pi & 7) == 3) {
+        if (HAVE_al8) {
+            store_atom_insert_al8(pv - 3, (uint64_t)val << 24, MAKE_64BIT_MASK(24, 16));
+            return;
+        }
+    } else if ((pi & 15) == 7) {
+        if (HAVE_al16) {
+            Int128 v = int128_lshift(int128_make64(val), 56);
+            Int128 m = int128_lshift(int128_make64(0xffff), 56);
+            store_atom_insert_al16(pv - 7, v, m);
+            return;
+        }
+    } else {
+        g_assert_not_reached();
+    }
+
+    cpu_loop_exit_atomic(env_cpu(env), ra);
+}
+
+/**
+ * store_atom_4:
+ * @p: host address
+ * @val: the value to store
+ * @memop: the full memory op
+ *
+ * Store 4 bytes to @p, honoring the atomicity of @memop.
+ */
+static void store_atom_4(CPUArchState *env, uintptr_t ra,
+                         void *pv, MemOp memop, uint32_t val)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    if (likely((pi & 3) == 0)) {
+        store_atomic4(pv, val);
+        return;
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    switch (atmax) {
+    case MO_8:
+        stl_he_p(pv, val);
+        return;
+    case MO_16:
+        store_atom_4_by_2(pv, val);
+        return;
+    case -MO_16:
+        {
+            uint32_t val_le = cpu_to_le32(val);
+            int s2 = pi & 3;
+            int s1 = 4 - s2;
+
+            switch (s2) {
+            case 1:
+                val_le = store_whole_le4(pv, s1, val_le);
+                *(uint8_t *)(pv + 3) = val_le;
+                break;
+            case 3:
+                *(uint8_t *)pv = val_le;
+                store_whole_le4(pv + 1, s2, val_le >> 8);
+                break;
+            case 0: /* aligned */
+            case 2: /* atmax MO_16 */
+            default:
+                g_assert_not_reached();
+            }
+        }
+        return;
+    case MO_32:
+        if ((pi & 7) < 4) {
+            if (HAVE_al8) {
+                store_whole_le8(pv, 4, cpu_to_le32(val));
+                return;
+            }
+        } else {
+            if (HAVE_al16) {
+                store_whole_le16(pv, 4, int128_make64(cpu_to_le32(val)));
+                return;
+            }
+        }
+        cpu_loop_exit_atomic(env_cpu(env), ra);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/**
+ * store_atom_8:
+ * @p: host address
+ * @val: the value to store
+ * @memop: the full memory op
+ *
+ * Store 8 bytes to @p, honoring the atomicity of @memop.
+ */
+static void store_atom_8(CPUArchState *env, uintptr_t ra,
+                         void *pv, MemOp memop, uint64_t val)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+
+    if (HAVE_al8 && likely((pi & 7) == 0)) {
+        store_atomic8(pv, val);
+        return;
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    switch (atmax) {
+    case MO_8:
+        stq_he_p(pv, val);
+        return;
+    case MO_16:
+        store_atom_8_by_2(pv, val);
+        return;
+    case MO_32:
+        store_atom_8_by_4(pv, val);
+        return;
+    case -MO_32:
+        if (HAVE_al8) {
+            uint64_t val_le = cpu_to_le64(val);
+            int s2 = pi & 7;
+            int s1 = 8 - s2;
+
+            switch (s2) {
+            case 1 ... 3:
+                val_le = store_whole_le8(pv, s1, val_le);
+                store_bytes_leN(pv + s1, s2, val_le);
+                break;
+            case 5 ... 7:
+                val_le = store_bytes_leN(pv, s1, val_le);
+                store_whole_le8(pv + s1, s2, val_le);
+                break;
+            case 0: /* aligned */
+            case 4: /* atmax MO_32 */
+            default:
+                g_assert_not_reached();
+            }
+            return;
+        }
+        break;
+    case MO_64:
+        if (HAVE_al16) {
+            store_whole_le16(pv, 8, int128_make64(cpu_to_le64(val)));
+            return;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    cpu_loop_exit_atomic(env_cpu(env), ra);
+}
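
As a standalone illustration of the read-modify-CAS idea behind
store_atom_insert_al4() and store_whole_le4() above: a minimal sketch,
not QEMU code, assuming only the GCC/Clang __atomic builtins, that
merges a 3-byte store at byte offset 1 into its containing aligned
32-bit word, so no concurrent reader can observe a torn value inside
that word:

#include <stdint.h>
#include <stdio.h>

/* Merge val into *p under msk with a relaxed compare-and-swap loop. */
static void insert_al4(uint32_t *p, uint32_t val, uint32_t msk)
{
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
    uint32_t new;

    do {
        new = (old & ~msk) | val;
    } while (!__atomic_compare_exchange_n(p, &old, new, true,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}

int main(void)
{
    /*
     * As in the -MO_16 path of store_atom_4 with (pi & 3) == 1:
     * size = 3, so sz = 24, sh = 8, and the mask 0x00ffffff is
     * shifted up to 0xffffff00 (little-endian host assumed).
     */
    uint32_t word = 0xaabbccdd;
    uint32_t val_le = 0x00112233;    /* bytes 0x33 0x22 0x11 to store */
    int sh = 8;
    uint32_t m = (1u << 24) - 1;

    insert_al4(&word, val_le << sh, m << sh);
    printf("0x%08x\n", word);        /* 0x112233dd: byte 0 preserved */
    return 0;
}
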
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 09/30] tcg/tci: Use cpu_{ld,st}_mmu
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (7 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 08/30] accel/tcg: Honor atomicity of stores Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 10/30] tcg: Unify helper_{be,le}_{ld,st}* Richard Henderson
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Unify the softmmu and the user-only paths by using the
official memory interface.  Avoid double logging of memory
operations to plugins by relying on the ones within the
cpu_*_mmu functions.
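
For reference, one case of the dispatch this moves to, shown as a sketch
rather than as part of the patch (it assumes the cpu_ldw_le_mmu and
make_memop_idx declarations from include/exec/cpu_ldst.h and
include/exec/memopidx.h):

static uint64_t ld_lesw(CPUArchState *env, target_ulong taddr,
                        int mmu_idx, uintptr_t ra)
{
    MemOpIdx oi = make_memop_idx(MO_LESW, mmu_idx);

    /*
     * cpu_ldw_le_mmu covers both softmmu and user-only builds and
     * performs the plugin memory callback itself; only the
     * sign-extension of the 16-bit result remains with the caller.
     */
    return (int16_t)cpu_ldw_le_mmu(env, taddr, oi, ra);
}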

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c |   9 +++-
 tcg/tci.c    | 127 ++++++++-------------------------------------------
 2 files changed, 26 insertions(+), 110 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index c581ae77c4..da312dcf7e 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2916,7 +2916,12 @@ static void tcg_gen_req_mo(TCGBar type)
 
 static inline TCGv plugin_prep_mem_callbacks(TCGv vaddr)
 {
-#ifdef CONFIG_PLUGIN
+    /*
+     * With TCI, we get memory tracing via cpu_{ld,st}_mmu.
+     * No need to instrument memory operations inline, and
+     * we don't want to log the same memory operation twice.
+     */
+#if defined(CONFIG_PLUGIN) && !defined(CONFIG_TCG_INTERPRETER)
     if (tcg_ctx->plugin_insn != NULL) {
         /* Save a copy of the vaddr for use after a load.  */
         TCGv temp = tcg_temp_new();
@@ -2930,7 +2935,7 @@ static inline TCGv plugin_prep_mem_callbacks(TCGv vaddr)
 static void plugin_gen_mem_callbacks(TCGv vaddr, MemOpIdx oi,
                                      enum qemu_plugin_mem_rw rw)
 {
-#ifdef CONFIG_PLUGIN
+#if defined(CONFIG_PLUGIN) && !defined(CONFIG_TCG_INTERPRETER)
     if (tcg_ctx->plugin_insn != NULL) {
         qemu_plugin_meminfo_t info = make_plugin_meminfo(oi, rw);
         plugin_gen_empty_mem_callback(vaddr, info);
diff --git a/tcg/tci.c b/tcg/tci.c
index fc67e7e767..170dcf1262 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -292,87 +292,34 @@ static uint64_t tci_qemu_ld(CPUArchState *env, target_ulong taddr,
     MemOp mop = get_memop(oi);
     uintptr_t ra = (uintptr_t)tb_ptr;
 
-#ifdef CONFIG_SOFTMMU
     switch (mop & (MO_BSWAP | MO_SSIZE)) {
     case MO_UB:
-        return helper_ret_ldub_mmu(env, taddr, oi, ra);
+        return cpu_ldb_mmu(env, taddr, oi, ra);
     case MO_SB:
-        return helper_ret_ldsb_mmu(env, taddr, oi, ra);
+        return (int8_t)cpu_ldb_mmu(env, taddr, oi, ra);
     case MO_LEUW:
-        return helper_le_lduw_mmu(env, taddr, oi, ra);
+        return cpu_ldw_le_mmu(env, taddr, oi, ra);
     case MO_LESW:
-        return helper_le_ldsw_mmu(env, taddr, oi, ra);
+        return (int16_t)cpu_ldw_le_mmu(env, taddr, oi, ra);
     case MO_LEUL:
-        return helper_le_ldul_mmu(env, taddr, oi, ra);
+        return cpu_ldl_le_mmu(env, taddr, oi, ra);
     case MO_LESL:
-        return helper_le_ldsl_mmu(env, taddr, oi, ra);
+        return (int32_t)cpu_ldl_le_mmu(env, taddr, oi, ra);
     case MO_LEUQ:
-        return helper_le_ldq_mmu(env, taddr, oi, ra);
+        return cpu_ldq_le_mmu(env, taddr, oi, ra);
     case MO_BEUW:
-        return helper_be_lduw_mmu(env, taddr, oi, ra);
+        return cpu_ldw_be_mmu(env, taddr, oi, ra);
     case MO_BESW:
-        return helper_be_ldsw_mmu(env, taddr, oi, ra);
+        return (int16_t)cpu_ldw_be_mmu(env, taddr, oi, ra);
     case MO_BEUL:
-        return helper_be_ldul_mmu(env, taddr, oi, ra);
+        return cpu_ldl_be_mmu(env, taddr, oi, ra);
     case MO_BESL:
-        return helper_be_ldsl_mmu(env, taddr, oi, ra);
+        return (int32_t)cpu_ldl_be_mmu(env, taddr, oi, ra);
     case MO_BEUQ:
-        return helper_be_ldq_mmu(env, taddr, oi, ra);
+        return cpu_ldq_be_mmu(env, taddr, oi, ra);
     default:
         g_assert_not_reached();
     }
-#else
-    void *haddr = g2h(env_cpu(env), taddr);
-    unsigned a_mask = (1u << get_alignment_bits(mop)) - 1;
-    uint64_t ret;
-
-    set_helper_retaddr(ra);
-    if (taddr & a_mask) {
-        helper_unaligned_ld(env, taddr);
-    }
-    switch (mop & (MO_BSWAP | MO_SSIZE)) {
-    case MO_UB:
-        ret = ldub_p(haddr);
-        break;
-    case MO_SB:
-        ret = ldsb_p(haddr);
-        break;
-    case MO_LEUW:
-        ret = lduw_le_p(haddr);
-        break;
-    case MO_LESW:
-        ret = ldsw_le_p(haddr);
-        break;
-    case MO_LEUL:
-        ret = (uint32_t)ldl_le_p(haddr);
-        break;
-    case MO_LESL:
-        ret = (int32_t)ldl_le_p(haddr);
-        break;
-    case MO_LEUQ:
-        ret = ldq_le_p(haddr);
-        break;
-    case MO_BEUW:
-        ret = lduw_be_p(haddr);
-        break;
-    case MO_BESW:
-        ret = ldsw_be_p(haddr);
-        break;
-    case MO_BEUL:
-        ret = (uint32_t)ldl_be_p(haddr);
-        break;
-    case MO_BESL:
-        ret = (int32_t)ldl_be_p(haddr);
-        break;
-    case MO_BEUQ:
-        ret = ldq_be_p(haddr);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-    clear_helper_retaddr();
-    return ret;
-#endif
 }
 
 static void tci_qemu_st(CPUArchState *env, target_ulong taddr, uint64_t val,
@@ -381,67 +328,31 @@ static void tci_qemu_st(CPUArchState *env, target_ulong taddr, uint64_t val,
     MemOp mop = get_memop(oi);
     uintptr_t ra = (uintptr_t)tb_ptr;
 
-#ifdef CONFIG_SOFTMMU
     switch (mop & (MO_BSWAP | MO_SIZE)) {
     case MO_UB:
-        helper_ret_stb_mmu(env, taddr, val, oi, ra);
+        cpu_stb_mmu(env, taddr, val, oi, ra);
         break;
     case MO_LEUW:
-        helper_le_stw_mmu(env, taddr, val, oi, ra);
+        cpu_stw_le_mmu(env, taddr, val, oi, ra);
         break;
     case MO_LEUL:
-        helper_le_stl_mmu(env, taddr, val, oi, ra);
+        cpu_stl_le_mmu(env, taddr, val, oi, ra);
         break;
     case MO_LEUQ:
-        helper_le_stq_mmu(env, taddr, val, oi, ra);
+        cpu_stq_le_mmu(env, taddr, val, oi, ra);
         break;
     case MO_BEUW:
-        helper_be_stw_mmu(env, taddr, val, oi, ra);
+        cpu_stw_be_mmu(env, taddr, val, oi, ra);
         break;
     case MO_BEUL:
-        helper_be_stl_mmu(env, taddr, val, oi, ra);
+        cpu_stl_be_mmu(env, taddr, val, oi, ra);
         break;
     case MO_BEUQ:
-        helper_be_stq_mmu(env, taddr, val, oi, ra);
+        cpu_stq_be_mmu(env, taddr, val, oi, ra);
         break;
     default:
         g_assert_not_reached();
     }
-#else
-    void *haddr = g2h(env_cpu(env), taddr);
-    unsigned a_mask = (1u << get_alignment_bits(mop)) - 1;
-
-    set_helper_retaddr(ra);
-    if (taddr & a_mask) {
-        helper_unaligned_st(env, taddr);
-    }
-    switch (mop & (MO_BSWAP | MO_SIZE)) {
-    case MO_UB:
-        stb_p(haddr, val);
-        break;
-    case MO_LEUW:
-        stw_le_p(haddr, val);
-        break;
-    case MO_LEUL:
-        stl_le_p(haddr, val);
-        break;
-    case MO_LEUQ:
-        stq_le_p(haddr, val);
-        break;
-    case MO_BEUW:
-        stw_be_p(haddr, val);
-        break;
-    case MO_BEUL:
-        stl_be_p(haddr, val);
-        break;
-    case MO_BEUQ:
-        stq_be_p(haddr, val);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-    clear_helper_retaddr();
-#endif
 }
 
 #if TCG_TARGET_REG_BITS == 64
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 10/30] tcg: Unify helper_{be,le}_{ld,st}*
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (8 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 09/30] tcg/tci: Use cpu_{ld,st}_mmu Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 11/30] accel/tcg: Implement helper_{ld, st}*_mmu for user-only Richard Henderson
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

With the current structure of cputlb.c, there is no difference
between the little-endian and big-endian entry points, aside
from the assert.  Unify the pairs of functions.
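
As a sketch of what callers see after the unification (not part of the
patch; it assumes the helper_stw_mmu prototype from include/tcg/tcg-ldst.h
and make_memop_idx from include/exec/memopidx.h), the same entry point now
serves both byte orders because MO_BSWAP travels in the MemOpIdx:

static void store_w_both_orders(CPUArchState *env, target_ulong addr,
                                uint16_t val, int mmu_idx, uintptr_t ra)
{
    /* Big-endian store at addr, little-endian store at addr + 2. */
    helper_stw_mmu(env, addr, val, make_memop_idx(MO_BEUW, mmu_idx), ra);
    helper_stw_mmu(env, addr + 2, val, make_memop_idx(MO_LEUW, mmu_idx), ra);
}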

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/devel/loads-stores.rst      |  36 ++----
 include/tcg/tcg-ldst.h           |  60 ++++------
 accel/tcg/cputlb.c               | 190 ++++++++++---------------------
 tcg/aarch64/tcg-target.c.inc     |  39 +++----
 tcg/arm/tcg-target.c.inc         |  45 +++-----
 tcg/i386/tcg-target.c.inc        |  40 +++----
 tcg/loongarch64/tcg-target.c.inc |  25 ++--
 tcg/mips/tcg-target.c.inc        |  40 +++----
 tcg/ppc/tcg-target.c.inc         |  30 ++---
 tcg/riscv/tcg-target.c.inc       |  51 +++------
 tcg/s390x/tcg-target.c.inc       |  38 +++----
 tcg/sparc64/tcg-target.c.inc     |  37 +++---
 12 files changed, 226 insertions(+), 405 deletions(-)

diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
index ad5dfe133e..d2cefc77a2 100644
--- a/docs/devel/loads-stores.rst
+++ b/docs/devel/loads-stores.rst
@@ -297,31 +297,20 @@ swap: ``translator_ld{sign}{size}_swap(env, ptr, swap)``
 Regexes for git grep
  - ``\<translator_ld[us]\?[bwlq]\(_swap\)\?\>``
 
-``helper_*_{ld,st}*_mmu``
+``helper_{ld,st}*_mmu``
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions are intended primarily to be called by the code
-generated by the TCG backend. They may also be called by target
-CPU helper function code. Like the ``cpu_{ld,st}_mmuidx_ra`` functions
-they perform accesses by guest virtual address, with a given ``mmuidx``.
+generated by the TCG backend.  Like the ``cpu_{ld,st}_mmu`` functions,
+they perform accesses by guest virtual address, with a given ``MemOpIdx``.
 
-These functions specify an ``opindex`` parameter which encodes
-(among other things) the mmu index to use for the access.  This parameter
-should be created by calling ``make_memop_idx()``.
+They differ from ``cpu_{ld,st}_mmu`` in that they take the endianness
+of the operation only from the MemOpIdx, and loads extend the return
+value to the size of a host general register (``tcg_target_ulong``).
 
-The ``retaddr`` parameter should be the result of GETPC() called directly
-from the top level HELPER(foo) function (or 0 if no guest CPU state
-unwinding is required).
+load: ``helper_ld{sign}{size}_mmu(env, addr, opindex, retaddr)``
 
-**TODO** The names of these functions are a bit odd for historical
-reasons because they were originally expected to be called only from
-within generated code. We should rename them to bring them more in
-line with the other memory access functions. The explicit endianness
-is the only feature they have beyond ``*_mmuidx_ra``.
-
-load: ``helper_{endian}_ld{sign}{size}_mmu(env, addr, opindex, retaddr)``
-
-store: ``helper_{endian}_st{size}_mmu(env, addr, val, opindex, retaddr)``
+store: ``helper_st{size}_mmu(env, addr, val, opindex, retaddr)``
 
 ``sign``
  - (empty) : for 32 or 64 bit sizes
@@ -334,14 +323,9 @@ store: ``helper_{endian}_st{size}_mmu(env, addr, val, opindex, retaddr)``
  - ``l`` : 32 bits
  - ``q`` : 64 bits
 
-``endian``
- - ``le`` : little endian
- - ``be`` : big endian
- - ``ret`` : target endianness
-
 Regexes for git grep
- - ``\<helper_\(le\|be\|ret\)_ld[us]\?[bwlq]_mmu\>``
- - ``\<helper_\(le\|be\|ret\)_st[bwlq]_mmu\>``
+ - ``\<helper_ld[us]\?[bwlq]_mmu\>``
+ - ``\<helper_st[bwlq]_mmu\>``
 
 ``address_space_*``
 ~~~~~~~~~~~~~~~~~~~
diff --git a/include/tcg/tcg-ldst.h b/include/tcg/tcg-ldst.h
index 2ba22bd5fe..56fa7afe5e 100644
--- a/include/tcg/tcg-ldst.h
+++ b/include/tcg/tcg-ldst.h
@@ -28,47 +28,31 @@
 #ifdef CONFIG_SOFTMMU
 
 /* Value zero-extended to tcg register size.  */
-tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
-                                     MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
-                           MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
-                           MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldub_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_lduw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldul_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr,
+                        MemOpIdx oi, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
-tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
-                                     MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_le_ldsw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_be_ldsw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
-tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldsw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr);
 
-void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                        MemOpIdx oi, uintptr_t retaddr);
-void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
-void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
-void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
-void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
-void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
-void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                       MemOpIdx oi, uintptr_t retaddr);
+void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                    MemOpIdx oi, uintptr_t retaddr);
+void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                    MemOpIdx oi, uintptr_t retaddr);
+void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                    MemOpIdx oi, uintptr_t retaddr);
+void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                    MemOpIdx oi, uintptr_t retaddr);
 
 #else
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6c93558a1c..e2e764f4da 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1978,25 +1978,6 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     cpu_loop_exit_atomic(env_cpu(env), retaddr);
 }
 
-/*
- * Verify that we have passed the correct MemOp to the correct function.
- *
- * In the case of the helper_*_mmu functions, we will have done this by
- * using the MemOp to look up the helper during code generation.
- *
- * In the case of the cpu_*_mmu functions, this is up to the caller.
- * We could present one function to target code, and dispatch based on
- * the MemOp, but so far we have worked hard to avoid an indirect function
- * call along the memory path.
- */
-static void validate_memop(MemOpIdx oi, MemOp expected)
-{
-#ifdef CONFIG_DEBUG_TCG
-    MemOp have = get_memop(oi) & (MO_SIZE | MO_BSWAP);
-    assert(have == expected);
-#endif
-}
-
 /*
  * Load Helpers
  *
@@ -2264,10 +2245,10 @@ static uint8_t do_ld1_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
     return do_ld_1(env, &l.page[0], l.mmu_idx, access_type, ra);
 }
 
-tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
-                                     MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_ldub_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_UB);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
     return do_ld1_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
@@ -2295,17 +2276,10 @@ static uint16_t do_ld2_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
     return ret;
 }
 
-tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_lduw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUW);
-    return do_ld2_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
-tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUW);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_16);
     return do_ld2_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
@@ -2329,17 +2303,10 @@ static uint32_t do_ld4_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
     return ret;
 }
 
-tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_ldul_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUL);
-    return do_ld4_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
-tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUL);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_32);
     return do_ld4_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
@@ -2363,17 +2330,10 @@ static uint64_t do_ld8_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi,
     return ret;
 }
 
-uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
-                           MemOpIdx oi, uintptr_t retaddr)
+uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr,
+                        MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUQ);
-    return do_ld8_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
-uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
-                           MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUQ);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_64);
     return do_ld8_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD);
 }
 
@@ -2382,35 +2342,22 @@ uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
  * avoid this for 64-bit data, or for 32-bit data on 32-bit host.
  */
 
-
-tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
-                                     MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    return (int8_t)helper_ret_ldub_mmu(env, addr, oi, retaddr);
+    return (int8_t)helper_ldub_mmu(env, addr, oi, retaddr);
 }
 
-tcg_target_ulong helper_le_ldsw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_ldsw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    return (int16_t)helper_le_lduw_mmu(env, addr, oi, retaddr);
+    return (int16_t)helper_lduw_mmu(env, addr, oi, retaddr);
 }
 
-tcg_target_ulong helper_be_ldsw_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
+tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t retaddr)
 {
-    return (int16_t)helper_be_lduw_mmu(env, addr, oi, retaddr);
-}
-
-tcg_target_ulong helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
-{
-    return (int32_t)helper_le_ldul_mmu(env, addr, oi, retaddr);
-}
-
-tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
-                                    MemOpIdx oi, uintptr_t retaddr)
-{
-    return (int32_t)helper_be_ldul_mmu(env, addr, oi, retaddr);
+    return (int32_t)helper_ldul_mmu(env, addr, oi, retaddr);
 }
 
 /*
@@ -2426,7 +2373,7 @@ uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr, MemOpIdx oi, uintptr_t ra)
 {
     uint8_t ret;
 
-    validate_memop(oi, MO_UB);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_UB);
     ret = do_ld1_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2437,7 +2384,7 @@ uint16_t cpu_ldw_be_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint16_t ret;
 
-    validate_memop(oi, MO_BEUW);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUW);
     ret = do_ld2_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2448,7 +2395,7 @@ uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint32_t ret;
 
-    validate_memop(oi, MO_BEUL);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUL);
     ret = do_ld4_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2459,7 +2406,7 @@ uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint64_t ret;
 
-    validate_memop(oi, MO_BEUQ);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUQ);
     ret = do_ld8_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2470,7 +2417,7 @@ uint16_t cpu_ldw_le_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint16_t ret;
 
-    validate_memop(oi, MO_LEUW);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUW);
     ret = do_ld2_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2481,7 +2428,7 @@ uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint32_t ret;
 
-    validate_memop(oi, MO_LEUL);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUL);
     ret = do_ld4_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2492,7 +2439,7 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
 {
     uint64_t ret;
 
-    validate_memop(oi, MO_LEUQ);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUQ);
     ret = do_ld8_mmu(env, addr, oi, ra, MMU_DATA_LOAD);
     plugin_load_cb(env, addr, oi);
     return ret;
@@ -2520,8 +2467,8 @@ Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
     mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
     new_oi = make_memop_idx(mop, mmu_idx);
 
-    h = helper_be_ldq_mmu(env, addr, new_oi, ra);
-    l = helper_be_ldq_mmu(env, addr + 8, new_oi, ra);
+    h = helper_ldq_mmu(env, addr, new_oi, ra);
+    l = helper_ldq_mmu(env, addr + 8, new_oi, ra);
 
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return int128_make128(l, h);
@@ -2549,8 +2496,8 @@ Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
     mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
     new_oi = make_memop_idx(mop, mmu_idx);
 
-    l = helper_le_ldq_mmu(env, addr, new_oi, ra);
-    h = helper_le_ldq_mmu(env, addr + 8, new_oi, ra);
+    l = helper_ldq_mmu(env, addr, new_oi, ra);
+    h = helper_ldq_mmu(env, addr + 8, new_oi, ra);
 
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return int128_make128(l, h);
@@ -2694,13 +2641,13 @@ static void do_st_8(CPUArchState *env, MMULookupPageData *p, uint64_t val,
     }
 }
 
-void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                        MemOpIdx oi, uintptr_t ra)
+void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                    MemOpIdx oi, uintptr_t ra)
 {
     MMULookupLocals l;
     bool crosspage;
 
-    validate_memop(oi, MO_UB);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
     crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
     tcg_debug_assert(!crosspage);
 
@@ -2729,17 +2676,10 @@ static void do_st2_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
     do_st_1(env, &l.page[1], b, l.mmu_idx, ra);
 }
 
-void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
+void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                    MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUW);
-    do_st2_mmu(env, addr, val, oi, retaddr);
-}
-
-void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUW);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_16);
     do_st2_mmu(env, addr, val, oi, retaddr);
 }
 
@@ -2763,17 +2703,10 @@ static void do_st4_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
     (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, l.memop, ra);
 }
 
-void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
+void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                    MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUL);
-    do_st4_mmu(env, addr, val, oi, retaddr);
-}
-
-void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUL);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_32);
     do_st4_mmu(env, addr, val, oi, retaddr);
 }
 
@@ -2797,17 +2730,10 @@ static void do_st8_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
     (void) do_st_leN(env, &l.page[1], val, l.mmu_idx, l.memop, ra);
 }
 
-void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
+void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                    MemOpIdx oi, uintptr_t retaddr)
 {
-    validate_memop(oi, MO_LEUQ);
-    do_st8_mmu(env, addr, val, oi, retaddr);
-}
-
-void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                       MemOpIdx oi, uintptr_t retaddr)
-{
-    validate_memop(oi, MO_BEUQ);
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_64);
     do_st8_mmu(env, addr, val, oi, retaddr);
 }
 
@@ -2823,49 +2749,55 @@ static void plugin_store_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
 void cpu_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                  MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_ret_stb_mmu(env, addr, val, oi, retaddr);
+    helper_stb_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stw_be_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_be_stw_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUW);
+    do_st2_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stl_be_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_be_stl_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUL);
+    do_st4_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stq_be_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_be_stq_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_BEUQ);
+    do_st8_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stw_le_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_le_stw_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUW);
+    do_st2_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stl_le_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_le_stl_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUL);
+    do_st4_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
 void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr)
 {
-    helper_le_stq_mmu(env, addr, val, oi, retaddr);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == MO_LEUQ);
+    do_st8_mmu(env, addr, val, oi, retaddr);
     plugin_store_cb(env, addr, oi);
 }
 
@@ -2890,8 +2822,8 @@ void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
     mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
     new_oi = make_memop_idx(mop, mmu_idx);
 
-    helper_be_stq_mmu(env, addr, int128_gethi(val), new_oi, ra);
-    helper_be_stq_mmu(env, addr + 8, int128_getlo(val), new_oi, ra);
+    helper_stq_mmu(env, addr, int128_gethi(val), new_oi, ra);
+    helper_stq_mmu(env, addr + 8, int128_getlo(val), new_oi, ra);
 
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
@@ -2917,8 +2849,8 @@ void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
     mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
     new_oi = make_memop_idx(mop, mmu_idx);
 
-    helper_le_stq_mmu(env, addr, int128_getlo(val), new_oi, ra);
-    helper_le_stq_mmu(env, addr + 8, int128_gethi(val), new_oi, ra);
+    helper_stq_mmu(env, addr, int128_getlo(val), new_oi, ra);
+    helper_stq_mmu(env, addr + 8, int128_gethi(val), new_oi, ra);
 
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index a091326f84..05123cce35 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1538,37 +1538,26 @@ static void tcg_out_adr(TCGContext *s, TCGReg rd, const void *target)
 }
 
 #ifdef CONFIG_SOFTMMU
-/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
- *                                     MemOpIdx oi, uintptr_t ra)
+/*
+ * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
+ *                                  MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_ld_helpers[MO_SIZE + 1] = {
-    [MO_8]  = helper_ret_ldub_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_16] = helper_be_lduw_mmu,
-    [MO_32] = helper_be_ldul_mmu,
-    [MO_64] = helper_be_ldq_mmu,
-#else
-    [MO_16] = helper_le_lduw_mmu,
-    [MO_32] = helper_le_ldul_mmu,
-    [MO_64] = helper_le_ldq_mmu,
-#endif
+    [MO_8]  = helper_ldub_mmu,
+    [MO_16] = helper_lduw_mmu,
+    [MO_32] = helper_ldul_mmu,
+    [MO_64] = helper_ldq_mmu,
 };
 
-/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
- *                                     uintxx_t val, MemOpIdx oi,
- *                                     uintptr_t ra)
+/*
+ * helper signature: helper_st*_mmu(CPUState *env, target_ulong addr,
+ *                                  uintxx_t val, MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_st_helpers[MO_SIZE + 1] = {
-    [MO_8]  = helper_ret_stb_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_16] = helper_be_stw_mmu,
-    [MO_32] = helper_be_stl_mmu,
-    [MO_64] = helper_be_stq_mmu,
-#else
-    [MO_16] = helper_le_stw_mmu,
-    [MO_32] = helper_le_stl_mmu,
-    [MO_64] = helper_le_stq_mmu,
-#endif
+    [MO_8]  = helper_stb_mmu,
+    [MO_16] = helper_stw_mmu,
+    [MO_32] = helper_stl_mmu,
+    [MO_64] = helper_stq_mmu,
 };
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index d06ac60c15..d0f97d2b14 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1302,41 +1302,26 @@ static void tcg_out_vldst(TCGContext *s, ARMInsn insn,
 }
 
 #ifdef CONFIG_SOFTMMU
-/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
- *                                     int mmu_idx, uintptr_t ra)
+/*
+ * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
+ *                                  int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[MO_SSIZE + 1] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_SB]   = helper_ret_ldsb_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_UW] = helper_be_lduw_mmu,
-    [MO_UL] = helper_be_ldul_mmu,
-    [MO_UQ] = helper_be_ldq_mmu,
-    [MO_SW] = helper_be_ldsw_mmu,
-    [MO_SL] = helper_be_ldul_mmu,
-#else
-    [MO_UW] = helper_le_lduw_mmu,
-    [MO_UL] = helper_le_ldul_mmu,
-    [MO_UQ] = helper_le_ldq_mmu,
-    [MO_SW] = helper_le_ldsw_mmu,
-    [MO_SL] = helper_le_ldul_mmu,
-#endif
+static void * const qemu_ld_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_ldub_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
-/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
- *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
+/*
+ * helper signature: helper_st*_mmu(CPUState *env, target_ulong addr,
+ *                                  uintxx_t val, int mmu_idx, uintptr_t ra)
  */
 static void * const qemu_st_helpers[MO_SIZE + 1] = {
-    [MO_8]   = helper_ret_stb_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_16] = helper_be_stw_mmu,
-    [MO_32] = helper_be_stl_mmu,
-    [MO_64] = helper_be_stq_mmu,
-#else
-    [MO_16] = helper_le_stw_mmu,
-    [MO_32] = helper_le_stl_mmu,
-    [MO_64] = helper_le_stq_mmu,
-#endif
+    [MO_8]  = helper_stb_mmu,
+    [MO_16] = helper_stw_mmu,
+    [MO_32] = helper_stl_mmu,
+    [MO_64] = helper_stq_mmu,
 };
 
 /* Helper routines for marshalling helper function arguments into
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 028ece62a0..29dba3fa1c 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1728,30 +1728,26 @@ static void tcg_out_nopn(TCGContext *s, int n)
 }
 
 #if defined(CONFIG_SOFTMMU)
-/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
- *                                     int mmu_idx, uintptr_t ra)
+/*
+ * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
+ *                                  int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LEUQ] = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BEUQ] = helper_be_ldq_mmu,
+static void * const qemu_ld_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_ldub_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
-/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
- *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
+/*
+ * helper signature: helper_st*_mmu(CPUState *env, target_ulong addr,
+ *                                  uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEUQ] = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEUQ] = helper_be_stq_mmu,
+static void * const qemu_st_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_stb_mmu,
+    [MO_UW] = helper_stw_mmu,
+    [MO_UL] = helper_stl_mmu,
+    [MO_UQ] = helper_stq_mmu,
 };
 
 /* Perform the TLB load and compare.
@@ -1926,7 +1922,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
                      (uintptr_t)l->raddr);
     }
 
-    tcg_out_branch(s, 1, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_branch(s, 1, qemu_ld_helpers[opc & MO_SIZE]);
 
     data_reg = l->datalo_reg;
     switch (opc & MO_SSIZE) {
@@ -2033,7 +2029,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* "Tail call" to the helper, with the return address back inline.  */
     tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_jmp(s, qemu_st_helpers[opc & MO_SIZE]);
     return true;
 }
 #else
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index c5f55afd68..5318f83a8a 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -774,26 +774,25 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 
 #if defined(CONFIG_SOFTMMU)
 /*
- * helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
- *                                     MemOpIdx oi, uintptr_t ra)
+ * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
+ *                                  MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_ld_helpers[4] = {
-    [MO_8]  = helper_ret_ldub_mmu,
-    [MO_16] = helper_le_lduw_mmu,
-    [MO_32] = helper_le_ldul_mmu,
-    [MO_64] = helper_le_ldq_mmu,
+    [MO_8]  = helper_ldub_mmu,
+    [MO_16] = helper_lduw_mmu,
+    [MO_32] = helper_ldul_mmu,
+    [MO_64] = helper_ldq_mmu,
 };
 
 /*
- * helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
- *                                     uintxx_t val, MemOpIdx oi,
- *                                     uintptr_t ra)
+ * helper signature: helper_st*_mmu(CPUState *env, target_ulong addr,
+ *                                  uintxx_t val, MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_st_helpers[4] = {
-    [MO_8]  = helper_ret_stb_mmu,
-    [MO_16] = helper_le_stw_mmu,
-    [MO_32] = helper_le_stl_mmu,
-    [MO_64] = helper_le_stq_mmu,
+    [MO_8]  = helper_stb_mmu,
+    [MO_16] = helper_stw_mmu,
+    [MO_32] = helper_stl_mmu,
+    [MO_64] = helper_stq_mmu,
 };
 
 /* We expect to use a 12-bit negative offset from ENV.  */
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 80748d892e..56c4b2377b 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1037,31 +1037,21 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg,
 }
 
 #if defined(CONFIG_SOFTMMU)
-static void * const qemu_ld_helpers[(MO_SSIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_SB]   = helper_ret_ldsb_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LESW] = helper_le_ldsw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LEUQ] = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BESW] = helper_be_ldsw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BEUQ] = helper_be_ldq_mmu,
-#if TCG_TARGET_REG_BITS == 64
-    [MO_LESL] = helper_le_ldsl_mmu,
-    [MO_BESL] = helper_be_ldsl_mmu,
-#endif
+static void * const qemu_ld_helpers[MO_SSIZE + 1] = {
+    [MO_UB] = helper_ldub_mmu,
+    [MO_SB] = helper_ldsb_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_SW] = helper_ldsw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_SL] = helper_ldsl_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
-static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEUQ] = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEUQ] = helper_be_stq_mmu,
+static void * const qemu_st_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_stb_mmu,
+    [MO_UW] = helper_stw_mmu,
+    [MO_UL] = helper_stl_mmu,
+    [MO_UQ] = helper_stq_mmu,
 };
 
 /* Helper routines for marshalling helper function arguments into
@@ -1267,7 +1257,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     }
     i = tcg_out_call_iarg_imm(s, i, oi);
     i = tcg_out_call_iarg_imm(s, i, (intptr_t)l->raddr);
-    tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)], false);
+    tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SSIZE], false);
     /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
 
@@ -1345,7 +1335,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
        computation to take place in the return address register.  */
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RA, (intptr_t)l->raddr);
     i = tcg_out_call_iarg_reg(s, i, TCG_REG_RA);
-    tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)], true);
+    tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE], true);
     /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
     return true;
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index afadf9a1e3..627256e46e 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1955,27 +1955,21 @@ static const uint32_t qemu_exts_opc[4] = {
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
  *                                 int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LEUQ] = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BEUQ] = helper_be_ldq_mmu,
+static void * const qemu_ld_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_ldub_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
 /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
  *                                 uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEUQ] = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEUQ] = helper_be_stq_mmu,
+static void * const qemu_st_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_stb_mmu,
+    [MO_UW] = helper_stw_mmu,
+    [MO_UL] = helper_stl_mmu,
+    [MO_UQ] = helper_stq_mmu,
 };
 
 /* We expect to use a 16-bit negative offset from ENV.  */
@@ -2137,7 +2131,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
     tcg_out32(s, MFSPR | RT(arg) | LR);
 
-    tcg_out_call_int(s, LK, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_call_int(s, LK, qemu_ld_helpers[opc & MO_SIZE]);
 
     lo = lb->datalo_reg;
     hi = lb->datahi_reg;
@@ -2206,7 +2200,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
     tcg_out32(s, MFSPR | RT(arg) | LR);
 
-    tcg_out_call_int(s, LK, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_call_int(s, LK, qemu_st_helpers[opc & MO_SIZE]);
 
     tcg_out_b(s, 0, lb->raddr);
     return true;
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 558de127ef..ad966112d4 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -878,46 +878,29 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
  */
 
 #if defined(CONFIG_SOFTMMU)
-/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
- *                                     MemOpIdx oi, uintptr_t ra)
+/*
+ * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
+ *                                  MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_ld_helpers[MO_SSIZE + 1] = {
-    [MO_UB] = helper_ret_ldub_mmu,
-    [MO_SB] = helper_ret_ldsb_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_UW] = helper_be_lduw_mmu,
-    [MO_SW] = helper_be_ldsw_mmu,
-    [MO_UL] = helper_be_ldul_mmu,
-#if TCG_TARGET_REG_BITS == 64
-    [MO_SL] = helper_be_ldsl_mmu,
-#endif
-    [MO_UQ] = helper_be_ldq_mmu,
-#else
-    [MO_UW] = helper_le_lduw_mmu,
-    [MO_SW] = helper_le_ldsw_mmu,
-    [MO_UL] = helper_le_ldul_mmu,
-#if TCG_TARGET_REG_BITS == 64
-    [MO_SL] = helper_le_ldsl_mmu,
-#endif
-    [MO_UQ] = helper_le_ldq_mmu,
-#endif
+    [MO_UB] = helper_ldub_mmu,
+    [MO_SB] = helper_ldsb_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_SW] = helper_ldsw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_SL] = helper_ldsl_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
-/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
- *                                     uintxx_t val, MemOpIdx oi,
- *                                     uintptr_t ra)
+/*
+ * helper signature: helper_st*_mmu(CPUState *env, target_ulong addr,
+ *                                  uintxx_t val, MemOpIdx oi, uintptr_t ra)
  */
 static void * const qemu_st_helpers[MO_SIZE + 1] = {
-    [MO_8]   = helper_ret_stb_mmu,
-#if HOST_BIG_ENDIAN
-    [MO_16] = helper_be_stw_mmu,
-    [MO_32] = helper_be_stl_mmu,
-    [MO_64] = helper_be_stq_mmu,
-#else
-    [MO_16] = helper_le_stw_mmu,
-    [MO_32] = helper_le_stl_mmu,
-    [MO_64] = helper_le_stq_mmu,
-#endif
+    [MO_8]  = helper_stb_mmu,
+    [MO_16] = helper_stw_mmu,
+    [MO_32] = helper_stl_mmu,
+    [MO_64] = helper_stq_mmu,
 };
 
 /* We don't support oversize guests */
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 844532156b..7a94ff943b 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -450,29 +450,21 @@ static const uint8_t tcg_cond_to_ltr_cond[] = {
 };
 
 #ifdef CONFIG_SOFTMMU
-static void * const qemu_ld_helpers[(MO_SSIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_SB]   = helper_ret_ldsb_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LESW] = helper_le_ldsw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LESL] = helper_le_ldsl_mmu,
-    [MO_LEUQ] = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BESW] = helper_be_ldsw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BESL] = helper_be_ldsl_mmu,
-    [MO_BEUQ] = helper_be_ldq_mmu,
+static void * const qemu_ld_helpers[MO_SSIZE + 1] = {
+    [MO_UB] = helper_ldub_mmu,
+    [MO_SB] = helper_ldsb_mmu,
+    [MO_UW] = helper_lduw_mmu,
+    [MO_SW] = helper_ldsw_mmu,
+    [MO_UL] = helper_ldul_mmu,
+    [MO_SL] = helper_ldsl_mmu,
+    [MO_UQ] = helper_ldq_mmu,
 };
 
-static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEUQ] = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEUQ] = helper_be_stq_mmu,
+static void * const qemu_st_helpers[MO_SIZE + 1] = {
+    [MO_UB] = helper_stb_mmu,
+    [MO_UW] = helper_stw_mmu,
+    [MO_UL] = helper_stl_mmu,
+    [MO_UQ] = helper_stq_mmu,
 };
 #endif
 
@@ -1781,7 +1773,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     }
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R4, oi);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R5, (uintptr_t)lb->raddr);
-    tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
+    tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SSIZE]);
     tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_R2);
 
     tgen_gotoi(s, S390_CC_ALWAYS, lb->raddr);
@@ -1822,7 +1814,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     }
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R5, oi);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R6, (uintptr_t)lb->raddr);
-    tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE]);
 
     tgen_gotoi(s, S390_CC_ALWAYS, lb->raddr);
     return true;
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index ccc4144f7c..f731e161ec 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -868,8 +868,8 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 }
 
 #ifdef CONFIG_SOFTMMU
-static const tcg_insn_unit *qemu_ld_trampoline[(MO_SSIZE | MO_BSWAP) + 1];
-static const tcg_insn_unit *qemu_st_trampoline[(MO_SIZE | MO_BSWAP) + 1];
+static const tcg_insn_unit *qemu_ld_trampoline[MO_SSIZE + 1];
+static const tcg_insn_unit *qemu_st_trampoline[MO_SIZE + 1];
 
 static void emit_extend(TCGContext *s, TCGReg r, int op)
 {
@@ -895,25 +895,18 @@ static void emit_extend(TCGContext *s, TCGReg r, int op)
 static void build_trampolines(TCGContext *s)
 {
     static void * const qemu_ld_helpers[] = {
-        [MO_UB]   = helper_ret_ldub_mmu,
-        [MO_SB]   = helper_ret_ldsb_mmu,
-        [MO_LEUW] = helper_le_lduw_mmu,
-        [MO_LESW] = helper_le_ldsw_mmu,
-        [MO_LEUL] = helper_le_ldul_mmu,
-        [MO_LEUQ] = helper_le_ldq_mmu,
-        [MO_BEUW] = helper_be_lduw_mmu,
-        [MO_BESW] = helper_be_ldsw_mmu,
-        [MO_BEUL] = helper_be_ldul_mmu,
-        [MO_BEUQ] = helper_be_ldq_mmu,
+        [MO_UB] = helper_ldub_mmu,
+        [MO_SB] = helper_ldsb_mmu,
+        [MO_UW] = helper_lduw_mmu,
+        [MO_SW] = helper_ldsw_mmu,
+        [MO_UL] = helper_ldul_mmu,
+        [MO_UQ] = helper_ldq_mmu,
     };
     static void * const qemu_st_helpers[] = {
-        [MO_UB]   = helper_ret_stb_mmu,
-        [MO_LEUW] = helper_le_stw_mmu,
-        [MO_LEUL] = helper_le_stl_mmu,
-        [MO_LEUQ] = helper_le_stq_mmu,
-        [MO_BEUW] = helper_be_stw_mmu,
-        [MO_BEUL] = helper_be_stl_mmu,
-        [MO_BEUQ] = helper_be_stq_mmu,
+        [MO_UB] = helper_stb_mmu,
+        [MO_UW] = helper_stw_mmu,
+        [MO_UL] = helper_stl_mmu,
+        [MO_UQ] = helper_stq_mmu,
     };
 
     int i;
@@ -1182,9 +1175,9 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
     /* We use the helpers to extend SB and SW data, leaving the case
        of SL needing explicit extending below.  */
     if ((memop & MO_SSIZE) == MO_SL) {
-        func = qemu_ld_trampoline[memop & (MO_BSWAP | MO_SIZE)];
+        func = qemu_ld_trampoline[MO_UL];
     } else {
-        func = qemu_ld_trampoline[memop & (MO_BSWAP | MO_SSIZE)];
+        func = qemu_ld_trampoline[memop & MO_SSIZE];
     }
     tcg_debug_assert(func != NULL);
     tcg_out_call_nodelay(s, func, false);
@@ -1324,7 +1317,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
     tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_O1, addrz);
     tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_O2, data);
 
-    func = qemu_st_trampoline[memop & (MO_BSWAP | MO_SIZE)];
+    func = qemu_st_trampoline[memop & MO_SIZE];
     tcg_debug_assert(func != NULL);
     tcg_out_call_nodelay(s, func, false);
     /* delay slot */
-- 
2.34.1



* [PATCH v2 11/30] accel/tcg: Implement helper_{ld,st}*_mmu for user-only
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (9 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 10/30] tcg: Unify helper_{be,le}_{ld,st}* Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 12/30] tcg: Add 128-bit guest memory primitives Richard Henderson
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

TCG backends may need to defer to a helper to implement
the atomicity required by a given operation.  Mirror the
interface used in system mode.
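
As a minimal sketch of what that mirrored interface looks like from a
user-only caller (the wrapper name example_slow_ld8 is hypothetical and
only for illustration; real users are the backends' slow paths):

    #include "tcg/tcg-ldst.h"   /* helper_ldq_mmu() and friends */

    /*
     * Hypothetical slow-path wrapper: the same helper_ldq_mmu() entry
     * point now exists for user-only builds, and any MO_BSWAP requested
     * in @oi is handled inside the helper itself.
     */
    static uint64_t example_slow_ld8(CPUArchState *env, target_ulong addr,
                                     MemOpIdx oi, uintptr_t retaddr)
    {
        return helper_ldq_mmu(env, addr, oi, retaddr);
    }

Because the helper consumes the full MemOpIdx, a backend no longer needs
separate big- and little-endian entry points.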

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-ldst.h |   6 +-
 accel/tcg/user-exec.c  | 392 ++++++++++++++++++++++++++++-------------
 2 files changed, 276 insertions(+), 122 deletions(-)

diff --git a/include/tcg/tcg-ldst.h b/include/tcg/tcg-ldst.h
index 56fa7afe5e..c1d945fd66 100644
--- a/include/tcg/tcg-ldst.h
+++ b/include/tcg/tcg-ldst.h
@@ -25,8 +25,6 @@
 #ifndef TCG_LDST_H
 #define TCG_LDST_H
 
-#ifdef CONFIG_SOFTMMU
-
 /* Value zero-extended to tcg register size.  */
 tcg_target_ulong helper_ldub_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr);
@@ -54,10 +52,10 @@ void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
 void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr);
 
-#else
+#ifdef CONFIG_USER_ONLY
 
 G_NORETURN void helper_unaligned_ld(CPUArchState *env, target_ulong addr);
 G_NORETURN void helper_unaligned_st(CPUArchState *env, target_ulong addr);
 
-#endif /* CONFIG_SOFTMMU */
+#endif /* CONFIG_USER_ONLY */
 #endif /* TCG_LDST_H */
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index a4acf705f4..2a4b9e2e63 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -891,21 +891,6 @@ void page_reset_target_data(target_ulong start, target_ulong end) { }
 
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
-/*
- * Verify that we have passed the correct MemOp to the correct function.
- *
- * We could present one function to target code, and dispatch based on
- * the MemOp, but so far we have worked hard to avoid an indirect function
- * call along the memory path.
- */
-static void validate_memop(MemOpIdx oi, MemOp expected)
-{
-#ifdef CONFIG_DEBUG_TCG
-    MemOp have = get_memop(oi) & (MO_SIZE | MO_BSWAP);
-    assert(have == expected);
-#endif
-}
-
 void helper_unaligned_ld(CPUArchState *env, target_ulong addr)
 {
     cpu_loop_exit_sigbus(env_cpu(env), addr, MMU_DATA_LOAD, GETPC());
@@ -916,10 +901,9 @@ void helper_unaligned_st(CPUArchState *env, target_ulong addr)
     cpu_loop_exit_sigbus(env_cpu(env), addr, MMU_DATA_STORE, GETPC());
 }
 
-static void *cpu_mmu_lookup(CPUArchState *env, target_ulong addr,
-                            MemOpIdx oi, uintptr_t ra, MMUAccessType type)
+static void *cpu_mmu_lookup(CPUArchState *env, abi_ptr addr,
+                            MemOp mop, uintptr_t ra, MMUAccessType type)
 {
-    MemOp mop = get_memop(oi);
     int a_bits = get_alignment_bits(mop);
     void *ret;
 
@@ -935,100 +919,206 @@ static void *cpu_mmu_lookup(CPUArchState *env, target_ulong addr,
 
 #include "ldst_atomicity.c.inc"
 
-uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr,
-                    MemOpIdx oi, uintptr_t ra)
+static uint8_t do_ld1_mmu(CPUArchState *env, abi_ptr addr,
+                          MemOp mop, uintptr_t ra)
 {
     void *haddr;
     uint8_t ret;
 
-    validate_memop(oi, MO_UB);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
+    tcg_debug_assert((mop & MO_SIZE) == MO_8);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
     ret = ldub_p(haddr);
     clear_helper_retaddr();
+    return ret;
+}
+
+tcg_target_ulong helper_ldub_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    return do_ld1_mmu(env, addr, get_memop(oi), ra);
+}
+
+tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    return (int8_t)do_ld1_mmu(env, addr, get_memop(oi), ra);
+}
+
+uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    uint8_t ret = do_ld1_mmu(env, addr, get_memop(oi), ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return ret;
 }
 
+static uint16_t do_ld2_he_mmu(CPUArchState *env, abi_ptr addr,
+                              MemOp mop, uintptr_t ra)
+{
+    void *haddr;
+    uint16_t ret;
+
+    tcg_debug_assert((mop & MO_SIZE) == MO_16);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
+    ret = load_atom_2(env, ra, haddr, mop);
+    clear_helper_retaddr();
+    return ret;
+}
+
+tcg_target_ulong helper_lduw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    uint16_t ret = do_ld2_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap16(ret);
+    }
+    return ret;
+}
+
+tcg_target_ulong helper_ldsw_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    int16_t ret = do_ld2_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap16(ret);
+    }
+    return ret;
+}
+
 uint16_t cpu_ldw_be_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
     uint16_t ret;
 
-    validate_memop(oi, MO_BEUW);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_2(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    ret = do_ld2_he_mmu(env, addr, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return cpu_to_be16(ret);
 }
 
-uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
-                        MemOpIdx oi, uintptr_t ra)
-{
-    void *haddr;
-    uint32_t ret;
-
-    validate_memop(oi, MO_BEUL);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_4(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return cpu_to_be32(ret);
-}
-
-uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
-                        MemOpIdx oi, uintptr_t ra)
-{
-    void *haddr;
-    uint64_t ret;
-
-    validate_memop(oi, MO_BEUQ);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_8(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return cpu_to_be64(ret);
-}
-
 uint16_t cpu_ldw_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
     uint16_t ret;
 
-    validate_memop(oi, MO_LEUW);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_2(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    ret = do_ld2_he_mmu(env, addr, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return cpu_to_le16(ret);
 }
 
+static uint32_t do_ld4_he_mmu(CPUArchState *env, abi_ptr addr,
+                              MemOp mop, uintptr_t ra)
+{
+    void *haddr;
+    uint32_t ret;
+
+    tcg_debug_assert((mop & MO_SIZE) == MO_32);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
+    ret = load_atom_4(env, ra, haddr, mop);
+    clear_helper_retaddr();
+    return ret;
+}
+
+tcg_target_ulong helper_ldul_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    uint32_t ret = do_ld4_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap32(ret);
+    }
+    return ret;
+}
+
+tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, target_ulong addr,
+                                 MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    int32_t ret = do_ld4_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap32(ret);
+    }
+    return ret;
+}
+
+uint32_t cpu_ldl_be_mmu(CPUArchState *env, abi_ptr addr,
+                        MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    uint32_t ret;
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    ret = do_ld4_he_mmu(env, addr, mop, ra);
+    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+    return cpu_to_be32(ret);
+}
+
 uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
     uint32_t ret;
 
-    validate_memop(oi, MO_LEUL);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_4(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    ret = do_ld4_he_mmu(env, addr, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return cpu_to_le32(ret);
 }
 
+static uint64_t do_ld8_he_mmu(CPUArchState *env, abi_ptr addr,
+                              MemOp mop, uintptr_t ra)
+{
+    void *haddr;
+    uint64_t ret;
+
+    tcg_debug_assert((mop & MO_SIZE) == MO_64);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
+    ret = load_atom_8(env, ra, haddr, mop);
+    clear_helper_retaddr();
+    return ret;
+}
+
+uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr,
+                        MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    uint64_t ret = do_ld8_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap64(ret);
+    }
+    return ret;
+}
+
+uint64_t cpu_ldq_be_mmu(CPUArchState *env, abi_ptr addr,
+                        MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    uint64_t ret;
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    ret = do_ld8_he_mmu(env, addr, mop, ra);
+    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+    return cpu_to_be64(ret);
+}
+
 uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
                         MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
     uint64_t ret;
 
-    validate_memop(oi, MO_LEUQ);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    ret = load_atom_8(env, ra, haddr, get_memop(oi));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    ret = do_ld8_he_mmu(env, addr, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     return cpu_to_le64(ret);
 }
@@ -1039,7 +1129,7 @@ Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
     void *haddr;
     Int128 ret;
 
-    validate_memop(oi, MO_128 | MO_BE);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_BE));
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
     memcpy(&ret, haddr, 16);
     clear_helper_retaddr();
@@ -1057,7 +1147,7 @@ Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
     void *haddr;
     Int128 ret;
 
-    validate_memop(oi, MO_128 | MO_LE);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_LE));
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
     memcpy(&ret, haddr, 16);
     clear_helper_retaddr();
@@ -1069,87 +1159,153 @@ Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
     return ret;
 }
 
-void cpu_stb_mmu(CPUArchState *env, abi_ptr addr, uint8_t val,
-                 MemOpIdx oi, uintptr_t ra)
+static void do_st1_mmu(CPUArchState *env, abi_ptr addr, uint8_t val,
+                       MemOp mop, uintptr_t ra)
 {
     void *haddr;
 
-    validate_memop(oi, MO_UB);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
+    tcg_debug_assert((mop & MO_SIZE) == MO_8);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE);
     stb_p(haddr, val);
     clear_helper_retaddr();
+}
+
+void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    do_st1_mmu(env, addr, val, get_memop(oi), ra);
+}
+
+void cpu_stb_mmu(CPUArchState *env, abi_ptr addr, uint8_t val,
+                 MemOpIdx oi, uintptr_t ra)
+{
+    do_st1_mmu(env, addr, val, get_memop(oi), ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
+static void do_st2_he_mmu(CPUArchState *env, abi_ptr addr, uint16_t val,
+                          MemOp mop, uintptr_t ra)
+{
+    void *haddr;
+
+    tcg_debug_assert((mop & MO_SIZE) == MO_16);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE);
+    store_atom_2(env, ra, haddr, mop, val);
+    clear_helper_retaddr();
+}
+
+void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    if (mop & MO_BSWAP) {
+        val = bswap16(val);
+    }
+    do_st2_he_mmu(env, addr, val, mop, ra);
+}
+
 void cpu_stw_be_mmu(CPUArchState *env, abi_ptr addr, uint16_t val,
                     MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
 
-    validate_memop(oi, MO_BEUW);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_2(env, ra, haddr, get_memop(oi), be16_to_cpu(val));
-    clear_helper_retaddr();
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
-}
-
-void cpu_stl_be_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
-                    MemOpIdx oi, uintptr_t ra)
-{
-    void *haddr;
-
-    validate_memop(oi, MO_BEUL);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_4(env, ra, haddr, get_memop(oi), be32_to_cpu(val));
-    clear_helper_retaddr();
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
-}
-
-void cpu_stq_be_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
-                    MemOpIdx oi, uintptr_t ra)
-{
-    void *haddr;
-
-    validate_memop(oi, MO_BEUQ);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_8(env, ra, haddr, get_memop(oi), be64_to_cpu(val));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    do_st2_he_mmu(env, addr, be16_to_cpu(val), mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 void cpu_stw_le_mmu(CPUArchState *env, abi_ptr addr, uint16_t val,
                     MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    do_st2_he_mmu(env, addr, le16_to_cpu(val), mop, ra);
+    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+}
+
+static void do_st4_he_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
+                          MemOp mop, uintptr_t ra)
 {
     void *haddr;
 
-    validate_memop(oi, MO_LEUW);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_2(env, ra, haddr, get_memop(oi), le16_to_cpu(val));
+    tcg_debug_assert((mop & MO_SIZE) == MO_32);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE);
+    store_atom_4(env, ra, haddr, mop, val);
     clear_helper_retaddr();
+}
+
+void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    if (mop & MO_BSWAP) {
+        val = bswap32(val);
+    }
+    do_st4_he_mmu(env, addr, val, mop, ra);
+}
+
+void cpu_stl_be_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    do_st4_he_mmu(env, addr, be32_to_cpu(val), mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 void cpu_stl_le_mmu(CPUArchState *env, abi_ptr addr, uint32_t val,
                     MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    do_st4_he_mmu(env, addr, le32_to_cpu(val), mop, ra);
+    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+}
+
+static void do_st8_he_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
+                          MemOp mop, uintptr_t ra)
 {
     void *haddr;
 
-    validate_memop(oi, MO_LEUL);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_4(env, ra, haddr, get_memop(oi), le32_to_cpu(val));
+    tcg_debug_assert((mop & MO_SIZE) == MO_64);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE);
+    store_atom_8(env, ra, haddr, mop, val);
     clear_helper_retaddr();
+}
+
+void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    if (mop & MO_BSWAP) {
+        val = bswap64(val);
+    }
+    do_st8_he_mmu(env, addr, val, mop, ra);
+}
+
+void cpu_stq_be_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
+                    MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    do_st8_he_mmu(env, addr, cpu_to_be64(val), mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 void cpu_stq_le_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
                     MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
 
-    validate_memop(oi, MO_LEUQ);
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
-    store_atom_8(env, ra, haddr, get_memop(oi), le64_to_cpu(val));
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    do_st8_he_mmu(env, addr, cpu_to_le64(val), mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
@@ -1158,7 +1314,7 @@ void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr,
 {
     void *haddr;
 
-    validate_memop(oi, MO_128 | MO_BE);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_BE));
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
     if (!HOST_BIG_ENDIAN) {
         val = bswap128(val);
@@ -1173,7 +1329,7 @@ void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr,
 {
     void *haddr;
 
-    validate_memop(oi, MO_128 | MO_LE);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_LE));
     haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
     if (HOST_BIG_ENDIAN) {
         val = bswap128(val);
-- 
2.34.1



* [PATCH v2 12/30] tcg: Add 128-bit guest memory primitives
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (10 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 11/30] accel/tcg: Implement helper_{ld,st}*_mmu for user-only Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 13/30] meson: Detect atomic128 support with optimization Richard Henderson
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h        |   3 +
 include/tcg/tcg-ldst.h         |   4 +
 accel/tcg/cputlb.c             | 392 +++++++++++++++++++++++++--------
 accel/tcg/user-exec.c          |  94 ++++++--
 tcg/tcg-op.c                   | 178 ++++++++++-----
 accel/tcg/ldst_atomicity.c.inc | 189 ++++++++++++++++
 6 files changed, 682 insertions(+), 178 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index e141a6ab24..a7a2038901 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -39,6 +39,9 @@ DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 DEF_HELPER_FLAGS_3(memset, TCG_CALL_NO_RWG, ptr, ptr, int, ptr)
 #endif /* IN_HELPER_PROTO */
 
+DEF_HELPER_FLAGS_3(ld_i128, TCG_CALL_NO_WG, i128, env, tl, i32)
+DEF_HELPER_FLAGS_4(st_i128, TCG_CALL_NO_WG, void, env, tl, i128, i32)
+
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
                    i32, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
diff --git a/include/tcg/tcg-ldst.h b/include/tcg/tcg-ldst.h
index c1d945fd66..3004e5292d 100644
--- a/include/tcg/tcg-ldst.h
+++ b/include/tcg/tcg-ldst.h
@@ -34,6 +34,8 @@ tcg_target_ulong helper_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  MemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr,
                         MemOpIdx oi, uintptr_t retaddr);
+Int128 helper_ld16_mmu(CPUArchState *env, target_ulong addr,
+                       MemOpIdx oi, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
 tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, target_ulong addr,
@@ -51,6 +53,8 @@ void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                     MemOpIdx oi, uintptr_t retaddr);
 void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                     MemOpIdx oi, uintptr_t retaddr);
+void helper_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                     MemOpIdx oi, uintptr_t retaddr);
 
 #ifdef CONFIG_USER_ONLY
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e2e764f4da..d7ca90e3b4 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -40,6 +40,7 @@
 #include "qemu/plugin-memory.h"
 #endif
 #include "tcg/tcg-ldst.h"
+#include "exec/helper-proto.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
@@ -2128,6 +2129,31 @@ static uint64_t do_ld_whole_be8(CPUArchState *env, uintptr_t ra,
     return (ret_be << (p->size * 8)) | x;
 }
 
+/**
+ * do_ld_whole_be16
+ * @p: translation parameters
+ * @ret_be: accumulated data
+ *
+ * As do_ld_bytes_beN, but with one atomic load.
+ * 16 aligned bytes are guaranteed to cover the load.
+ */
+static Int128 do_ld_whole_be16(CPUArchState *env, uintptr_t ra,
+                               MMULookupPageData *p, uint64_t ret_be)
+{
+    int o = p->addr & 15;
+    Int128 x, y = load_atomic16_or_exit(env, ra, p->haddr - o);
+    int size = p->size;
+
+    if (!HOST_BIG_ENDIAN) {
+        y = bswap128(y);
+    }
+    y = int128_lshift(y, o * 8);
+    y = int128_urshift(y, (16 - size) * 8);
+    x = int128_make64(ret_be);
+    x = int128_lshift(x, size * 8);
+    return int128_or(x, y);
+}
+
 /*
  * Wrapper for the above.
  */
@@ -2172,6 +2198,59 @@ static uint64_t do_ld_beN(CPUArchState *env, MMULookupPageData *p,
     }
 }
 
+/*
+ * Wrapper for the above, for 8 < size < 16.
+ */
+static Int128 do_ld16_beN(CPUArchState *env, MMULookupPageData *p,
+                          uint64_t a, int mmu_idx, MemOp mop, uintptr_t ra)
+{
+    int size = p->size;
+    uint64_t b;
+    MemOp atmax;
+
+    if (unlikely(p->flags & TLB_MMIO)) {
+        p->size = size - 8;
+        a = do_ld_mmio_beN(env, p, a, mmu_idx, MMU_DATA_LOAD, ra);
+        p->addr += p->size;
+        p->size = 8;
+        b = do_ld_mmio_beN(env, p, 0, mmu_idx, MMU_DATA_LOAD, ra);
+    } else {
+        switch (mop & MO_ATOM_MASK) {
+        case MO_ATOM_WITHIN16:
+            /*
+             * It is a given that we cross a page and therefore there is no
+             * atomicity for the load as a whole, but there may be a subobject
+             * as defined by ATMAX which does not cross a 16-byte boundary.
+             */
+            atmax = mop & MO_ATMAX_MASK;
+            if (atmax != MO_ATMAX_SIZE) {
+                atmax >>= MO_ATMAX_SHIFT;
+                if (unlikely(size >= (1 << atmax))) {
+                    return do_ld_whole_be16(env, ra, p, a);
+                }
+            }
+            /* fall through */
+        case MO_ATOM_IFALIGN:
+        case MO_ATOM_NONE:
+            p->size = size - 8;
+            a = do_ld_bytes_beN(p, a);
+            b = ldq_be_p(p->haddr + size - 8);
+            break;
+        case MO_ATOM_SUBALIGN:
+            p->size = size - 8;
+            a = do_ld_parts_beN(p, a);
+            p->haddr += size - 8;
+            p->size = 8;
+            b = do_ld_parts_beN(p, 0);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return int128_make128(b, a);
+}
+
 static uint8_t do_ld_1(CPUArchState *env, MMULookupPageData *p, int mmu_idx,
                        MMUAccessType type, uintptr_t ra)
 {
@@ -2360,6 +2439,80 @@ tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, target_ulong addr,
     return (int32_t)helper_ldul_mmu(env, addr, oi, retaddr);
 }
 
+static Int128 do_ld16_mmu(CPUArchState *env, target_ulong addr,
+                          MemOpIdx oi, uintptr_t ra)
+{
+    MMULookupLocals l;
+    bool crosspage;
+    uint64_t a, b;
+    Int128 ret;
+    int first;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD, &l);
+    if (likely(!crosspage)) {
+        /* Perform the load host endian. */
+        if (unlikely(l.page[0].flags & TLB_MMIO)) {
+            QEMU_IOTHREAD_LOCK_GUARD();
+            a = io_readx(env, l.page[0].full, l.mmu_idx, addr,
+                         ra, MMU_DATA_LOAD, MO_64);
+            b = io_readx(env, l.page[0].full, l.mmu_idx, addr + 8,
+                         ra, MMU_DATA_LOAD, MO_64);
+            ret = int128_make128(HOST_BIG_ENDIAN ? b : a,
+                                 HOST_BIG_ENDIAN ? a : b);
+        } else {
+            ret = load_atom_16(env, ra, l.page[0].haddr, l.memop);
+        }
+        if (l.memop & MO_BSWAP) {
+            ret = bswap128(ret);
+        }
+        return ret;
+    }
+
+    first = l.page[0].size;
+    if (first == 8) {
+        MemOp mop8 = (l.memop & ~MO_SIZE) | MO_64;
+
+        a = do_ld_8(env, &l.page[0], l.mmu_idx, MMU_DATA_LOAD, mop8, ra);
+        b = do_ld_8(env, &l.page[1], l.mmu_idx, MMU_DATA_LOAD, mop8, ra);
+        if ((mop8 & MO_BSWAP) == MO_LE) {
+            ret = int128_make128(a, b);
+        } else {
+            ret = int128_make128(b, a);
+        }
+        return ret;
+    }
+
+    if (first < 8) {
+        a = do_ld_beN(env, &l.page[0], 0, l.mmu_idx,
+                      MMU_DATA_LOAD, l.memop, ra);
+        ret = do_ld16_beN(env, &l.page[1], a, l.mmu_idx, l.memop, ra);
+    } else {
+        ret = do_ld16_beN(env, &l.page[0], 0, l.mmu_idx, l.memop, ra);
+        b = int128_getlo(ret);
+        ret = int128_lshift(ret, l.page[1].size * 8);
+        a = int128_gethi(ret);
+        b = do_ld_beN(env, &l.page[1], b, l.mmu_idx,
+                      MMU_DATA_LOAD, l.memop, ra);
+        ret = int128_make128(b, a);
+    }
+    if ((l.memop & MO_BSWAP) == MO_LE) {
+        ret = bswap128(ret);
+    }
+    return ret;
+}
+
+Int128 helper_ld16_mmu(CPUArchState *env, target_ulong addr,
+                       uint32_t oi, uintptr_t retaddr)
+{
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_128);
+    return do_ld16_mmu(env, addr, oi, retaddr);
+}
+
+Int128 helper_ld_i128(CPUArchState *env, target_ulong addr, uint32_t oi)
+{
+    return helper_ld16_mmu(env, addr, oi, GETPC());
+}
+
 /*
  * Load helpers for cpu_ldst.h.
  */
@@ -2448,59 +2601,23 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
 Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
                        MemOpIdx oi, uintptr_t ra)
 {
-    MemOp mop = get_memop(oi);
-    int mmu_idx = get_mmuidx(oi);
-    MemOpIdx new_oi;
-    unsigned a_bits;
-    uint64_t h, l;
+    Int128 ret;
 
-    tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_BE|MO_128));
-    a_bits = get_alignment_bits(mop);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
-                             mmu_idx, ra);
-    }
-
-    /* Construct an unaligned 64-bit replacement MemOpIdx. */
-    mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
-    new_oi = make_memop_idx(mop, mmu_idx);
-
-    h = helper_ldq_mmu(env, addr, new_oi, ra);
-    l = helper_ldq_mmu(env, addr + 8, new_oi, ra);
-
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return int128_make128(l, h);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP|MO_SIZE)) == (MO_BE|MO_128));
+    ret = do_ld16_mmu(env, addr, oi, ra);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
                        MemOpIdx oi, uintptr_t ra)
 {
-    MemOp mop = get_memop(oi);
-    int mmu_idx = get_mmuidx(oi);
-    MemOpIdx new_oi;
-    unsigned a_bits;
-    uint64_t h, l;
+    Int128 ret;
 
-    tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_LE|MO_128));
-    a_bits = get_alignment_bits(mop);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
-                             mmu_idx, ra);
-    }
-
-    /* Construct an unaligned 64-bit replacement MemOpIdx. */
-    mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
-    new_oi = make_memop_idx(mop, mmu_idx);
-
-    l = helper_ldq_mmu(env, addr, new_oi, ra);
-    h = helper_ldq_mmu(env, addr + 8, new_oi, ra);
-
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-    return int128_make128(l, h);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP|MO_SIZE)) == (MO_LE|MO_128));
+    ret = do_ld16_mmu(env, addr, oi, ra);
+    plugin_load_cb(env, addr, oi);
+    return ret;
 }
 
 /*
@@ -2581,6 +2698,57 @@ static uint64_t do_st_leN(CPUArchState *env, MMULookupPageData *p,
     }
 }
 
+/*
+ * Wrapper for the above, for 8 < size < 16.
+ */
+static uint64_t do_st16_leN(CPUArchState *env, MMULookupPageData *p,
+                            Int128 val_le, int mmu_idx,
+                            MemOp mop, uintptr_t ra)
+{
+    int size = p->size;
+    MemOp atmax;
+
+    if (unlikely(p->flags & TLB_MMIO)) {
+        p->size = 8;
+        do_st_mmio_leN(env, p, int128_getlo(val_le), mmu_idx, ra);
+        p->size = size - 8;
+        p->addr += 8;
+        return do_st_mmio_leN(env, p, int128_gethi(val_le), mmu_idx, ra);
+    } else if (unlikely(p->flags & TLB_DISCARD_WRITE)) {
+        return int128_gethi(val_le) >> ((size - 8) * 8);
+    }
+
+    switch (mop & MO_ATOM_MASK) {
+    case MO_ATOM_WITHIN16:
+        /*
+         * It is a given that we cross a page and therefore there is no
+         * atomicity for the store as a whole, but there may be a subobject
+         * as defined by ATMAX which does not cross a 16-byte boundary.
+         */
+        atmax = mop & MO_ATMAX_MASK;
+        if (atmax != MO_ATMAX_SIZE) {
+            atmax >>= MO_ATMAX_SHIFT;
+            if (unlikely(size >= (1 << atmax))) {
+                if (HAVE_al16) {
+                    return store_whole_le16(p->haddr, p->size, val_le);
+                } else {
+                    cpu_loop_exit_atomic(env_cpu(env), ra);
+                }
+            }
+        }
+        /* fall through */
+    case MO_ATOM_IFALIGN:
+    case MO_ATOM_NONE:
+        stq_le_p(p->haddr, int128_getlo(val_le));
+        return store_bytes_leN(p->haddr + 8, p->size - 8, int128_gethi(val_le));
+    case MO_ATOM_SUBALIGN:
+        store_parts_leN(p->haddr, 8, int128_getlo(val_le));
+        return store_parts_leN(p->haddr + 8, p->size - 8, int128_gethi(val_le));
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void do_st_1(CPUArchState *env, MMULookupPageData *p, uint8_t val,
                     int mmu_idx, uintptr_t ra)
 {
@@ -2737,6 +2905,80 @@ void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
     do_st8_mmu(env, addr, val, oi, retaddr);
 }
 
+static void do_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                        MemOpIdx oi, uintptr_t ra)
+{
+    MMULookupLocals l;
+    bool crosspage;
+    uint64_t a, b;
+    int first;
+
+    crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l);
+    if (likely(!crosspage)) {
+        /* Swap to host endian if necessary, then store. */
+        if (l.memop & MO_BSWAP) {
+            val = bswap128(val);
+        }
+        if (unlikely(l.page[0].flags & TLB_MMIO)) {
+            QEMU_IOTHREAD_LOCK_GUARD();
+            if (HOST_BIG_ENDIAN) {
+                b = int128_getlo(val), a = int128_gethi(val);
+            } else {
+                a = int128_getlo(val), b = int128_gethi(val);
+            }
+            io_writex(env, l.page[0].full, l.mmu_idx, a, addr, ra, MO_64);
+            io_writex(env, l.page[0].full, l.mmu_idx, b, addr + 8, ra, MO_64);
+        } else if (unlikely(l.page[0].flags & TLB_DISCARD_WRITE)) {
+            /* nothing */
+        } else {
+            store_atom_16(env, ra, l.page[0].haddr, l.memop, val);
+        }
+        return;
+    }
+
+    first = l.page[0].size;
+    if (first == 8) {
+        MemOp mop8 = (l.memop & ~(MO_SIZE | MO_BSWAP)) | MO_64;
+
+        if (l.memop & MO_BSWAP) {
+            val = bswap128(val);
+        }
+        if (HOST_BIG_ENDIAN) {
+            b = int128_getlo(val), a = int128_gethi(val);
+        } else {
+            a = int128_getlo(val), b = int128_gethi(val);
+        }
+        do_st_8(env, &l.page[0], a, l.mmu_idx, mop8, ra);
+        do_st_8(env, &l.page[1], b, l.mmu_idx, mop8, ra);
+        return;
+    }
+
+    if ((l.memop & MO_BSWAP) != MO_LE) {
+        val = bswap128(val);
+    }
+    if (first < 8) {
+        do_st_leN(env, &l.page[0], int128_getlo(val), l.mmu_idx, l.memop, ra);
+        val = int128_urshift(val, first * 8);
+        do_st16_leN(env, &l.page[1], val, l.mmu_idx, l.memop, ra);
+    } else {
+        b = do_st16_leN(env, &l.page[0], val, l.mmu_idx, l.memop, ra);
+        do_st_leN(env, &l.page[1], b, l.mmu_idx, l.memop, ra);
+    }
+}
+
+void helper_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                     MemOpIdx oi, uintptr_t retaddr)
+{
+    tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_128);
+    do_st16_mmu(env, addr, val, oi, retaddr);
+}
+
+void helper_st_i128(CPUArchState *env, target_ulong addr, Int128 val,
+                    MemOpIdx oi)
+{
+    helper_st16_mmu(env, addr, val, oi, GETPC());
+}
+
 /*
  * Store Helpers for cpu_ldst.h
  */
@@ -2801,58 +3043,20 @@ void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
     plugin_store_cb(env, addr, oi);
 }
 
-void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
-                     MemOpIdx oi, uintptr_t ra)
+void cpu_st16_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                     MemOpIdx oi, uintptr_t retaddr)
 {
-    MemOp mop = get_memop(oi);
-    int mmu_idx = get_mmuidx(oi);
-    MemOpIdx new_oi;
-    unsigned a_bits;
-
-    tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_BE|MO_128));
-    a_bits = get_alignment_bits(mop);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_STORE,
-                             mmu_idx, ra);
-    }
-
-    /* Construct an unaligned 64-bit replacement MemOpIdx. */
-    mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
-    new_oi = make_memop_idx(mop, mmu_idx);
-
-    helper_stq_mmu(env, addr, int128_gethi(val), new_oi, ra);
-    helper_stq_mmu(env, addr + 8, int128_getlo(val), new_oi, ra);
-
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP|MO_SIZE)) == (MO_BE|MO_128));
+    do_st16_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
-void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
-                     MemOpIdx oi, uintptr_t ra)
+void cpu_st16_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                     MemOpIdx oi, uintptr_t retaddr)
 {
-    MemOp mop = get_memop(oi);
-    int mmu_idx = get_mmuidx(oi);
-    MemOpIdx new_oi;
-    unsigned a_bits;
-
-    tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_LE|MO_128));
-    a_bits = get_alignment_bits(mop);
-
-    /* Handle CPU specific unaligned behaviour */
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_STORE,
-                             mmu_idx, ra);
-    }
-
-    /* Construct an unaligned 64-bit replacement MemOpIdx. */
-    mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
-    new_oi = make_memop_idx(mop, mmu_idx);
-
-    helper_stq_mmu(env, addr, int128_getlo(val), new_oi, ra);
-    helper_stq_mmu(env, addr + 8, int128_gethi(val), new_oi, ra);
-
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
+    tcg_debug_assert((get_memop(oi) & (MO_BSWAP|MO_SIZE)) == (MO_LE|MO_128));
+    do_st16_mmu(env, addr, val, oi, retaddr);
+    plugin_store_cb(env, addr, oi);
 }
 
 #include "ldst_common.c.inc"
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 2a4b9e2e63..b0f4fbace7 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1123,18 +1123,45 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
     return cpu_to_le64(ret);
 }
 
-Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
-                       MemOpIdx oi, uintptr_t ra)
+static Int128 do_ld16_he_mmu(CPUArchState *env, abi_ptr addr,
+                             MemOp mop, uintptr_t ra)
 {
     void *haddr;
     Int128 ret;
 
-    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_BE));
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    memcpy(&ret, haddr, 16);
+    tcg_debug_assert((mop & MO_SIZE) == MO_128);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
+    ret = load_atom_16(env, ra, haddr, mop);
     clear_helper_retaddr();
-    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+    return ret;
+}
 
+Int128 helper_ld16_mmu(CPUArchState *env, target_ulong addr,
+                       MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    Int128 ret = do_ld16_he_mmu(env, addr, mop, ra);
+
+    if (mop & MO_BSWAP) {
+        ret = bswap128(ret);
+    }
+    return ret;
+}
+
+Int128 helper_ld_i128(CPUArchState *env, target_ulong addr, MemOpIdx oi)
+{
+    return helper_ld16_mmu(env, addr, oi, GETPC());
+}
+
+Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
+                       MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+    Int128 ret;
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
+    ret = do_ld16_he_mmu(env, addr, mop, ra);
+    qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
     if (!HOST_BIG_ENDIAN) {
         ret = bswap128(ret);
     }
@@ -1144,15 +1171,12 @@ Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
 Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
                        MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
     Int128 ret;
 
-    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_LE));
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD);
-    memcpy(&ret, haddr, 16);
-    clear_helper_retaddr();
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
+    ret = do_ld16_he_mmu(env, addr, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-
     if (HOST_BIG_ENDIAN) {
         ret = bswap128(ret);
     }
@@ -1309,33 +1333,57 @@ void cpu_stq_le_mmu(CPUArchState *env, abi_ptr addr, uint64_t val,
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
-void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr,
-                     Int128 val, MemOpIdx oi, uintptr_t ra)
+static void do_st16_he_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+                           MemOp mop, uintptr_t ra)
 {
     void *haddr;
 
-    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_BE));
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
+    tcg_debug_assert((mop & MO_SIZE) == MO_128);
+    haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE);
+    store_atom_16(env, ra, haddr, mop, val);
+    clear_helper_retaddr();
+}
+
+void helper_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                     MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    if (mop & MO_BSWAP) {
+        val = bswap128(val);
+    }
+    do_st16_he_mmu(env, addr, val, mop, ra);
+}
+
+void helper_st_i128(CPUArchState *env, target_ulong addr,
+                    Int128 val, MemOpIdx oi)
+{
+    helper_st16_mmu(env, addr, val, oi, GETPC());
+}
+
+void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr,
+                     Int128 val, MemOpIdx oi, uintptr_t ra)
+{
+    MemOp mop = get_memop(oi);
+
+    tcg_debug_assert((mop & MO_BSWAP) == MO_BE);
     if (!HOST_BIG_ENDIAN) {
         val = bswap128(val);
     }
-    memcpy(haddr, &val, 16);
-    clear_helper_retaddr();
+    do_st16_he_mmu(env, addr, val, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr,
                      Int128 val, MemOpIdx oi, uintptr_t ra)
 {
-    void *haddr;
+    MemOp mop = get_memop(oi);
 
-    tcg_debug_assert((get_memop(oi) & (MO_BSWAP | MO_SIZE)) == (MO_128 | MO_LE));
-    haddr = cpu_mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE);
+    tcg_debug_assert((mop & MO_BSWAP) == MO_LE);
     if (HOST_BIG_ENDIAN) {
         val = bswap128(val);
     }
-    memcpy(haddr, &val, 16);
-    clear_helper_retaddr();
+    do_st16_he_mmu(env, addr, val, mop, ra);
     qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index da312dcf7e..93ac864aed 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3114,6 +3114,48 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
     }
 }
 
+/*
+ * Return true if @mop, without knowledge of the pointer alignment,
+ * does not require 16-byte atomicity, and it would be advantageous
+ * to avoid a call to a helper function.
+ */
+static bool use_two_i64_for_i128(MemOp mop)
+{
+#ifdef CONFIG_SOFTMMU
+    /* Two softmmu tlb lookups is larger than one function call. */
+    return false;
+#else
+    /*
+     * For user-only, two 64-bit operations may well be smaller than a call.
+     * Determine if that would be legal for the requested atomicity.
+     */
+    MemOp atom = mop & MO_ATOM_MASK;
+    MemOp atmax = mop & MO_ATMAX_MASK;
+
+    /* In a serialized context, no atomicity is required. */
+    if (!(tcg_ctx->gen_tb->cflags & CF_PARALLEL)) {
+        return true;
+    }
+
+    if (atmax == MO_ATMAX_SIZE) {
+        atmax = mop & MO_SIZE;
+    } else {
+        atmax >>= MO_ATMAX_SHIFT;
+    }
+    switch (atom) {
+    case MO_ATOM_NONE:
+        return true;
+    case MO_ATOM_IFALIGN:
+    case MO_ATOM_SUBALIGN:
+        return atmax < MO_128;
+    case MO_ATOM_WITHIN16:
+        return atmax == MO_8;
+    default:
+        g_assert_not_reached();
+    }
+#endif
+}
+
 static void canonicalize_memop_i128_as_i64(MemOp ret[2], MemOp orig)
 {
     MemOp mop_1 = orig, mop_2;
@@ -3161,91 +3203,105 @@ static void canonicalize_memop_i128_as_i64(MemOp ret[2], MemOp orig)
 
 void tcg_gen_qemu_ld_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
 {
-    MemOp mop[2];
-    TCGv addr_p8;
-    TCGv_i64 x, y;
+    MemOpIdx oi = make_memop_idx(memop, idx);
 
-    canonicalize_memop_i128_as_i64(mop, memop);
+    tcg_debug_assert((memop & MO_SIZE) == MO_128);
+    tcg_debug_assert((memop & MO_SIGN) == 0);
 
     tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
     addr = plugin_prep_mem_callbacks(addr);
 
-    /* TODO: respect atomicity of the operation. */
     /* TODO: allow the tcg backend to see the whole operation. */
 
-    /*
-     * Since there are no global TCGv_i128, there is no visible state
-     * changed if the second load faults.  Load directly into the two
-     * subwords.
-     */
-    if ((memop & MO_BSWAP) == MO_LE) {
-        x = TCGV128_LOW(val);
-        y = TCGV128_HIGH(val);
+    if (use_two_i64_for_i128(memop)) {
+        MemOp mop[2];
+        TCGv addr_p8;
+        TCGv_i64 x, y;
+
+        canonicalize_memop_i128_as_i64(mop, memop);
+
+        /*
+         * Since there are no global TCGv_i128, there is no visible state
+         * changed if the second load faults.  Load directly into the two
+         * subwords.
+         */
+        if ((memop & MO_BSWAP) == MO_LE) {
+            x = TCGV128_LOW(val);
+            y = TCGV128_HIGH(val);
+        } else {
+            x = TCGV128_HIGH(val);
+            y = TCGV128_LOW(val);
+        }
+
+        gen_ldst_i64(INDEX_op_qemu_ld_i64, x, addr, mop[0], idx);
+
+        if ((mop[0] ^ memop) & MO_BSWAP) {
+            tcg_gen_bswap64_i64(x, x);
+        }
+
+        addr_p8 = tcg_temp_new();
+        tcg_gen_addi_tl(addr_p8, addr, 8);
+        gen_ldst_i64(INDEX_op_qemu_ld_i64, y, addr_p8, mop[1], idx);
+        tcg_temp_free(addr_p8);
+
+        if ((mop[0] ^ memop) & MO_BSWAP) {
+            tcg_gen_bswap64_i64(y, y);
+        }
     } else {
-        x = TCGV128_HIGH(val);
-        y = TCGV128_LOW(val);
+        gen_helper_ld_i128(val, cpu_env, addr, tcg_constant_i32(oi));
     }
 
-    gen_ldst_i64(INDEX_op_qemu_ld_i64, x, addr, mop[0], idx);
-
-    if ((mop[0] ^ memop) & MO_BSWAP) {
-        tcg_gen_bswap64_i64(x, x);
-    }
-
-    addr_p8 = tcg_temp_new();
-    tcg_gen_addi_tl(addr_p8, addr, 8);
-    gen_ldst_i64(INDEX_op_qemu_ld_i64, y, addr_p8, mop[1], idx);
-    tcg_temp_free(addr_p8);
-
-    if ((mop[0] ^ memop) & MO_BSWAP) {
-        tcg_gen_bswap64_i64(y, y);
-    }
-
-    plugin_gen_mem_callbacks(addr, make_memop_idx(memop, idx),
-                             QEMU_PLUGIN_MEM_R);
+    plugin_gen_mem_callbacks(addr, oi, QEMU_PLUGIN_MEM_R);
 }
 
 void tcg_gen_qemu_st_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
 {
-    MemOp mop[2];
-    TCGv addr_p8;
-    TCGv_i64 x, y;
+    MemOpIdx oi = make_memop_idx(memop, idx);
 
-    canonicalize_memop_i128_as_i64(mop, memop);
+    tcg_debug_assert((memop & MO_SIZE) == MO_128);
+    tcg_debug_assert((memop & MO_SIGN) == 0);
 
     tcg_gen_req_mo(TCG_MO_ST_LD | TCG_MO_ST_ST);
     addr = plugin_prep_mem_callbacks(addr);
 
-    /* TODO: respect atomicity of the operation. */
     /* TODO: allow the tcg backend to see the whole operation. */
 
-    if ((memop & MO_BSWAP) == MO_LE) {
-        x = TCGV128_LOW(val);
-        y = TCGV128_HIGH(val);
+    if (use_two_i64_for_i128(memop)) {
+        MemOp mop[2];
+        TCGv addr_p8;
+        TCGv_i64 x, y;
+
+        canonicalize_memop_i128_as_i64(mop, memop);
+
+        if ((memop & MO_BSWAP) == MO_LE) {
+            x = TCGV128_LOW(val);
+            y = TCGV128_HIGH(val);
+        } else {
+            x = TCGV128_HIGH(val);
+            y = TCGV128_LOW(val);
+        }
+
+        addr_p8 = tcg_temp_new();
+        if ((mop[0] ^ memop) & MO_BSWAP) {
+            TCGv_i64 t = tcg_temp_new_i64();
+
+            tcg_gen_bswap64_i64(t, x);
+            gen_ldst_i64(INDEX_op_qemu_st_i64, t, addr, mop[0], idx);
+            tcg_gen_bswap64_i64(t, y);
+            tcg_gen_addi_tl(addr_p8, addr, 8);
+            gen_ldst_i64(INDEX_op_qemu_st_i64, t, addr_p8, mop[1], idx);
+            tcg_temp_free_i64(t);
+        } else {
+            gen_ldst_i64(INDEX_op_qemu_st_i64, x, addr, mop[0], idx);
+            tcg_gen_addi_tl(addr_p8, addr, 8);
+            gen_ldst_i64(INDEX_op_qemu_st_i64, y, addr_p8, mop[1], idx);
+        }
+        tcg_temp_free(addr_p8);
     } else {
-        x = TCGV128_HIGH(val);
-        y = TCGV128_LOW(val);
+        gen_helper_st_i128(cpu_env, addr, val, tcg_constant_i32(oi));
     }
 
-    addr_p8 = tcg_temp_new();
-    if ((mop[0] ^ memop) & MO_BSWAP) {
-        TCGv_i64 t = tcg_temp_new_i64();
-
-        tcg_gen_bswap64_i64(t, x);
-        gen_ldst_i64(INDEX_op_qemu_st_i64, t, addr, mop[0], idx);
-        tcg_gen_bswap64_i64(t, y);
-        tcg_gen_addi_tl(addr_p8, addr, 8);
-        gen_ldst_i64(INDEX_op_qemu_st_i64, t, addr_p8, mop[1], idx);
-        tcg_temp_free_i64(t);
-    } else {
-        gen_ldst_i64(INDEX_op_qemu_st_i64, x, addr, mop[0], idx);
-        tcg_gen_addi_tl(addr_p8, addr, 8);
-        gen_ldst_i64(INDEX_op_qemu_st_i64, y, addr_p8, mop[1], idx);
-    }
-    tcg_temp_free(addr_p8);
-
-    plugin_gen_mem_callbacks(addr, make_memop_idx(memop, idx),
-                             QEMU_PLUGIN_MEM_W);
+    plugin_gen_mem_callbacks(addr, oi, QEMU_PLUGIN_MEM_W);
 }
 
 static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, MemOp opc)
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 0e4292ec66..40bf63a4b5 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -420,6 +420,21 @@ static inline uint64_t load_atom_8_by_4(void *pv)
     }
 }
 
+/**
+ * load_atom_8_by_8_or_4:
+ * @pv: host address
+ *
+ * Load 8 bytes from aligned @pv, with at least 4-byte atomicity.
+ */
+static inline uint64_t load_atom_8_by_8_or_4(void *pv)
+{
+    if (HAVE_al8_fast) {
+        return load_atomic8(pv);
+    } else {
+        return load_atom_8_by_4(pv);
+    }
+}
+
 /**
  * load_atom_2:
  * @p: host address
@@ -552,6 +567,64 @@ static uint64_t load_atom_8(CPUArchState *env, uintptr_t ra,
     }
 }
 
+/**
+ * load_atom_16:
+ * @p: host address
+ * @memop: the full memory op
+ *
+ * Load 16 bytes from @p, honoring the atomicity of @memop.
+ */
+static Int128 load_atom_16(CPUArchState *env, uintptr_t ra,
+                           void *pv, MemOp memop)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int atmax;
+    Int128 r;
+    uint64_t a, b;
+
+    /*
+     * If the host does not support 16-byte atomics, wait until we have
+     * examined the atomicity parameters below.
+     */
+    if (HAVE_al16_fast && likely((pi & 15) == 0)) {
+        return load_atomic16(pv);
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+    switch (atmax) {
+    case MO_8:
+        memcpy(&r, pv, 16);
+        return r;
+    case MO_16:
+        a = load_atom_8_by_2(pv);
+        b = load_atom_8_by_2(pv + 8);
+        break;
+    case MO_32:
+        a = load_atom_8_by_4(pv);
+        b = load_atom_8_by_4(pv + 8);
+        break;
+    case MO_64:
+        if (!HAVE_al8) {
+            cpu_loop_exit_atomic(env_cpu(env), ra);
+        }
+        a = load_atomic8(pv);
+        b = load_atomic8(pv + 8);
+        break;
+    case -MO_64:
+        if (!HAVE_al8) {
+            cpu_loop_exit_atomic(env_cpu(env), ra);
+        }
+        a = load_atom_extract_al8x2(pv);
+        b = load_atom_extract_al8x2(pv + 8);
+        break;
+    case MO_128:
+        return load_atomic16_or_exit(env, ra, pv);
+    default:
+        g_assert_not_reached();
+    }
+    return int128_make128(HOST_BIG_ENDIAN ? b : a, HOST_BIG_ENDIAN ? a : b);
+}
+
 /**
  * store_atomic2:
  * @pv: host address
@@ -593,6 +666,40 @@ static inline void store_atomic8(void *pv, uint64_t val)
     qatomic_set__nocheck(p, val);
 }
 
+/**
+ * store_atomic16:
+ * @pv: host address
+ * @val: value to store
+ *
+ * Atomically store 16 aligned bytes to @pv.
+ */
+static inline void store_atomic16(void *pv, Int128 val)
+{
+#if defined(CONFIG_ATOMIC128)
+    __uint128_t *pu = __builtin_assume_aligned(pv, 16);
+    Int128Alias new;
+
+    new.s = val;
+    qatomic_set__nocheck(pu, new.u);
+#elif defined(CONFIG_CMPXCHG128)
+    __uint128_t *pu = __builtin_assume_aligned(pv, 16);
+    __uint128_t o;
+    Int128Alias n;
+
+    /*
+     * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
+     * defer to libatomic, so we must use __sync_val_compare_and_swap_16
+     * and accept the sequential consistency that comes with it.
+     */
+    n.s = val;
+    do {
+        o = *pu;
+    } while (!__sync_bool_compare_and_swap_16(pu, o, n.u));
+#else
+    qemu_build_not_reached();
+#endif
+}
+
 /**
  * store_atom_4x2
  */
@@ -1036,3 +1143,85 @@ static void store_atom_8(CPUArchState *env, uintptr_t ra,
     }
     cpu_loop_exit_atomic(env_cpu(env), ra);
 }
+
+/**
+ * store_atom_16:
+ * @p: host address
+ * @val: the value to store
+ * @memop: the full memory op
+ *
+ * Store 16 bytes to @p, honoring the atomicity of @memop.
+ */
+static void store_atom_16(CPUArchState *env, uintptr_t ra,
+                          void *pv, MemOp memop, Int128 val)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    uint64_t a, b;
+    int atmax;
+
+    if (HAVE_al16_fast && likely((pi & 15) == 0)) {
+        store_atomic16(pv, val);
+        return;
+    }
+
+    atmax = required_atomicity(env, pi, memop);
+
+    a = HOST_BIG_ENDIAN ? int128_gethi(val) : int128_getlo(val);
+    b = HOST_BIG_ENDIAN ? int128_getlo(val) : int128_gethi(val);
+    switch (atmax) {
+    case MO_8:
+        memcpy(pv, &val, 16);
+        return;
+    case MO_16:
+        store_atom_8_by_2(pv, a);
+        store_atom_8_by_2(pv + 8, b);
+        return;
+    case MO_32:
+        store_atom_8_by_4(pv, a);
+        store_atom_8_by_4(pv + 8, b);
+        return;
+    case MO_64:
+        if (HAVE_al8) {
+            store_atomic8(pv, a);
+            store_atomic8(pv + 8, b);
+            return;
+        }
+        break;
+    case -MO_64:
+        if (HAVE_al16) {
+            uint64_t val_le;
+            int s2 = pi & 15;
+            int s1 = 16 - s2;
+
+            if (HOST_BIG_ENDIAN) {
+                val = bswap128(val);
+            }
+            switch (s2) {
+            case 1 ... 7:
+                val_le = store_whole_le16(pv, s1, val);
+                store_bytes_leN(pv + s1, s2, val_le);
+                break;
+            case 9 ... 15:
+                store_bytes_leN(pv, s1, int128_getlo(val));
+                val = int128_urshift(val, s1 * 8);
+                store_whole_le16(pv + s1, s2, val);
+                break;
+            case 0: /* aligned */
+            case 8: /* atmax MO_64 */
+            default:
+                g_assert_not_reached();
+            }
+            return;
+        }
+        break;
+    case MO_128:
+        if (HAVE_al16) {
+            store_atomic16(pv, val);
+            return;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    cpu_loop_exit_atomic(env_cpu(env), ra);
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 13/30] meson: Detect atomic128 support with optimization
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (11 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 12/30] tcg: Add 128-bit guest memory primitives Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 14/30] tcg/i386: Add have_atomic16 Richard Henderson
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

There is an edge condition prior to gcc13 for which optimization
is required to generate 16-byte atomic sequences.  Detect this.
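
As a rough sketch of the workaround (illustrative only, not taken from
this patch), the per-function optimize attribute is enough for the
affected GCC versions to inline the 16-byte atomic even in an
unoptimized build:

    /* Hypothetical helper; the per-function attribute is the point. */
    __attribute__((optimize("O1")))
    static unsigned __int128 load16(void *pv)
    {
        unsigned __int128 *p = __builtin_assume_aligned(pv, 16);
        return __atomic_load_n(p, __ATOMIC_RELAXED);
    }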

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 meson.build                    | 52 ++++++++++++++++++++++------------
 accel/tcg/ldst_atomicity.c.inc | 38 ++++++++++++++++++-------
 2 files changed, 61 insertions(+), 29 deletions(-)

diff --git a/meson.build b/meson.build
index a76c855312..964469780a 100644
--- a/meson.build
+++ b/meson.build
@@ -2229,23 +2229,21 @@ config_host_data.set('HAVE_BROKEN_SIZE_MAX', not cc.compiles('''
         return printf("%zu", SIZE_MAX);
     }''', args: ['-Werror']))
 
-atomic_test = '''
+# See if 64-bit atomic operations are supported.
+# Note that without __atomic builtins, we can only
+# assume atomic loads/stores max at pointer size.
+config_host_data.set('CONFIG_ATOMIC64', cc.links('''
   #include <stdint.h>
   int main(void)
   {
-    @0@ x = 0, y = 0;
+    uint64_t x = 0, y = 0;
     y = __atomic_load_n(&x, __ATOMIC_RELAXED);
     __atomic_store_n(&x, y, __ATOMIC_RELAXED);
     __atomic_compare_exchange_n(&x, &y, x, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
     __atomic_exchange_n(&x, y, __ATOMIC_RELAXED);
     __atomic_fetch_add(&x, y, __ATOMIC_RELAXED);
     return 0;
-  }'''
-
-# See if 64-bit atomic operations are supported.
-# Note that without __atomic builtins, we can only
-# assume atomic loads/stores max at pointer size.
-config_host_data.set('CONFIG_ATOMIC64', cc.links(atomic_test.format('uint64_t')))
+  }'''))
 
 has_int128 = cc.links('''
   __int128_t a;
@@ -2263,21 +2261,39 @@ if has_int128
   # "do we have 128-bit atomics which are handled inline and specifically not
   # via libatomic". The reason we can't use libatomic is documented in the
   # comment starting "GCC is a house divided" in include/qemu/atomic128.h.
-  has_atomic128 = cc.links(atomic_test.format('unsigned __int128'))
+  # We only care about these operations on 16-byte aligned pointers, so
+  # force 16-byte alignment of the pointer, which may be greater than
+  # __alignof(unsigned __int128) for the host.
+  atomic_test_128 = '''
+    int main(int ac, char **av) {
+      unsigned __int128 *p = __builtin_assume_aligned(av[ac - 1], 16);
+      p[1] = __atomic_load_n(&p[0], __ATOMIC_RELAXED);
+      __atomic_store_n(&p[2], p[3], __ATOMIC_RELAXED);
+      __atomic_compare_exchange_n(&p[4], &p[5], p[6], 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+      return 0;
+    }'''
+  has_atomic128 = cc.links(atomic_test_128)
 
   config_host_data.set('CONFIG_ATOMIC128', has_atomic128)
 
   if not has_atomic128
-    has_cmpxchg128 = cc.links('''
-      int main(void)
-      {
-        unsigned __int128 x = 0, y = 0;
-        __sync_val_compare_and_swap_16(&x, y, x);
-        return 0;
-      }
-    ''')
+    # Even with __builtin_assume_aligned, the above test may have failed
+    # without optimization enabled.  Try again with optimizations locally
+    # enabled for the function.  See
+    #   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389
+    has_atomic128_opt = cc.links('__attribute__((optimize("O1")))' + atomic_test_128)
+    config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
 
-    config_host_data.set('CONFIG_CMPXCHG128', has_cmpxchg128)
+    if not has_atomic128_opt
+      config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
+        int main(void)
+        {
+          unsigned __int128 x = 0, y = 0;
+          __sync_val_compare_and_swap_16(&x, y, x);
+          return 0;
+        }
+      '''))
+    endif
   endif
 endif
 
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 40bf63a4b5..c7999e0ef1 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -16,6 +16,23 @@
 #endif
 #define HAVE_al8_fast      (ATOMIC_REG_SIZE >= 8)
 
+/*
+ * If __alignof(unsigned __int128) < 16, GCC may refuse to inline atomics
+ * that are supported by the host, e.g. s390x.  We can force the pointer to
+ * have our known alignment with __builtin_assume_aligned, however prior to
+ * GCC 13 that was only reliable with optimization enabled.  See
+ *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389
+ */
+#if defined(CONFIG_ATOMIC128_OPT)
+# if !defined(__OPTIMIZE__)
+#  define ATTRIBUTE_ATOMIC128_OPT  __attribute__((optimize("O1")))
+# endif
+# define CONFIG_ATOMIC128
+#endif
+#ifndef ATTRIBUTE_ATOMIC128_OPT
+# define ATTRIBUTE_ATOMIC128_OPT
+#endif
+
 #if defined(CONFIG_ATOMIC128)
 # define HAVE_al16_fast    true
 #else
@@ -136,7 +153,8 @@ static inline uint64_t load_atomic8(void *pv)
  *
  * Atomically load 16 aligned bytes from @pv.
  */
-static inline Int128 load_atomic16(void *pv)
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+load_atomic16(void *pv)
 {
 #ifdef CONFIG_ATOMIC128
     __uint128_t *p = __builtin_assume_aligned(pv, 16);
@@ -337,7 +355,8 @@ static uint64_t load_atom_extract_al16_or_exit(CPUArchState *env, uintptr_t ra,
  * cross an 16-byte boundary then the access must be 16-byte atomic,
  * otherwise the access must be 8-byte atomic.
  */
-static inline uint64_t load_atom_extract_al16_or_al8(void *pv, int s)
+static inline uint64_t ATTRIBUTE_ATOMIC128_OPT
+load_atom_extract_al16_or_al8(void *pv, int s)
 {
 #if defined(CONFIG_ATOMIC128)
     uintptr_t pi = (uintptr_t)pv;
@@ -673,28 +692,24 @@ static inline void store_atomic8(void *pv, uint64_t val)
  *
  * Atomically store 16 aligned bytes to @pv.
  */
-static inline void store_atomic16(void *pv, Int128 val)
+static inline void ATTRIBUTE_ATOMIC128_OPT
+store_atomic16(void *pv, Int128Alias val)
 {
 #if defined(CONFIG_ATOMIC128)
     __uint128_t *pu = __builtin_assume_aligned(pv, 16);
-    Int128Alias new;
-
-    new.s = val;
-    qatomic_set__nocheck(pu, new.u);
+    qatomic_set__nocheck(pu, val.u);
 #elif defined(CONFIG_CMPXCHG128)
     __uint128_t *pu = __builtin_assume_aligned(pv, 16);
     __uint128_t o;
-    Int128Alias n;
 
     /*
      * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
      * defer to libatomic, so we must use __sync_val_compare_and_swap_16
      * and accept the sequential consistency that comes with it.
      */
-    n.s = val;
     do {
         o = *pu;
-    } while (!__sync_bool_compare_and_swap_16(pu, o, n.u));
+    } while (!__sync_bool_compare_and_swap_16(pu, o, val.u));
 #else
     qemu_build_not_reached();
 #endif
@@ -776,7 +791,8 @@ static void store_atom_insert_al8(uint64_t *p, uint64_t val, uint64_t msk)
  *
  * Atomically store @val to @p masked by @msk.
  */
-static void store_atom_insert_al16(Int128 *ps, Int128Alias val, Int128Alias msk)
+static void ATTRIBUTE_ATOMIC128_OPT
+store_atom_insert_al16(Int128 *ps, Int128Alias val, Int128Alias msk)
 {
 #if defined(CONFIG_ATOMIC128)
     __uint128_t *pu, old, new;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 14/30] tcg/i386: Add have_atomic16
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (12 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 13/30] meson: Detect atomic128 support with optimization Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-03-15 17:06   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 15/30] accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc Richard Henderson
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Notice when Intel or AMD have guaranteed that vmovdqa is atomic.
The new variable will also be used in generated code.
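
For reference, the check below boils down to reading cpuid leaf 0 and
comparing ECX against the ASCII vendor string.  A minimal sketch,
assuming GCC's <cpuid.h> and the signature constants added by this
patch:

    #include <cpuid.h>
    #include <stdbool.h>

    static bool vendor_is_intel_or_amd(void)
    {
        unsigned a, b, c, d;

        __cpuid(0, a, b, c, d);      /* leaf 0: vendor id in EBX:EDX:ECX */
        return c == 0x6c65746e       /* "ntel" of "GenuineIntel" */
            || c == 0x444d4163;      /* "cAMD" of "AuthenticAMD" */
    }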

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/cpuid.h      | 18 ++++++++++++++++++
 tcg/i386/tcg-target.h     |  1 +
 tcg/i386/tcg-target.c.inc | 27 +++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 1451e8ef2f..35325f1995 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -71,6 +71,24 @@
 #define bit_LZCNT       (1 << 5)
 #endif
 
+/*
+ * Signatures for different CPU implementations as returned from Leaf 0.
+ */
+
+#ifndef signature_INTEL_ecx
+/* "Genu" "ineI" "ntel" */
+#define signature_INTEL_ebx     0x756e6547
+#define signature_INTEL_edx     0x49656e69
+#define signature_INTEL_ecx     0x6c65746e
+#endif
+
+#ifndef signature_AMD_ecx
+/* "Auth" "enti" "cAMD" */
+#define signature_AMD_ebx       0x68747541
+#define signature_AMD_edx       0x69746e65
+#define signature_AMD_ecx       0x444d4163
+#endif
+
 static inline unsigned xgetbv_low(unsigned c)
 {
     unsigned a, d;
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index d4f2a6f8c2..0421776cb8 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -120,6 +120,7 @@ extern bool have_avx512dq;
 extern bool have_avx512vbmi2;
 extern bool have_avx512vl;
 extern bool have_movbe;
+extern bool have_atomic16;
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32         1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 29dba3fa1c..977650263b 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -185,6 +185,7 @@ bool have_avx512dq;
 bool have_avx512vbmi2;
 bool have_avx512vl;
 bool have_movbe;
+bool have_atomic16;
 
 #ifdef CONFIG_CPUID_H
 static bool have_bmi2;
@@ -4173,6 +4174,32 @@ static void tcg_target_init(TCGContext *s)
                     have_avx512dq = (b7 & bit_AVX512DQ) != 0;
                     have_avx512vbmi2 = (c7 & bit_AVX512VBMI2) != 0;
                 }
+
+                /*
+                 * The Intel SDM has added:
+                 *   Processors that enumerate support for Intel® AVX
+                 *   (by setting the feature flag CPUID.01H:ECX.AVX[bit 28])
+                 *   guarantee that the 16-byte memory operations performed
+                 *   by the following instructions will always be carried
+                 *   out atomically:
+                 *   - MOVAPD, MOVAPS, and MOVDQA.
+                 *   - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
+                 *   - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded
+                 *     with EVEX.128 and k0 (masking disabled).
+                 * Note that these instructions require the linear addresses
+                 * of their memory operands to be 16-byte aligned.
+                 *
+                 * AMD has provided an even stronger guarantee that processors
+                 * with AVX provide 16-byte atomicity for all cacheable,
+                 * naturally aligned single loads and stores, e.g. MOVDQU.
+                 *
+                 * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
+                 */
+                if (have_avx1) {
+                    __cpuid(0, a, b, c, d);
+                    have_atomic16 = (c == signature_INTEL_ecx ||
+                                     c == signature_AMD_ecx);
+                }
             }
         }
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 15/30] accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (13 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 14/30] tcg/i386: Add have_atomic16 Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 16/30] accel/tcg: Add aarch64 specific support in ldst_atomicity Richard Henderson
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Hosts using Intel and AMD AVX cpus are quite common.
Add fast paths through ldst_atomicity using this.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 76 +++++++++++++++++++++++++++-------
 1 file changed, 60 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index c7999e0ef1..07982e021d 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -35,6 +35,14 @@
 
 #if defined(CONFIG_ATOMIC128)
 # define HAVE_al16_fast    true
+#elif defined(CONFIG_TCG_INTERPRETER)
+/*
+ * FIXME: host specific detection for this is in tcg/$host/,
+ * but we're using tcg/tci/ instead.
+ */
+# define HAVE_al16_fast    false
+#elif defined(__x86_64__)
+# define HAVE_al16_fast    likely(have_atomic16)
 #else
 # define HAVE_al16_fast    false
 #endif
@@ -162,6 +170,12 @@ load_atomic16(void *pv)
 
     r.u = qatomic_read__nocheck(p);
     return r.s;
+#elif defined(__x86_64__)
+    Int128Alias r;
+
+    /* Via HAVE_al16_fast, have_atomic16 is true. */
+    asm("vmovdqa %1, %0" : "=x" (r.u) : "m" (*(Int128 *)pv));
+    return r.s;
 #else
     qemu_build_not_reached();
 #endif
@@ -380,6 +394,24 @@ load_atom_extract_al16_or_al8(void *pv, int s)
         r = qatomic_read__nocheck(p16);
     }
     return r >> shr;
+#elif defined(__x86_64__)
+    uintptr_t pi = (uintptr_t)pv;
+    int shr = (pi & 7) * 8;
+    uint64_t a, b;
+
+    /* Via HAVE_al16_fast, have_atomic16 is true. */
+    pv = (void *)(pi & ~7);
+    if (pi & 8) {
+        uint64_t *p8 = __builtin_assume_aligned(pv, 16, 8);
+        a = qatomic_read__nocheck(p8);
+        b = qatomic_read__nocheck(p8 + 1);
+    } else {
+        asm("vmovdqa %2, %0\n\tvpextrq $1, %0, %1"
+            : "=x"(a), "=r"(b) : "m" (*(__uint128_t *)pv));
+    }
+    asm("shrd %b2, %1, %0" : "+r"(a) : "r"(b), "c"(shr));
+
+    return a;
 #else
     qemu_build_not_reached();
 #endif
@@ -696,23 +728,35 @@ static inline void ATTRIBUTE_ATOMIC128_OPT
 store_atomic16(void *pv, Int128Alias val)
 {
 #if defined(CONFIG_ATOMIC128)
-    __uint128_t *pu = __builtin_assume_aligned(pv, 16);
-    qatomic_set__nocheck(pu, val.u);
-#elif defined(CONFIG_CMPXCHG128)
-    __uint128_t *pu = __builtin_assume_aligned(pv, 16);
-    __uint128_t o;
-
-    /*
-     * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
-     * defer to libatomic, so we must use __sync_val_compare_and_swap_16
-     * and accept the sequential consistency that comes with it.
-     */
-    do {
-        o = *pu;
-    } while (!__sync_bool_compare_and_swap_16(pu, o, val.u));
-#else
-    qemu_build_not_reached();
+    {
+        __uint128_t *pu = __builtin_assume_aligned(pv, 16);
+        qatomic_set__nocheck(pu, val.u);
+        return;
+    }
 #endif
+#if defined(__x86_64__)
+    if (HAVE_al16_fast) {
+        asm("vmovdqa %1, %0" : "=m"(*(__uint128_t *)pv) : "x" (val.u));
+        return;
+    }
+#endif
+#if defined(CONFIG_CMPXCHG128)
+    {
+        __uint128_t *pu = __builtin_assume_aligned(pv, 16);
+        __uint128_t o;
+
+        /*
+         * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
+         * defer to libatomic, so we must use __sync_val_compare_and_swap_16
+         * and accept the sequential consistency that comes with it.
+         */
+        do {
+            o = *pu;
+        } while (!__sync_bool_compare_and_swap_16(pu, o, val.u));
+        return;
+    }
+#endif
+    qemu_build_not_reached();
 }
 
 /**
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 16/30] accel/tcg: Add aarch64 specific support in ldst_atomicity
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (14 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 15/30] accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux Richard Henderson
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

We have code in atomic128.h noting that through GCC 8, there
was no support for atomic operations on __uint128.  This has
been fixed in GCC 10.  But we can still improve over any
basic compare-and-swap loop using the ldxp/stxp instructions.
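
For comparison, the portable fallback being improved upon reads the 16
bytes via a compare-and-swap loop.  A minimal sketch, assuming a host
that provides __sync_val_compare_and_swap_16 (the CONFIG_CMPXCHG128
case):

    /* Swap 0 with 0; the side effect is returning the old value. */
    static unsigned __int128 load16_cas(unsigned __int128 *p)
    {
        return __sync_val_compare_and_swap_16(p, 0, 0);
    }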

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 60 ++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 07982e021d..9a95ac327d 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -247,7 +247,22 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * In system mode all guest pages are writable, and for user-only
      * we have just checked writability.  Try cmpxchg.
      */
-#if defined(CONFIG_CMPXCHG128)
+#if defined(__aarch64__)
+    /* We can do better than cmpxchg for AArch64.  */
+    {
+        uint64_t l, h;
+        uint32_t fail;
+
+        /* The load must be paired with the store to guarantee not tearing. */
+        asm("0: ldxp %0, %1, %3\n\t"
+            "stxp %w2, %0, %1, %3\n\t"
+            "cbnz %w2, 0b"
+            : "=&r"(l), "=&r"(h), "=&r"(fail) : "Q"(*p));
+
+        qemu_build_assert(!HOST_BIG_ENDIAN);
+        return int128_make128(l, h);
+    }
+#elif defined(CONFIG_CMPXCHG128)
     /* Swap 0 with 0, with the side-effect of returning the old value. */
     {
         Int128Alias r;
@@ -740,7 +755,22 @@ store_atomic16(void *pv, Int128Alias val)
         return;
     }
 #endif
-#if defined(CONFIG_CMPXCHG128)
+#if defined(__aarch64__)
+    /* We can do better than cmpxchg for AArch64.  */
+    {
+        uint64_t l, h, t;
+
+        qemu_build_assert(!HOST_BIG_ENDIAN);
+        l = int128_getlo(val.s);
+        h = int128_gethi(val.s);
+
+        asm("0: ldxp %0, xzr, %1\n\t"
+            "stxp %w0, %2, %3, %1\n\t"
+            "cbnz %w0, 0b"
+            : "=&r"(t), "=Q"(*(__uint128_t *)pv) : "r"(l), "r"(h));
+        return;
+    }
+#elif defined(CONFIG_CMPXCHG128)
     {
         __uint128_t *pu = __builtin_assume_aligned(pv, 16);
         __uint128_t o;
@@ -838,7 +868,31 @@ static void store_atom_insert_al8(uint64_t *p, uint64_t val, uint64_t msk)
 static void ATTRIBUTE_ATOMIC128_OPT
 store_atom_insert_al16(Int128 *ps, Int128Alias val, Int128Alias msk)
 {
-#if defined(CONFIG_ATOMIC128)
+#if defined(__aarch64__)
+    /*
+     * GCC only implements __sync* primitives for int128 on aarch64.
+     * We can do better without the barriers, and integrating the
+     * arithmetic into the load-exclusive/store-conditional pair.
+     */
+    uint64_t tl, th, vl, vh, ml, mh;
+    uint32_t fail;
+
+    qemu_build_assert(!HOST_BIG_ENDIAN);
+    vl = int128_getlo(val.s);
+    vh = int128_gethi(val.s);
+    ml = int128_getlo(msk.s);
+    mh = int128_gethi(msk.s);
+
+    asm("0: ldxp %[l], %[h], %[mem]\n\t"
+        "bic %[l], %[l], %[ml]\n\t"
+        "bic %[h], %[h], %[mh]\n\t"
+        "orr %[l], %[l], %[vl]\n\t"
+        "orr %[h], %[h], %[vh]\n\t"
+        "stxp %w[f], %[l], %[h], %[mem]\n\t"
+        "cbnz %w[f], 0b\n"
+        : [mem] "+Q"(*ps), [f] "=&r"(fail), [l] "=&r"(tl), [h] "=&r"(th)
+        : [vl] "r"(vl), [vh] "r"(vh), [ml] "r"(ml), [mh] "r"(mh));
+#elif defined(CONFIG_ATOMIC128)
     __uint128_t *pu, old, new;
 
     /* With CONFIG_ATOMIC128, we can avoid the memory barriers. */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (15 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 16/30] accel/tcg: Add aarch64 specific support in ldst_atomicity Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-03-15 17:00   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin Richard Henderson
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Notice when the host has additional atomic instructions.
The new variables will also be used in generated code.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  3 +++
 tcg/aarch64/tcg-target.c.inc | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index c0b0f614ba..3c0b0d312d 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -57,6 +57,9 @@ typedef enum {
 #define TCG_TARGET_CALL_ARG_I128        TCG_CALL_ARG_EVEN
 #define TCG_TARGET_CALL_RET_I128        TCG_CALL_RET_NORMAL
 
+extern bool have_lse;
+extern bool have_lse2;
+
 /* optional instructions */
 #define TCG_TARGET_HAS_div_i32          1
 #define TCG_TARGET_HAS_rem_i32          1
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 05123cce35..d144d1a769 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -13,6 +13,9 @@
 #include "../tcg-ldst.c.inc"
 #include "../tcg-pool.c.inc"
 #include "qemu/bitops.h"
+#ifdef __linux__
+#include <asm/hwcap.h>
+#endif
 
 /* We're going to re-use TCGType in setting of the SF bit, which controls
    the size of the operation performed.  If we know the values match, it
@@ -71,6 +74,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
     return TCG_REG_X0 + slot;
 }
 
+bool have_lse;
+bool have_lse2;
+
 #define TCG_REG_TMP TCG_REG_X30
 #define TCG_VEC_TMP TCG_REG_V31
 
@@ -2912,6 +2918,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 static void tcg_target_init(TCGContext *s)
 {
+#ifdef __linux__
+    unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+    have_lse = hwcap & HWCAP_ATOMICS;
+    have_lse2 = hwcap & HWCAP_USCAT;
+#endif
+
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu;
     tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (16 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-03-15 16:59   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 19/30] accel/tcg: Add have_lse2 support in ldst_atomicity Richard Henderson
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

These features are present for Apple M1.
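
For what it's worth, the same properties can be inspected from the
shell on such a host (illustrative output from an Apple Silicon
machine):

    $ sysctl hw.optional.arm.FEAT_LSE hw.optional.arm.FEAT_LSE2
    hw.optional.arm.FEAT_LSE: 1
    hw.optional.arm.FEAT_LSE2: 1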

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index d144d1a769..1a295791b4 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -16,6 +16,9 @@
 #ifdef __linux__
 #include <asm/hwcap.h>
 #endif
+#ifdef CONFIG_DARWIN
+#include <sys/sysctl.h>
+#endif
 
 /* We're going to re-use TCGType in setting of the SF bit, which controls
    the size of the operation performed.  If we know the values match, it
@@ -2916,6 +2919,27 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     }
 }
 
+#ifdef CONFIG_DARWIN
+static bool sysctl_for_bool(const char *name)
+{
+    int val = 0;
+    size_t len = sizeof(val);
+
+    if (sysctlbyname(name, &val, &len, NULL, 0) == 0) {
+        return val != 0;
+    }
+
+    /*
+     * We might ask for properties not present in older kernels,
+     * but we're only asking about static properties, all of which
+     * should be 'int'.  So we shouldn't see ENOMEM (val too small),
+     * or any of the other more exotic errors.
+     */
+    assert(errno == ENOENT);
+    return false;
+}
+#endif
+
 static void tcg_target_init(TCGContext *s)
 {
 #ifdef __linux__
@@ -2923,6 +2947,10 @@ static void tcg_target_init(TCGContext *s)
     have_lse = hwcap & HWCAP_ATOMICS;
     have_lse2 = hwcap & HWCAP_USCAT;
 #endif
+#ifdef CONFIG_DARWIN
+    have_lse = sysctl_for_bool("hw.optional.arm.FEAT_LSE");
+    have_lse2 = sysctl_for_bool("hw.optional.arm.FEAT_LSE2");
+#endif
 
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 19/30] accel/tcg: Add have_lse2 support in ldst_atomicity
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (17 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 20/30] tcg: Introduce TCG_OPF_TYPE_MASK Richard Henderson
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Add fast paths for FEAT_LSE2, using the detection in tcg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 37 ++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 9a95ac327d..277629f241 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -41,6 +41,8 @@
  * but we're using tcg/tci/ instead.
  */
 # define HAVE_al16_fast    false
+#elif defined(__aarch64__)
+# define HAVE_al16_fast    likely(have_lse2)
 #elif defined(__x86_64__)
 # define HAVE_al16_fast    likely(have_atomic16)
 #else
@@ -48,6 +50,8 @@
 #endif
 #if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
 # define HAVE_al16         true
+#elif defined(__aarch64__)
+# define HAVE_al16         true
 #else
 # define HAVE_al16         false
 #endif
@@ -170,6 +174,14 @@ load_atomic16(void *pv)
 
     r.u = qatomic_read__nocheck(p);
     return r.s;
+#elif defined(__aarch64__)
+    uint64_t l, h;
+
+    /* Via HAVE_al16_fast, FEAT_LSE2 is present: LDP becomes atomic. */
+    asm("ldp %0, %1, %2" : "=r"(l), "=r"(h) : "m"(*(__uint128_t *)pv));
+
+    qemu_build_assert(!HOST_BIG_ENDIAN);
+    return int128_make128(l, h);
 #elif defined(__x86_64__)
     Int128Alias r;
 
@@ -409,6 +421,18 @@ load_atom_extract_al16_or_al8(void *pv, int s)
         r = qatomic_read__nocheck(p16);
     }
     return r >> shr;
+#elif defined(__aarch64__)
+    /*
+     * Via HAVE_al16_fast, FEAT_LSE2 is present.
+     * LDP becomes single-copy atomic if 16-byte aligned, and
+     * single-copy atomic on the parts if 8-byte aligned.
+     */
+    uintptr_t pi = (uintptr_t)pv;
+    int shr = (pi & 7) * 8;
+    uint64_t l, h;
+
+    asm("ldp %0, %1, %2" : "=r"(l), "=r"(h) : "m"(*(__uint128_t *)(pi & ~7)));
+    return (l >> shr) | (h << (-shr & 63));
 #elif defined(__x86_64__)
     uintptr_t pi = (uintptr_t)pv;
     int shr = (pi & 7) * 8;
@@ -764,10 +788,15 @@ store_atomic16(void *pv, Int128Alias val)
         l = int128_getlo(val.s);
         h = int128_gethi(val.s);
 
-        asm("0: ldxp %0, xzr, %1\n\t"
-            "stxp %w0, %2, %3, %1\n\t"
-            "cbnz %w0, 0b"
-            : "=&r"(t), "=Q"(*(__uint128_t *)pv) : "r"(l), "r"(h));
+        if (HAVE_al16_fast) {
+            /* Via HAVE_al16_fast, FEAT_LSE2 is present: STP becomes atomic. */
+            asm("stp %1, %2, %0" : "=Q"(*(__uint128_t *)pv) : "r"(l), "r"(h));
+        } else {
+            asm("0: ldxp %0, xzr, %1\n\t"
+                "stxp %w0, %2, %3, %1\n\t"
+                "cbnz %w0, 0b"
+                : "=&r"(t), "=Q"(*(__uint128_t *)pv) : "r"(l), "r"(h));
+        }
         return;
     }
 #elif defined(CONFIG_CMPXCHG128)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 20/30] tcg: Introduce TCG_OPF_TYPE_MASK
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (18 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 19/30] accel/tcg: Add have_lse2 support in ldst_atomicity Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 21/30] tcg: Add INDEX_op_qemu_{ld,st}_i128 Richard Henderson
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Reorg TCG_OPF_64BIT and TCG_OPF_VECTOR into a two-bit field so
that we can add TCG_OPF_128BIT without requiring another bit.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h            | 22 ++++++++++++----------
 tcg/optimize.c               | 15 ++++++++++++---
 tcg/tcg.c                    |  4 ++--
 tcg/aarch64/tcg-target.c.inc |  8 +++++---
 tcg/tci/tcg-target.c.inc     |  3 ++-
 5 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 59854f95b1..23369541fe 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -987,24 +987,26 @@ typedef struct TCGArgConstraint {
 
 /* Bits for TCGOpDef->flags, 8 bits available, all used.  */
 enum {
+    /* Two bits describing the output type. */
+    TCG_OPF_TYPE_MASK    = 0x03,
+    TCG_OPF_32BIT        = 0x00,
+    TCG_OPF_64BIT        = 0x01,
+    TCG_OPF_VECTOR       = 0x02,
+    TCG_OPF_128BIT       = 0x03,
     /* Instruction exits the translation block.  */
-    TCG_OPF_BB_EXIT      = 0x01,
+    TCG_OPF_BB_EXIT      = 0x04,
     /* Instruction defines the end of a basic block.  */
-    TCG_OPF_BB_END       = 0x02,
+    TCG_OPF_BB_END       = 0x08,
     /* Instruction clobbers call registers and potentially update globals.  */
-    TCG_OPF_CALL_CLOBBER = 0x04,
+    TCG_OPF_CALL_CLOBBER = 0x10,
     /* Instruction has side effects: it cannot be removed if its outputs
        are not used, and might trigger exceptions.  */
-    TCG_OPF_SIDE_EFFECTS = 0x08,
-    /* Instruction operands are 64-bits (otherwise 32-bits).  */
-    TCG_OPF_64BIT        = 0x10,
+    TCG_OPF_SIDE_EFFECTS = 0x20,
     /* Instruction is optional and not implemented by the host, or insn
        is generic and should not be implemened by the host.  */
-    TCG_OPF_NOT_PRESENT  = 0x20,
-    /* Instruction operands are vectors.  */
-    TCG_OPF_VECTOR       = 0x40,
+    TCG_OPF_NOT_PRESENT  = 0x40,
     /* Instruction is a conditional branch. */
-    TCG_OPF_COND_BRANCH  = 0x80
+    TCG_OPF_COND_BRANCH  = 0x80,
 };
 
 typedef struct TCGOpDef {
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 763bca9ea6..5c0bd6b6e6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2053,12 +2053,21 @@ void tcg_optimize(TCGContext *s)
         copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
 
         /* Pre-compute the type of the operation. */
-        if (def->flags & TCG_OPF_VECTOR) {
+        switch (def->flags & TCG_OPF_TYPE_MASK) {
+        case TCG_OPF_VECTOR:
             ctx.type = TCG_TYPE_V64 + TCGOP_VECL(op);
-        } else if (def->flags & TCG_OPF_64BIT) {
+            break;
+        case TCG_OPF_128BIT:
+            ctx.type = TCG_TYPE_I128;
+            break;
+        case TCG_OPF_64BIT:
             ctx.type = TCG_TYPE_I64;
-        } else {
+            break;
+        case TCG_OPF_32BIT:
             ctx.type = TCG_TYPE_I32;
+            break;
+        default:
+            qemu_build_not_reached();
         }
 
         /* Assume all bits affected, no bits known zero, no sign reps. */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a4a3da6804..07522d50ee 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2118,7 +2118,7 @@ static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
             nb_iargs = def->nb_iargs;
             nb_cargs = def->nb_cargs;
 
-            if (def->flags & TCG_OPF_VECTOR) {
+            if ((def->flags & TCG_OPF_TYPE_MASK) == TCG_OPF_VECTOR) {
                 col += ne_fprintf(f, "v%d,e%d,", 64 << TCGOP_VECL(op),
                                   8 << TCGOP_VECE(op));
             }
@@ -4375,7 +4375,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     }
 
     /* emit instruction */
-    if (def->flags & TCG_OPF_VECTOR) {
+    if ((def->flags & TCG_OPF_TYPE_MASK) == TCG_OPF_VECTOR) {
         tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
                        new_args, const_args);
     } else {
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 1a295791b4..6e40a453e6 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1922,9 +1922,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
                        const int const_args[TCG_MAX_OP_ARGS])
 {
-    /* 99% of the time, we can signal the use of extension registers
-       by looking to see if the opcode handles 64-bit data.  */
-    TCGType ext = (tcg_op_defs[opc].flags & TCG_OPF_64BIT) != 0;
+    /*
+     * 99% of the time, we can signal the use of extension registers
+     * by looking to see if the opcode handles 32-bit data or not.
+     */
+    TCGType ext = (tcg_op_defs[opc].flags & TCG_OPF_TYPE_MASK) != TCG_OPF_32BIT;
 
     /* Hoist the loads of the most common arguments.  */
     TCGArg a0 = args[0];
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c1d34d7bd1..570b8c160e 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -697,7 +697,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     CASE_32_64(sextract) /* Optional (TCG_TARGET_HAS_sextract_*). */
         {
             TCGArg pos = args[2], len = args[3];
-            TCGArg max = tcg_op_defs[opc].flags & TCG_OPF_64BIT ? 64 : 32;
+            TCGArg max = ((tcg_op_defs[opc].flags & TCG_OPF_TYPE_MASK)
+                          == TCG_OPF_32BIT ? 32 : 64);
 
             tcg_debug_assert(pos < max);
             tcg_debug_assert(pos + len <= max);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 21/30] tcg: Add INDEX_op_qemu_{ld,st}_i128
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (19 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 20/30] tcg: Introduce TCG_OPF_TYPE_MASK Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 22/30] tcg/i386: Introduce tcg_out_mov2 Richard Henderson
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Add opcodes for backend support for 128-bit memory operations.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/devel/tcg-ops.rst       | 11 +++---
 include/tcg/tcg-opc.h        |  8 +++++
 tcg/aarch64/tcg-target.h     |  2 ++
 tcg/arm/tcg-target.h         |  2 ++
 tcg/i386/tcg-target.h        |  2 ++
 tcg/loongarch64/tcg-target.h |  1 +
 tcg/mips/tcg-target.h        |  2 ++
 tcg/ppc/tcg-target.h         |  2 ++
 tcg/riscv/tcg-target.h       |  2 ++
 tcg/s390x/tcg-target.h       |  2 ++
 tcg/sparc64/tcg-target.h     |  2 ++
 tcg/tci/tcg-target.h         |  2 ++
 tcg/tcg-op.c                 | 67 ++++++++++++++++++++++++++++++++----
 tcg/tcg.c                    |  4 +++
 14 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 9adc0c9b6c..dd14dbe3fa 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -629,19 +629,20 @@ QEMU specific operations
        | This operation is optional. If the TCG backend does not implement the
          goto_ptr opcode, emitting this op is equivalent to emitting exit_tb(0).
 
-   * - qemu_ld_i32/i64 *t0*, *t1*, *flags*, *memidx*
+   * - qemu_ld_i32/i64/i128 *t0*, *t1*, *flags*, *memidx*
 
-       qemu_st_i32/i64 *t0*, *t1*, *flags*, *memidx*
+       qemu_st_i32/i64/i128 *t0*, *t1*, *flags*, *memidx*
 
        qemu_st8_i32 *t0*, *t1*, *flags*, *memidx*
 
      - | Load data at the guest address *t1* into *t0*, or store data in *t0* at guest
-         address *t1*.  The _i32/_i64 size applies to the size of the input/output
+         address *t1*.  The _i32/_i64/_i128 size applies to the size of the input/output
          register *t0* only.  The address *t1* is always sized according to the guest,
          and the width of the memory operation is controlled by *flags*.
        |
        | Both *t0* and *t1* may be split into little-endian ordered pairs of registers
-         if dealing with 64-bit quantities on a 32-bit host.
+         if dealing with 64-bit quantities on a 32-bit host, or 128-bit quantities on
+         a 64-bit host.
        |
        | The *memidx* selects the qemu tlb index to use (e.g. user or kernel access).
          The flags are the MemOp bits, selecting the sign, width, and endianness
@@ -650,6 +651,8 @@ QEMU specific operations
        | For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a
          64-bit memory access specified in *flags*.
        |
+       | For qemu_ld/st_i128, these are only supported for a 64-bit host.
+       |
        | For i386, qemu_st8_i32 is exactly like qemu_st_i32, except the size of
          the memory operation is known to be 8-bit.  This allows the backend to
          provide a different set of register constraints.
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index dd444734d9..94cf7c5d6a 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -213,6 +213,14 @@ DEF(qemu_st8_i32, 0, TLADDR_ARGS + 1, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS |
     IMPL(TCG_TARGET_HAS_qemu_st8_i32))
 
+/* Only for 64-bit hosts at the moment. */
+DEF(qemu_ld_i128, 2, 1, 1,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT |
+    IMPL(TCG_TARGET_HAS_qemu_ldst_i128))
+DEF(qemu_st_i128, 0, 3, 1,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT |
+    IMPL(TCG_TARGET_HAS_qemu_ldst_i128))
+
 /* Host vector support.  */
 
 #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 3c0b0d312d..60ed1f3042 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -129,6 +129,8 @@ extern bool have_lse2;
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 #define TCG_TARGET_HAS_v64              1
 #define TCG_TARGET_HAS_v128             1
 #define TCG_TARGET_HAS_v256             0
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index def2a189e6..c8d1e32a27 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -125,6 +125,8 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_rem_i32          0
 #define TCG_TARGET_HAS_qemu_st8_i32     0
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 #define TCG_TARGET_HAS_v64              use_neon_instructions
 #define TCG_TARGET_HAS_v128             use_neon_instructions
 #define TCG_TARGET_HAS_v256             0
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 0421776cb8..6d8a536a32 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -194,6 +194,8 @@ extern bool have_atomic16;
 #define TCG_TARGET_HAS_qemu_st8_i32     1
 #endif
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 /* We do not support older SSE systems, only beginning with AVX1.  */
 #define TCG_TARGET_HAS_v64              have_avx1
 #define TCG_TARGET_HAS_v128             have_avx1
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 17b8193aa5..53ff05a75b 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -168,6 +168,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 68b11e4d48..bade6a50ee 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -203,6 +203,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext16u_i64       0 /* andi rt, rs, 0xffff */
 #endif
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index af81c5a57f..8d6939ee82 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -149,6 +149,8 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_mulsh_i64        1
 #endif
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 /*
  * While technically Altivec could support V64, it has no 64-bit store
  * instruction and substituting two 32-bit stores makes the generated
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0deb33701f..3949646f4b 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -167,6 +167,8 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        1
 #endif
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 #define TCG_TARGET_DEFAULT_MO (0)
 
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index a05b473117..a0546d77b4 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -140,6 +140,8 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_muluh_i64      0
 #define TCG_TARGET_HAS_mulsh_i64      0
 
+#define TCG_TARGET_HAS_qemu_ldst_i128 0
+
 #define TCG_TARGET_HAS_v64            HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v128           HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v256           0
diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index ffe22b1d21..32733949bf 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -151,6 +151,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i64        use_vis3_instructions
 #define TCG_TARGET_HAS_mulsh_i64        0
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 #define TCG_AREG0 TCG_REG_I0
 
 #define TCG_TARGET_DEFAULT_MO (0)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 7140a76a73..8cf6b87040 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -127,6 +127,8 @@
 #define TCG_TARGET_HAS_mulu2_i32        1
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
 /* Number of registers available. */
 #define TCG_TARGET_NB_REGS 16
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 93ac864aed..1784873314 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3203,7 +3203,7 @@ static void canonicalize_memop_i128_as_i64(MemOp ret[2], MemOp orig)
 
 void tcg_gen_qemu_ld_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
 {
-    MemOpIdx oi = make_memop_idx(memop, idx);
+    const MemOpIdx oi = make_memop_idx(memop, idx);
 
     tcg_debug_assert((memop & MO_SIZE) == MO_128);
     tcg_debug_assert((memop & MO_SIGN) == 0);
@@ -3211,9 +3211,35 @@ void tcg_gen_qemu_ld_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
     tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
     addr = plugin_prep_mem_callbacks(addr);
 
-    /* TODO: allow the tcg backend to see the whole operation. */
+    /* TODO: For now, force 32-bit hosts to use the helper. */
+    if (TCG_TARGET_HAS_qemu_ldst_i128 && TCG_TARGET_REG_BITS == 64) {
+        TCGv_i64 lo, hi;
+        TCGArg addr_arg;
+        MemOpIdx adj_oi;
 
-    if (use_two_i64_for_i128(memop)) {
+        /* TODO: Make TCG_TARGET_HAS_MEMORY_BSWAP fine grained. */
+        if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+            lo = TCGV128_HIGH(val);
+            hi = TCGV128_LOW(val);
+            adj_oi = make_memop_idx(memop & ~MO_BSWAP, idx);
+        } else {
+            lo = TCGV128_LOW(val);
+            hi = TCGV128_HIGH(val);
+            adj_oi = oi;
+        }
+
+#if TARGET_LONG_BITS == 32
+        addr_arg = tcgv_i32_arg(addr);
+#else
+        addr_arg = tcgv_i64_arg(addr);
+#endif
+        tcg_gen_op4ii_i64(INDEX_op_qemu_ld_i128, lo, hi, addr_arg, adj_oi);
+
+        if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+            tcg_gen_bswap64_i64(lo, lo);
+            tcg_gen_bswap64_i64(hi, hi);
+        }
+    } else if (use_two_i64_for_i128(memop)) {
         MemOp mop[2];
         TCGv addr_p8;
         TCGv_i64 x, y;
@@ -3256,7 +3282,7 @@ void tcg_gen_qemu_ld_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
 
 void tcg_gen_qemu_st_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
 {
-    MemOpIdx oi = make_memop_idx(memop, idx);
+    const MemOpIdx oi = make_memop_idx(memop, idx);
 
     tcg_debug_assert((memop & MO_SIZE) == MO_128);
     tcg_debug_assert((memop & MO_SIGN) == 0);
@@ -3264,9 +3290,38 @@ void tcg_gen_qemu_st_i128(TCGv_i128 val, TCGv addr, TCGArg idx, MemOp memop)
     tcg_gen_req_mo(TCG_MO_ST_LD | TCG_MO_ST_ST);
     addr = plugin_prep_mem_callbacks(addr);
 
-    /* TODO: allow the tcg backend to see the whole operation. */
+    /* TODO: For now, force 32-bit hosts to use the helper. */
 
-    if (use_two_i64_for_i128(memop)) {
+    if (TCG_TARGET_HAS_qemu_ldst_i128 && TCG_TARGET_REG_BITS == 64) {
+        TCGv_i64 lo, hi;
+        TCGArg addr_arg;
+        MemOpIdx adj_oi;
+
+        /* TODO: Make TCG_TARGET_HAS_MEMORY_BSWAP fine grained. */
+        if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+            lo = tcg_temp_new_i64();
+            hi = tcg_temp_new_i64();
+            tcg_gen_bswap64_i64(lo, TCGV128_HIGH(val));
+            tcg_gen_bswap64_i64(hi, TCGV128_LOW(val));
+            adj_oi = make_memop_idx(memop & ~MO_BSWAP, idx);
+        } else {
+            lo = TCGV128_LOW(val);
+            hi = TCGV128_HIGH(val);
+            adj_oi = oi;
+        }
+
+#if TARGET_LONG_BITS == 32
+        addr_arg = tcgv_i32_arg(addr);
+#else
+        addr_arg = tcgv_i64_arg(addr);
+#endif
+        tcg_gen_op4ii_i64(INDEX_op_qemu_st_i128, lo, hi, addr_arg, adj_oi);
+
+        if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+            tcg_temp_free_i64(lo);
+            tcg_temp_free_i64(hi);
+        }
+    } else if (use_two_i64_for_i128(memop)) {
         MemOp mop[2];
         TCGv addr_p8;
         TCGv_i64 x, y;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 07522d50ee..a82fac66e3 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1538,6 +1538,10 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_qemu_st8_i32:
         return TCG_TARGET_HAS_qemu_st8_i32;
 
+    case INDEX_op_qemu_ld_i128:
+    case INDEX_op_qemu_st_i128:
+        return TCG_TARGET_HAS_qemu_ldst_i128;
+
     case INDEX_op_mov_i32:
     case INDEX_op_setcond_i32:
     case INDEX_op_brcond_i32:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 22/30] tcg/i386: Introduce tcg_out_mov2
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (20 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 21/30] tcg: Add INDEX_op_qemu_{ld,st}_i128 Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 23/30] tcg/i386: Introduce tcg_out_testi Richard Henderson
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Create a helper for data movement minding register overlap.
Use the more general xchg instruction, which consumes one
extra byte, but simplifies the more general function.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 977650263b..5547f47a26 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -461,6 +461,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define OPC_VPTERNLOGQ  (0x25 | P_EXT3A | P_DATA16 | P_VEXW | P_EVEX)
 #define OPC_VZEROUPPER  (0x77 | P_EXT)
 #define OPC_XCHG_ax_r32	(0x90)
+#define OPC_XCHG_EvGv   (0x87)
 
 #define OPC_GRP3_Eb     (0xf6)
 #define OPC_GRP3_Ev     (0xf7)
@@ -1880,6 +1881,24 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
     }
 }
 
+/* Move src1 to dst1 and src2 to dst2, minding possible overlap. */
+static void tcg_out_mov2(TCGContext *s,
+                         TCGType type1, TCGReg dst1, TCGReg src1,
+                         TCGType type2, TCGReg dst2, TCGReg src2)
+{
+    if (dst1 != src2) {
+        tcg_out_mov(s, type1, dst1, src1);
+        tcg_out_mov(s, type2, dst2, src2);
+    } else if (dst2 != src1) {
+        tcg_out_mov(s, type2, dst2, src2);
+        tcg_out_mov(s, type1, dst1, src1);
+    } else {
+        /* dst1 == src2 && dst2 == src1 -> xchg. */
+        int w = (type1 == TCG_TYPE_I32 && type2 == TCG_TYPE_I32 ? 0 : P_REXW);
+        tcg_out_modrm(s, OPC_XCHG_EvGv + w, dst1, dst2);
+    }
+}
+
 /*
  * Generate code for the slow path for a load at the end of block
  */
@@ -1947,13 +1966,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     case MO_UQ:
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
         } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
+            tcg_out_mov2(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX,
+                         TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
         }
         break;
     default:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 23/30] tcg/i386: Introduce tcg_out_testi
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (21 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 22/30] tcg/i386: Introduce tcg_out_mov2 Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 24/30] tcg/i386: Use full load/store helpers in user-only mode Richard Henderson
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Split out a helper for choosing testb vs testl.
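
For context, an illustrative sketch (not the backend code itself): the
byte form of "test" can only encode registers whose low byte exists
without a REX prefix, so on i686 %esi/%edi must fall back to testl.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed model: registers 0-3 are EAX/ECX/EDX/EBX, the only ones
     * with an encodable 8-bit low half on a 32-bit host. */
    static bool can_use_testb(unsigned reg, uint32_t imm, bool host_is_64)
    {
        return imm <= 0xff && (host_is_64 || reg < 4);
    }

    int main(void)
    {
        /* %esi is register 6: on i686 it needs testl even for imm 7. */
        printf("%s\n", can_use_testb(6, 7, false) ? "testb" : "testl");
        return 0;
    }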

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 5547f47a26..a75fe91e86 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1729,6 +1729,23 @@ static void tcg_out_nopn(TCGContext *s, int n)
     tcg_out8(s, 0x90);
 }
 
+/* Test register R vs immediate bits I, setting Z flag for EQ/NE. */
+static void __attribute__((unused))
+tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
+{
+    /*
+     * This is used for testing alignment, so we can usually use testb.
+     * For i686, we have to use testl for %esi/%edi.
+     */
+    if (i <= 0xff && (TCG_TARGET_REG_BITS == 64 || r < 4)) {
+        tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, r);
+        tcg_out8(s, i);
+    } else {
+        tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_TESTi, r);
+        tcg_out32(s, i);
+    }
+}
+
 #if defined(CONFIG_SOFTMMU)
 /*
  * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
@@ -2056,18 +2073,7 @@ static void tcg_out_test_alignment(TCGContext *s, bool is_ld, TCGReg addrlo,
     unsigned a_mask = (1 << a_bits) - 1;
     TCGLabelQemuLdst *label;
 
-    /*
-     * We are expecting a_bits to max out at 7, so we can usually use testb.
-     * For i686, we have to use testl for %esi/%edi.
-     */
-    if (a_mask <= 0xff && (TCG_TARGET_REG_BITS == 64 || addrlo < 4)) {
-        tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, addrlo);
-        tcg_out8(s, a_mask);
-    } else {
-        tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_TESTi, addrlo);
-        tcg_out32(s, a_mask);
-    }
-
+    tcg_out_testi(s, addrlo, a_mask);
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 24/30] tcg/i386: Use full load/store helpers in user-only mode
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (22 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 23/30] tcg/i386: Introduce tcg_out_testi Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 25/30] tcg/i386: Replace is64 with type in qemu_ld/st routines Richard Henderson
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Instead of using helper_unaligned_{ld,st}, use the full
load/store helpers.  This will allow the fast path to
increase alignment to implement atomicity without immediately
raising an alignment exception.
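
As a rough model of the new user-only shape (plain C, nothing here is
QEMU code; the real slow path tail-calls helper_ld*_mmu/helper_st*_mmu,
which can also enforce the guest-visible alignment):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-in for the full helper: completes the access for any
     * alignment instead of immediately faulting. */
    static uint32_t full_ldl(const uint8_t *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof(v));
        return v;
    }

    /* Fast path: inline access only when the required alignment holds,
     * otherwise branch to the full helper. */
    static uint32_t ldl(const uint8_t *p, uintptr_t a_mask)
    {
        if (((uintptr_t)p & a_mask) == 0) {
            uint32_t v;
            memcpy(&v, p, sizeof(v));
            return v;
        }
        return full_ldl(p);
    }

    int main(void)
    {
        uint8_t buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        printf("%08x %08x\n", ldl(buf, 3), ldl(buf + 1, 3));
        return 0;
    }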

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 332 ++++++++++++++++----------------------
 1 file changed, 142 insertions(+), 190 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index a75fe91e86..cad1775133 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1746,7 +1746,6 @@ tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
     }
 }
 
-#if defined(CONFIG_SOFTMMU)
 /*
  * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
  *                                  int mmu_idx, uintptr_t ra)
@@ -1769,108 +1768,6 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
     [MO_UQ] = helper_stq_mmu,
 };
 
-/* Perform the TLB load and compare.
-
-   Inputs:
-   ADDRLO and ADDRHI contain the low and high part of the address.
-
-   MEM_INDEX and S_BITS are the memory context and log2 size of the load.
-
-   WHICH is the offset into the CPUTLBEntry structure of the slot to read.
-   This should be offsetof addr_read or addr_write.
-
-   Outputs:
-   LABEL_PTRS is filled with 1 (32-bit addresses) or 2 (64-bit addresses)
-   positions of the displacements of forward jumps to the TLB miss case.
-
-   Second argument register is loaded with the low part of the address.
-   In the TLB hit case, it has been adjusted as indicated by the TLB
-   and so is a host address.  In the TLB miss case, it continues to
-   hold a guest address.
-
-   First argument register is clobbered.  */
-
-static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                                    int mem_index, MemOp opc,
-                                    tcg_insn_unit **label_ptr, int which)
-{
-    const TCGReg r0 = TCG_REG_L0;
-    const TCGReg r1 = TCG_REG_L1;
-    TCGType ttype = TCG_TYPE_I32;
-    TCGType tlbtype = TCG_TYPE_I32;
-    int trexw = 0, hrexw = 0, tlbrexw = 0;
-    unsigned a_bits = get_alignment_bits(opc);
-    unsigned s_bits = opc & MO_SIZE;
-    unsigned a_mask = (1 << a_bits) - 1;
-    unsigned s_mask = (1 << s_bits) - 1;
-    target_ulong tlb_mask;
-
-    if (TCG_TARGET_REG_BITS == 64) {
-        if (TARGET_LONG_BITS == 64) {
-            ttype = TCG_TYPE_I64;
-            trexw = P_REXW;
-        }
-        if (TCG_TYPE_PTR == TCG_TYPE_I64) {
-            hrexw = P_REXW;
-            if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) {
-                tlbtype = TCG_TYPE_I64;
-                tlbrexw = P_REXW;
-            }
-        }
-    }
-
-    tcg_out_mov(s, tlbtype, r0, addrlo);
-    tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
-                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-
-    tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, r0, TCG_AREG0,
-                         TLB_MASK_TABLE_OFS(mem_index) +
-                         offsetof(CPUTLBDescFast, mask));
-
-    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r0, TCG_AREG0,
-                         TLB_MASK_TABLE_OFS(mem_index) +
-                         offsetof(CPUTLBDescFast, table));
-
-    /* If the required alignment is at least as large as the access, simply
-       copy the address and mask.  For lesser alignments, check that we don't
-       cross pages for the complete access.  */
-    if (a_bits >= s_bits) {
-        tcg_out_mov(s, ttype, r1, addrlo);
-    } else {
-        tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask - a_mask);
-    }
-    tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-    tgen_arithi(s, ARITH_AND + trexw, r1, tlb_mask, 0);
-
-    /* cmp 0(r0), r1 */
-    tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which);
-
-    /* Prepare for both the fast path add of the tlb addend, and the slow
-       path function argument setup.  */
-    tcg_out_mov(s, ttype, r1, addrlo);
-
-    /* jne slow_path */
-    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
-    label_ptr[0] = s->code_ptr;
-    s->code_ptr += 4;
-
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        /* cmp 4(r0), addrhi */
-        tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, which + 4);
-
-        /* jne slow_path */
-        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
-        label_ptr[1] = s->code_ptr;
-        s->code_ptr += 4;
-    }
-
-    /* TLB Hit.  */
-
-    /* add addend(r0), r1 */
-    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
-                         offsetof(CPUTLBEntry, addend));
-}
-
 /*
  * Record the context of a call to the out of line helper code for the slow path
  * for a load or store, so that we can later generate the correct helper code
@@ -1893,9 +1790,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
     label->addrhi_reg = addrhi;
     label->raddr = tcg_splitwx_to_rx(raddr);
     label->label_ptr[0] = label_ptr[0];
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        label->label_ptr[1] = label_ptr[1];
-    }
+    label->label_ptr[1] = label_ptr[1];
 }
 
 /* Move src1 to dst1 and src2 to dst2, minding possible overlap. */
@@ -1929,7 +1824,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* resolve label address */
     tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+    if (label_ptr[1]) {
         tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
     }
 
@@ -1952,8 +1847,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
         tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
     } else {
+        tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
+                    l->addrlo_reg);
         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
         tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
                      (uintptr_t)l->raddr);
@@ -2010,7 +1906,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* resolve label address */
     tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+    if (label_ptr[1]) {
         tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
     }
 
@@ -2043,10 +1939,11 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
         tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
     } else {
+        tcg_out_mov2(s, TCG_TYPE_TL,
+                     tcg_target_call_iarg_regs[1], l->addrlo_reg,
+                     s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
+                     tcg_target_call_iarg_regs[2], l->datalo_reg);
         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                    tcg_target_call_iarg_regs[2], l->datalo_reg);
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
 
         if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
@@ -2065,72 +1962,129 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     tcg_out_jmp(s, qemu_st_helpers[opc & MO_SIZE]);
     return true;
 }
+
+#if defined(CONFIG_SOFTMMU)
+/*
+ * Perform the TLB load and compare.
+ *
+ * Inputs:
+ * ADDRLO and ADDRHI contain the low and high part of the address.
+ *
+ * MEM_INDEX and S_BITS are the memory context and log2 size of the load.
+ *
+ * WHICH is the offset into the CPUTLBEntry structure of the slot to read.
+ * This should be offsetof addr_read or addr_write.
+ *
+ * Outputs:
+ * LABEL_PTRS is filled with 1 (32-bit addresses) or 2 (64-bit addresses)
+ * positions of the displacements of forward jumps to the TLB miss case.
+ *
+ * Second argument register is loaded with the low part of the address.
+ * In the TLB hit case, it has been adjusted as indicated by the TLB
+ * and so is a host address.  In the TLB miss case, it continues to
+ * hold a guest address.
+ *
+ * First argument register is clobbered.
+ */
+static void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
+                             int mem_index, MemOp opc,
+                             tcg_insn_unit **label_ptr, int which)
+{
+    const TCGReg r0 = TCG_REG_L0;
+    const TCGReg r1 = TCG_REG_L1;
+    TCGType ttype = TCG_TYPE_I32;
+    TCGType tlbtype = TCG_TYPE_I32;
+    int trexw = 0, hrexw = 0, tlbrexw = 0;
+    unsigned a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_mask = (1 << a_bits) - 1;
+    unsigned s_mask = (1 << s_bits) - 1;
+    target_ulong tlb_mask;
+
+    if (TCG_TARGET_REG_BITS == 64) {
+        if (TARGET_LONG_BITS == 64) {
+            ttype = TCG_TYPE_I64;
+            trexw = P_REXW;
+        }
+        if (TCG_TYPE_PTR == TCG_TYPE_I64) {
+            hrexw = P_REXW;
+            if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) {
+                tlbtype = TCG_TYPE_I64;
+                tlbrexw = P_REXW;
+            }
+        }
+    }
+
+    tcg_out_mov(s, tlbtype, r0, addrlo);
+    tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
+                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+
+    tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, r0, TCG_AREG0,
+                         TLB_MASK_TABLE_OFS(mem_index) +
+                         offsetof(CPUTLBDescFast, mask));
+
+    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r0, TCG_AREG0,
+                         TLB_MASK_TABLE_OFS(mem_index) +
+                         offsetof(CPUTLBDescFast, table));
+
+    /*
+     * If the required alignment is at least as large as the access, simply
+     * copy the address and mask.  For lesser alignments, check that we don't
+     * cross pages for the complete access.
+     */
+    if (a_bits >= s_bits) {
+        tcg_out_mov(s, ttype, r1, addrlo);
+    } else {
+        tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask - a_mask);
+    }
+    tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
+    tgen_arithi(s, ARITH_AND + trexw, r1, tlb_mask, 0);
+
+    /* cmp 0(r0), r1 */
+    tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which);
+
+    /*
+     * Prepare for both the fast path add of the tlb addend, and the slow
+     * path function argument setup.
+     */
+    tcg_out_mov(s, ttype, r1, addrlo);
+
+    /* jne slow_path */
+    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+    label_ptr[0] = s->code_ptr;
+    s->code_ptr += 4;
+
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        /* cmp 4(r0), addrhi */
+        tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, which + 4);
+
+        /* jne slow_path */
+        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+        label_ptr[1] = s->code_ptr;
+        s->code_ptr += 4;
+    }
+
+    /* TLB Hit.  */
+
+    /* add addend(r0), r1 */
+    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
+                         offsetof(CPUTLBEntry, addend));
+}
+
 #else
 
-static void tcg_out_test_alignment(TCGContext *s, bool is_ld, TCGReg addrlo,
-                                   TCGReg addrhi, unsigned a_bits)
+static void tcg_out_test_alignment(TCGContext *s, TCGReg addrlo,
+                                   unsigned a_bits, tcg_insn_unit **label_ptr)
 {
     unsigned a_mask = (1 << a_bits) - 1;
-    TCGLabelQemuLdst *label;
 
     tcg_out_testi(s, addrlo, a_mask);
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
-
-    label = new_ldst_label(s);
-    label->is_ld = is_ld;
-    label->addrlo_reg = addrlo;
-    label->addrhi_reg = addrhi;
-    label->raddr = tcg_splitwx_to_rx(s->code_ptr + 4);
-    label->label_ptr[0] = s->code_ptr;
-
+    *label_ptr = s->code_ptr;
     s->code_ptr += 4;
 }
 
-static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    /* resolve label address */
-    tcg_patch32(l->label_ptr[0], s->code_ptr - l->label_ptr[0] - 4);
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_pushi(s, (uintptr_t)l->raddr);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-                    l->addrlo_reg);
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RAX, (uintptr_t)l->raddr);
-        tcg_out_push(s, TCG_REG_RAX);
-    }
-
-    /* "Tail call" to the helper, with the return address back inline. */
-    tcg_out_jmp(s, (const void *)(l->is_ld ? helper_unaligned_ld
-                                  : helper_unaligned_st));
-    return true;
-}
-
-static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    return tcg_out_fail_alignment(s, l);
-}
-
-static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    return tcg_out_fail_alignment(s, l);
-}
-
 #if TCG_TARGET_REG_BITS == 32
 # define x86_guest_base_seg     0
 # define x86_guest_base_index   -1
@@ -2165,7 +2119,7 @@ static inline int setup_guest_base_seg(void)
     return 0;
 }
 # endif
-#endif
+#endif /* TCG_TARGET_REG_BITS == 32 */
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
@@ -2272,10 +2226,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     MemOpIdx oi;
     MemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#else
+    tcg_insn_unit *label_ptr[2] = { };
+#ifndef CONFIG_SOFTMMU
     unsigned a_bits;
 #endif
 
@@ -2287,26 +2239,27 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
-
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), opc,
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
     tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, is64, opc);
 
     /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s, true, is64, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+    add_qemu_ldst_label(s, true, is64, oi, datalo, datahi,
+                        TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
     a_bits = get_alignment_bits(opc);
     if (a_bits) {
-        tcg_out_test_alignment(s, true, addrlo, addrhi, a_bits);
+        tcg_out_test_alignment(s, addrlo, a_bits, label_ptr);
     }
-
     tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
                            x86_guest_base_offset, x86_guest_base_seg,
                            is64, opc);
+    if (a_bits) {
+        add_qemu_ldst_label(s, true, is64, oi, datalo, datahi,
+                            addrlo, addrhi, s->code_ptr, label_ptr);
+    }
 #endif
 }
 
@@ -2368,10 +2321,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     MemOpIdx oi;
     MemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#else
+    tcg_insn_unit *label_ptr[2] = { };
+#ifndef CONFIG_SOFTMMU
     unsigned a_bits;
 #endif
 
@@ -2383,25 +2334,26 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
-
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), opc,
                      label_ptr, offsetof(CPUTLBEntry, addr_write));
 
     /* TLB Hit.  */
     tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
 
     /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s, false, is64, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+    add_qemu_ldst_label(s, false, is64, oi, datalo, datahi,
+                        TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
     a_bits = get_alignment_bits(opc);
     if (a_bits) {
-        tcg_out_test_alignment(s, false, addrlo, addrhi, a_bits);
+        tcg_out_test_alignment(s, addrlo, a_bits, label_ptr);
     }
-
     tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
                            x86_guest_base_offset, x86_guest_base_seg, opc);
+    if (a_bits) {
+        add_qemu_ldst_label(s, false, is64, oi, datalo, datahi,
+                            addrlo, addrhi, s->code_ptr, label_ptr);
+    }
 #endif
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 25/30] tcg/i386: Replace is64 with type in qemu_ld/st routines
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (23 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 24/30] tcg/i386: Use full load/store helpers in user-only mode Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 26/30] tcg/i386: Mark Win64 call-saved vector regs as reserved Richard Henderson
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Prepare for TCG_TYPE_I128 by not using a boolean.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 54 ++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index cad1775133..5dcea7e198 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1772,7 +1772,7 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
  * Record the context of a call to the out of line helper code for the slow path
  * for a load or store, so that we can later generate the correct helper code
  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGType type,
                                 MemOpIdx oi,
                                 TCGReg datalo, TCGReg datahi,
                                 TCGReg addrlo, TCGReg addrhi,
@@ -1783,7 +1783,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
 
     label->is_ld = is_ld;
     label->oi = oi;
-    label->type = is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+    label->type = type;
     label->datalo_reg = datalo;
     label->datahi_reg = datahi;
     label->addrlo_reg = addrlo;
@@ -2124,10 +2124,10 @@ static inline int setup_guest_base_seg(void)
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, int index, intptr_t ofs,
-                                   int seg, bool is64, MemOp memop)
+                                   int seg, TCGType type, MemOp memop)
 {
     bool use_movbe = false;
-    int rexw = is64 * P_REXW;
+    int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
     int movop = OPC_MOVL_GvEv;
 
     /* Do big-endian loads with movbe.  */
@@ -2220,7 +2220,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 /* XXX: qemu_ld and qemu_st could be modified to clobber only EDX and
    EAX. It will be useful once fixed registers globals are less
    common. */
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType type)
 {
     TCGReg datalo, datahi, addrlo;
     TCGReg addrhi __attribute__((unused));
@@ -2232,7 +2232,16 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+    switch (type) {
+    case TCG_TYPE_I32:
+        datahi = 0;
+        break;
+    case TCG_TYPE_I64:
+        datahi = (TCG_TARGET_REG_BITS == 32 ? *args++ : 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     oi = *args++;
@@ -2243,10 +2252,10 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, is64, opc);
+    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, type, opc);
 
     /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s, true, is64, oi, datalo, datahi,
+    add_qemu_ldst_label(s, true, type, oi, datalo, datahi,
                         TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
     a_bits = get_alignment_bits(opc);
@@ -2255,9 +2264,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     }
     tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
                            x86_guest_base_offset, x86_guest_base_seg,
-                           is64, opc);
+                           type, opc);
     if (a_bits) {
-        add_qemu_ldst_label(s, true, is64, oi, datalo, datahi,
+        add_qemu_ldst_label(s, true, type, oi, datalo, datahi,
                             addrlo, addrhi, s->code_ptr, label_ptr);
     }
 #endif
@@ -2315,7 +2324,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType type)
 {
     TCGReg datalo, datahi, addrlo;
     TCGReg addrhi __attribute__((unused));
@@ -2327,7 +2336,16 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+    switch (type) {
+    case TCG_TYPE_I32:
+        datahi = 0;
+        break;
+    case TCG_TYPE_I64:
+        datahi = (TCG_TARGET_REG_BITS == 32 ? *args++ : 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     oi = *args++;
@@ -2341,7 +2359,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
 
     /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s, false, is64, oi, datalo, datahi,
+    add_qemu_ldst_label(s, false, type, oi, datalo, datahi,
                         TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
     a_bits = get_alignment_bits(opc);
@@ -2351,7 +2369,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
                            x86_guest_base_offset, x86_guest_base_seg, opc);
     if (a_bits) {
-        add_qemu_ldst_label(s, false, is64, oi, datalo, datahi,
+        add_qemu_ldst_label(s, false, type, oi, datalo, datahi,
                             addrlo, addrhi, s->code_ptr, label_ptr);
     }
 #endif
@@ -2655,17 +2673,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld_i32:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, TCG_TYPE_I32);
         break;
     case INDEX_op_qemu_ld_i64:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, TCG_TYPE_I64);
         break;
     case INDEX_op_qemu_st_i32:
     case INDEX_op_qemu_st8_i32:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, TCG_TYPE_I32);
         break;
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, TCG_TYPE_I64);
         break;
 
     OP_32_64(mulu2):
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 26/30] tcg/i386: Mark Win64 call-saved vector regs as reserved
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (24 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 25/30] tcg/i386: Replace is64 with type in qemu_ld/st routines Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 27/30] tcg/i386: Examine MemOp for atomicity and alignment Richard Henderson
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

While we do not include these in tcg_target_reg_alloc_order,
and therefore they ought never to be allocated, it seems safer
to mark them as reserved as well.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 5dcea7e198..21442c9339 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -4232,6 +4232,19 @@ static void tcg_target_init(TCGContext *s)
 
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+#ifdef _WIN64
+    /* These are call saved, and we don't save them, so don't use them. */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM6);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM7);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM8);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM9);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM10);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM11);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM12);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM13);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM14);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM15);
+#endif
 }
 
 typedef struct {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 27/30] tcg/i386: Examine MemOp for atomicity and alignment
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (25 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 26/30] tcg/i386: Mark Win64 call-saved vector regs as reserved Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 28/30] tcg/i386: Support 128-bit load/store with have_atomic16 Richard Henderson
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

There is no change to the ultimate load/store routines yet, so some
atomicity conditions are not yet honored, but this plumbs the change
to alignment through the adjacent functions.
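
For a flavour of the new decision table, here is a standalone sketch
using local stand-in constants (these are not the memop.h encodings,
and the real atom_and_align_for_opc additionally folds the result back
into the alignment requirement):

    #include <stdio.h>

    enum { SZ_8, SZ_16, SZ_32, SZ_64, SZ_128 };        /* log2 of bytes */
    enum { ATOM_NONE, ATOM_IFALIGN, ATOM_WITHIN16, ATOM_SUBALIGN };

    /* Given the policy, the access size and the requested maximum atom,
     * compute the atom that must be provided (atmax) and the size of
     * the pieces an unaligned access may break into (atsub). */
    static void decompose(int atom, int size, int atmax,
                          int *out_atmax, int *out_atsub)
    {
        int atsub;

        switch (atom) {
        case ATOM_NONE:        /* no atomicity required at all */
            atmax = atsub = SZ_8;
            break;
        case ATOM_IFALIGN:     /* unaligned accesses decay to bytes */
            atsub = SZ_8;
            break;
        case ATOM_WITHIN16:    /* whole access, if within a 16-byte line */
            atsub = atmax < size ? atmax : SZ_8;
            atmax = size;
            break;
        case ATOM_SUBALIGN:    /* atomic on the sub-alignment of the addr */
            atsub = atmax == SZ_8 ? SZ_8 : atmax - 1;
            break;
        default:
            atsub = SZ_8;
            break;
        }
        *out_atmax = atmax;
        *out_atsub = atsub;
    }

    int main(void)
    {
        int atmax, atsub;

        /* A 16-byte access with 8-byte maximum atoms under WITHIN16. */
        decompose(ATOM_WITHIN16, SZ_128, SZ_64, &atmax, &atsub);
        printf("atmax=%d atsub=%d\n", atmax, atsub);   /* 4 and 3 */
        return 0;
    }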

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 128 ++++++++++++++++++++++++++++++--------
 1 file changed, 101 insertions(+), 27 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 21442c9339..6ee7bc5a9a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1746,6 +1746,83 @@ tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
     }
 }
 
+/*
+ * Return the alignment and atomicity to use for the inline fast path
+ * for the given memory operation.  The alignment may be larger than
+ * that specified in @opc, and the correct alignment will be diagnosed
+ * by the slow path helper.
+ */
+static MemOp atom_and_align_for_opc(TCGContext *s, MemOp opc, MemOp *out_al)
+{
+    MemOp align = get_alignment_bits(opc);
+    MemOp atom, atmax, atsub, size = opc & MO_SIZE;
+
+    /* When serialized, no further atomicity required.  */
+    if (s->gen_tb->cflags & CF_PARALLEL) {
+        atom = opc & MO_ATOM_MASK;
+    } else {
+        atom = MO_ATOM_NONE;
+    }
+
+    atmax = opc & MO_ATMAX_MASK;
+    if (atmax == MO_ATMAX_SIZE) {
+        atmax = size;
+    } else {
+        atmax = atmax >> MO_ATMAX_SHIFT;
+    }
+
+    switch (atom) {
+    case MO_ATOM_NONE:
+        /* The operation requires no specific atomicity. */
+        atmax = MO_8;
+        atsub = MO_8;
+        break;
+    case MO_ATOM_IFALIGN:
+        /* If unaligned, the subobjects are bytes. */
+        atsub = MO_8;
+        break;
+    case MO_ATOM_WITHIN16:
+        /* If unaligned, there are subobjects if atmax < size. */
+        atsub = (atmax < size ? atmax : MO_8);
+        atmax = size;
+        break;
+    case MO_ATOM_SUBALIGN:
+        /* If unaligned but not odd, there are subobjects up to atmax - 1. */
+        atsub = (atmax == MO_8 ? MO_8 : atmax - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    /*
+     * Per Intel Architecture SDM, Volume 3 Section 8.1.1,
+     * - Pentium family guarantees atomicity of aligned <= 64-bit.
+     * - P6 family guarantees atomicity of unaligned <= 64-bit
+     *   which fit within a cache line.
+     * - AVX guarantees atomicity of aligned 128-bit VMOVDQA (et al).
+     *
+     * There is no language in the Intel manual specifying what happens
+     * with the partial memory operations when crossing a cache line.
+     * When there is required atomicity of subobjects, we must perform
+     * an additional runtime test for alignment and then perform either
+     * the full operation, or two half-sized operations.
+     *
+     * For x86_64, and MO_64, we do not have a scratch register with
+     * which to do this.  Only allow splitting for MO_64 on i386,
+     * where the data is already separated, or MO_128.
+     * Otherwise, require full alignment and fall back to the helper
+     * for the misaligned case.
+     */
+    if (align < atmax
+        && atsub != MO_8
+        && size != (TCG_TARGET_REG_BITS == 64 ? MO_128 : MO_64)) {
+        align = size;
+    }
+
+    *out_al = align;
+    return atmax;
+}
+
 /*
  * helper signature: helper_ld*_mmu(CPUState *env, target_ulong addr,
  *                                  int mmu_idx, uintptr_t ra)
@@ -1987,7 +2064,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
  * First argument register is clobbered.
  */
 static void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                             int mem_index, MemOp opc,
+                             int mem_index, MemOp a_bits, MemOp s_bits,
                              tcg_insn_unit **label_ptr, int which)
 {
     const TCGReg r0 = TCG_REG_L0;
@@ -1995,8 +2072,6 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     TCGType ttype = TCG_TYPE_I32;
     TCGType tlbtype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0, tlbrexw = 0;
-    unsigned a_bits = get_alignment_bits(opc);
-    unsigned s_bits = opc & MO_SIZE;
     unsigned a_mask = (1 << a_bits) - 1;
     unsigned s_mask = (1 << s_bits) - 1;
     target_ulong tlb_mask;
@@ -2124,7 +2199,8 @@ static inline int setup_guest_base_seg(void)
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, int index, intptr_t ofs,
-                                   int seg, TCGType type, MemOp memop)
+                                   int seg, TCGType type, MemOp memop,
+                                   MemOp atom, MemOp align)
 {
     bool use_movbe = false;
     int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
@@ -2225,11 +2301,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType type)
     TCGReg datalo, datahi, addrlo;
     TCGReg addrhi __attribute__((unused));
     MemOpIdx oi;
-    MemOp opc;
+    MemOp opc, atom, align;
     tcg_insn_unit *label_ptr[2] = { };
-#ifndef CONFIG_SOFTMMU
-    unsigned a_bits;
-#endif
 
     datalo = *args++;
     switch (type) {
@@ -2246,26 +2319,27 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType type)
     addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     oi = *args++;
     opc = get_memop(oi);
+    atom = atom_and_align_for_opc(s, opc, &align);
 
 #if defined(CONFIG_SOFTMMU)
-    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), opc,
+    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), align, opc & MO_SIZE,
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, type, opc);
+    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, type,
+                           opc, atom, align);
 
     /* Record the current context of a load into ldst label */
     add_qemu_ldst_label(s, true, type, oi, datalo, datahi,
                         TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
-    a_bits = get_alignment_bits(opc);
-    if (a_bits) {
-        tcg_out_test_alignment(s, addrlo, a_bits, label_ptr);
+    if (align) {
+        tcg_out_test_alignment(s, addrlo, align, label_ptr);
     }
     tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
                            x86_guest_base_offset, x86_guest_base_seg,
-                           type, opc);
-    if (a_bits) {
+                           type, opc, atom, align);
+    if (align) {
         add_qemu_ldst_label(s, true, type, oi, datalo, datahi,
                             addrlo, addrhi, s->code_ptr, label_ptr);
     }
@@ -2274,7 +2348,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType type)
 
 static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, int index, intptr_t ofs,
-                                   int seg, MemOp memop)
+                                   int seg, MemOp memop,
+                                   MemOp atom, MemOp align)
 {
     bool use_movbe = false;
     int movop = OPC_MOVL_EvGv;
@@ -2329,11 +2404,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType type)
     TCGReg datalo, datahi, addrlo;
     TCGReg addrhi __attribute__((unused));
     MemOpIdx oi;
-    MemOp opc;
+    MemOp opc, atom, align;
     tcg_insn_unit *label_ptr[2] = { };
-#ifndef CONFIG_SOFTMMU
-    unsigned a_bits;
-#endif
 
     datalo = *args++;
     switch (type) {
@@ -2350,25 +2422,27 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType type)
     addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     oi = *args++;
     opc = get_memop(oi);
+    atom = atom_and_align_for_opc(s, opc, &align);
 
 #if defined(CONFIG_SOFTMMU)
-    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), opc,
+    tcg_out_tlb_load(s, addrlo, addrhi, get_mmuidx(oi), align, opc & MO_SIZE,
                      label_ptr, offsetof(CPUTLBEntry, addr_write));
 
     /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
+    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0,
+                           opc, atom, align);
 
     /* Record the current context of a store into ldst label */
     add_qemu_ldst_label(s, false, type, oi, datalo, datahi,
                         TCG_REG_L1, addrhi, s->code_ptr, label_ptr);
 #else
-    a_bits = get_alignment_bits(opc);
-    if (a_bits) {
-        tcg_out_test_alignment(s, addrlo, a_bits, label_ptr);
+    if (align) {
+        tcg_out_test_alignment(s, addrlo, align, label_ptr);
     }
     tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
-                           x86_guest_base_offset, x86_guest_base_seg, opc);
-    if (a_bits) {
+                           x86_guest_base_offset, x86_guest_base_seg,
+                           opc, atom, align);
+    if (align) {
         add_qemu_ldst_label(s, false, type, oi, datalo, datahi,
                             addrlo, addrhi, s->code_ptr, label_ptr);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 28/30] tcg/i386: Support 128-bit load/store with have_atomic16
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (26 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 27/30] tcg/i386: Examine MemOp for atomicity and alignment Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool Richard Henderson
  2023-02-16  2:57 ` [PATCH v2 30/30] tcg/i386: Honor 64-bit atomicity in 32-bit mode Richard Henderson
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |   3 +-
 tcg/i386/tcg-target.c.inc | 325 +++++++++++++++++++++++++++++++++++---
 2 files changed, 304 insertions(+), 24 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 6d8a536a32..37d8e70fdc 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -194,7 +194,8 @@ extern bool have_atomic16;
 #define TCG_TARGET_HAS_qemu_st8_i32     1
 #endif
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_qemu_ldst_i128 \
+    (TCG_TARGET_REG_BITS == 64 && have_atomic16)
 
 /* We do not support older SSE systems, only beginning with AVX1.  */
 #define TCG_TARGET_HAS_v64              have_avx1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 6ee7bc5a9a..6fdf79020f 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -91,6 +91,8 @@ static const int tcg_target_reg_alloc_order[] = {
 #endif
 };
 
+#define TCG_TMP_VEC  TCG_REG_XMM5
+
 static const int tcg_target_call_iarg_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
 #if defined(_WIN64)
@@ -347,6 +349,8 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define OPC_PCMPGTW     (0x65 | P_EXT | P_DATA16)
 #define OPC_PCMPGTD     (0x66 | P_EXT | P_DATA16)
 #define OPC_PCMPGTQ     (0x37 | P_EXT38 | P_DATA16)
+#define OPC_PEXTRD      (0x16 | P_EXT3A | P_DATA16)
+#define OPC_PINSRD      (0x22 | P_EXT3A | P_DATA16)
 #define OPC_PMAXSB      (0x3c | P_EXT38 | P_DATA16)
 #define OPC_PMAXSW      (0xee | P_EXT | P_DATA16)
 #define OPC_PMAXSD      (0x3d | P_EXT38 | P_DATA16)
@@ -1730,8 +1734,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
 }
 
 /* Test register R vs immediate bits I, setting Z flag for EQ/NE. */
-static void __attribute__((unused))
-tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
+static void tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
 {
     /*
      * This is used for testing alignment, so we can usually use testb.
@@ -1828,10 +1831,11 @@ static MemOp atom_and_align_for_opc(TCGContext *s, MemOp opc, MemOp *out_al)
  *                                  int mmu_idx, uintptr_t ra)
  */
 static void * const qemu_ld_helpers[MO_SIZE + 1] = {
-    [MO_UB] = helper_ldub_mmu,
-    [MO_UW] = helper_lduw_mmu,
-    [MO_UL] = helper_ldul_mmu,
-    [MO_UQ] = helper_ldq_mmu,
+    [MO_8] = helper_ldub_mmu,
+    [MO_16] = helper_lduw_mmu,
+    [MO_32] = helper_ldul_mmu,
+    [MO_64] = helper_ldq_mmu,
+    [MO_128] = helper_ld16_mmu,
 };
 
 /*
@@ -1839,10 +1843,11 @@ static void * const qemu_ld_helpers[MO_SIZE + 1] = {
  *                                  uintxx_t val, int mmu_idx, uintptr_t ra)
  */
 static void * const qemu_st_helpers[MO_SIZE + 1] = {
-    [MO_UB] = helper_stb_mmu,
-    [MO_UW] = helper_stw_mmu,
-    [MO_UL] = helper_stl_mmu,
-    [MO_UQ] = helper_stq_mmu,
+    [MO_8] = helper_stb_mmu,
+    [MO_16] = helper_stw_mmu,
+    [MO_32] = helper_stl_mmu,
+    [MO_64] = helper_stq_mmu,
+    [MO_128] = helper_st16_mmu,
 };
 
 /*
@@ -1870,6 +1875,13 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGType type,
     label->label_ptr[1] = label_ptr[1];
 }
 
+static void tcg_out_mov2_xchg(TCGContext *s, TCGType type1, TCGType type2,
+                              TCGReg dst1, TCGReg dst2)
+{
+    int w = (type1 == TCG_TYPE_I32 && type2 == TCG_TYPE_I32 ? 0 : P_REXW);
+    tcg_out_modrm(s, OPC_XCHG_EvGv + w, dst1, dst2);
+}
+
 /* Move src1 to dst1 and src2 to dst2, minding possible overlap. */
 static void tcg_out_mov2(TCGContext *s,
                          TCGType type1, TCGReg dst1, TCGReg src1,
@@ -1883,11 +1895,69 @@ static void tcg_out_mov2(TCGContext *s,
         tcg_out_mov(s, type1, dst1, src1);
     } else {
         /* dst1 == src2 && dst2 == src1 -> xchg. */
-        int w = (type1 == TCG_TYPE_I32 && type2 == TCG_TYPE_I32 ? 0 : P_REXW);
-        tcg_out_modrm(s, OPC_XCHG_EvGv + w, dst1, dst2);
+        tcg_out_mov2_xchg(s, type1, type2, dst1, dst2);
     }
 }
 
+/* Similarly for 3 pairs. */
+static void tcg_out_mov3(TCGContext *s,
+                         TCGType type1, TCGReg dst1, TCGReg src1,
+                         TCGType type2, TCGReg dst2, TCGReg src2,
+                         TCGType type3, TCGReg dst3, TCGReg src3)
+{
+    if (dst1 != src2 && dst1 != src3) {
+        tcg_out_mov(s, type1, dst1, src1);
+        tcg_out_mov2(s, type2, dst2, src2, type3, dst3, src3);
+        return;
+    }
+    if (dst2 != src2 && dst2 != src3) {
+        tcg_out_mov(s, type2, dst2, src2);
+        tcg_out_mov2(s, type1, dst1, src1, type3, dst3, src3);
+        return;
+    }
+    if (dst3 != src1 && dst3 != src2) {
+        tcg_out_mov(s, type3, dst3, src3);
+        tcg_out_mov2(s, type1, dst1, src1, type2, dst2, src2);
+        return;
+    }
+    /* Three-way overlap present, at least one xchg needed. */
+    if (dst1 == src2) {
+        tcg_out_mov2_xchg(s, type1, type2, src1, src2);
+        tcg_out_mov2(s, type2, dst2, src1, type3, dst3, src3);
+        return;
+    }
+    if (dst1 == src3) {
+        tcg_out_mov2_xchg(s, type1, type3, src1, src3);
+        tcg_out_mov2(s, type2, dst2, src2, type3, dst3, src1);
+        return;
+    }
+    g_assert_not_reached();
+}
+
+static void tcg_out_vec_to_pair(TCGContext *s, TCGType type,
+                                TCGReg l, TCGReg h, TCGReg v)
+{
+    int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+
+    /* vpmov{d,q} %v, %l */
+    tcg_out_vex_modrm(s, OPC_MOVD_EyVy + rexw, v, 0, l);
+    /* vpextr{d,q} $1, %v, %h */
+    tcg_out_vex_modrm(s, OPC_PEXTRD + rexw, v, 0, h);
+    tcg_out8(s, 1);
+}
+
+static void tcg_out_pair_to_vec(TCGContext *s, TCGType type,
+                                TCGReg v, TCGReg l, TCGReg h)
+{
+    int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+
+    /* vmov{d,q} %l, %v */
+    tcg_out_vex_modrm(s, OPC_MOVD_VyEy + rexw, v, 0, l);
+    /* vpinsr{d,q} $1, %h, %v, %v */
+    tcg_out_vex_modrm(s, OPC_PINSRD + rexw, v, v, h);
+    tcg_out8(s, 1);
+}
+
 /*
  * Generate code for the slow path for a load at the end of block
  */
@@ -1897,7 +1967,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     MemOp opc = get_memop(oi);
     TCGReg data_reg;
     tcg_insn_unit **label_ptr = &l->label_ptr[0];
-    int rexw = (l->type == TCG_TYPE_I64 ? P_REXW : 0);
+    int rexw = (l->type == TCG_TYPE_I32 ? 0 : P_REXW);
 
     /* resolve label address */
     tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
@@ -1961,6 +2031,22 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
                          TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
         }
         break;
+    case MO_128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        switch (TCG_TARGET_CALL_RET_I128) {
+        case TCG_CALL_RET_NORMAL:
+            tcg_out_mov2(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX,
+                         TCG_TYPE_I64, l->datahi_reg, TCG_REG_RDX);
+            break;
+        case TCG_CALL_RET_BY_VEC:
+            tcg_out_vec_to_pair(s, TCG_TYPE_I64,
+                                data_reg, l->datahi_reg, TCG_REG_XMM0);
+            break;
+        default:
+            qemu_build_not_reached();
+        }
+        break;
+
     default:
         tcg_abort();
     }
@@ -1977,7 +2063,6 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
     MemOpIdx oi = l->oi;
     MemOp opc = get_memop(oi);
-    MemOp s_bits = opc & MO_SIZE;
     tcg_insn_unit **label_ptr = &l->label_ptr[0];
     TCGReg retaddr;
 
@@ -2004,9 +2089,15 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
         ofs += 4;
 
-        if (s_bits == MO_64) {
+        switch (l->type) {
+        case TCG_TYPE_I32:
+            break;
+        case TCG_TYPE_I64:
             tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
             ofs += 4;
+            break;
+        default:
+            g_assert_not_reached();
         }
 
         tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
@@ -2016,15 +2107,54 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
         tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
     } else {
-        tcg_out_mov2(s, TCG_TYPE_TL,
-                     tcg_target_call_iarg_regs[1], l->addrlo_reg,
-                     s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
-                     tcg_target_call_iarg_regs[2], l->datalo_reg);
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
+        int slot;
 
-        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
-            retaddr = tcg_target_call_iarg_regs[4];
+        switch (l->type) {
+        case TCG_TYPE_I32:
+        case TCG_TYPE_I64:
+            tcg_out_mov2(s, TCG_TYPE_TL,
+                         tcg_target_call_iarg_regs[1], l->addrlo_reg,
+                         l->type, tcg_target_call_iarg_regs[2], l->datalo_reg);
+            slot = 3;
+            break;
+        case TCG_TYPE_I128:
+            switch (TCG_TARGET_CALL_ARG_I128) {
+            case TCG_CALL_ARG_NORMAL:
+                tcg_out_mov3(s, TCG_TYPE_TL,
+                             tcg_target_call_iarg_regs[1], l->addrlo_reg,
+                             TCG_TYPE_I64,
+                             tcg_target_call_iarg_regs[2], l->datalo_reg,
+                             TCG_TYPE_I64,
+                             tcg_target_call_iarg_regs[3], l->datahi_reg);
+                slot = 4;
+                break;
+            case TCG_CALL_ARG_BY_REF:
+                /* Leave room for retaddr below, take next 16 aligned bytes. */
+                tcg_out_st(s, TCG_TYPE_I64, l->datalo_reg,
+                           TCG_REG_ESP, TCG_TARGET_CALL_STACK_OFFSET + 16);
+                tcg_out_st(s, TCG_TYPE_I64, l->datahi_reg,
+                           TCG_REG_ESP, TCG_TARGET_CALL_STACK_OFFSET + 24);
+                tcg_out_mov(s, TCG_TYPE_TL,
+                            tcg_target_call_iarg_regs[1], l->addrlo_reg);
+                tcg_out_modrm_offset(s, OPC_LEA + P_REXW,
+                                     tcg_target_call_iarg_regs[2], TCG_REG_ESP,
+                                     TCG_TARGET_CALL_STACK_OFFSET + 16);
+                slot = 3;
+                break;
+            default:
+                qemu_build_not_reached();
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        tcg_debug_assert(slot < (int)ARRAY_SIZE(tcg_target_call_iarg_regs) - 1);
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[slot++], oi);
+
+        if (slot < (int)ARRAY_SIZE(tcg_target_call_iarg_regs)) {
+            retaddr = tcg_target_call_iarg_regs[slot];
             tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
         } else {
             retaddr = TCG_REG_RAX;
@@ -2288,6 +2418,71 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
             }
         }
         break;
+
+    case MO_128:
+        {
+            TCGLabel *l1 = NULL, *l2 = NULL;
+            bool use_pair = atom < MO_128;
+
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+
+            if (use_movbe) {
+                TCGReg t = datalo;
+                datalo = datahi;
+                datahi = t;
+            }
+            if (!use_pair) {
+                /*
+                 * Atomicity requires that we use VMOVDQA.
+                 * If we've already checked for 16-byte alignment, that's all
+                 * we need.  If we arrive here with lesser alignment, then we
+                 * have determined that less than 16-byte alignment can be
+                 * satisfied with two 8-byte loads.
+                 */
+                if (align < MO_128) {
+                    use_pair = true;
+                    l1 = gen_new_label();
+                    l2 = gen_new_label();
+
+                    tcg_out_testi(s, base, align == MO_64 ? 8 : 15);
+                    tcg_out_jxx(s, JCC_JNE, l2, true);
+                }
+
+                tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_VxWx + seg,
+                                             TCG_TMP_VEC, 0,
+                                             base, index, 0, ofs);
+                tcg_out_vec_to_pair(s, TCG_TYPE_I64,
+                                    datalo, datahi, TCG_TMP_VEC);
+
+                if (use_movbe) {
+                    tcg_out_bswap64(s, datalo);
+                    tcg_out_bswap64(s, datahi);
+                }
+
+                if (use_pair) {
+                    tcg_out_jxx(s, JCC_JMP, l1, true);
+                    tcg_out_label(s, l2);
+                }
+            }
+            if (use_pair) {
+                if (base != datalo) {
+                    tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
+                                             base, index, 0, ofs);
+                    tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datahi,
+                                             base, index, 0, ofs + 8);
+                } else {
+                    tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datahi,
+                                             base, index, 0, ofs + 8);
+                    tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
+                                             base, index, 0, ofs);
+                }
+            }
+            if (l1) {
+                tcg_out_label(s, l1);
+            }
+        }
+        break;
+
     default:
         g_assert_not_reached();
     }
@@ -2312,6 +2507,10 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType type)
     case TCG_TYPE_I64:
         datahi = (TCG_TARGET_REG_BITS == 32 ? *args++ : 0);
         break;
+    case TCG_TYPE_I128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        datahi = *args++;
+        break;
     default:
         g_assert_not_reached();
     }
@@ -2394,6 +2593,68 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                      base, index, 0, ofs + 4);
         }
         break;
+
+    case MO_128:
+        {
+            TCGLabel *l1 = NULL, *l2 = NULL;
+            bool use_pair = atom < MO_128;
+
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+
+            if (use_movbe) {
+                TCGReg t = datalo;
+                datalo = datahi;
+                datahi = t;
+            }
+            if (!use_pair) {
+                /*
+                 * Atomicity requires that we use VMOVDQA.
+                 * If we've already checked for 16-byte alignment, that's all
+                 * we need.  If we arrive here with lesser alignment, then we
+                 * have determined that less than 16-byte alignment can be
+                 * satisfied with two 8-byte stores.
+                 */
+                if (align < MO_128) {
+                    use_pair = true;
+                    l1 = gen_new_label();
+                    l2 = gen_new_label();
+
+                    tcg_out_testi(s, base, align == MO_64 ? 8 : 15);
+                    tcg_out_jxx(s, JCC_JNE, l2, true);
+                }
+
+                if (use_movbe) {
+                    /* Byte swap while storing to the stack. */
+                    tcg_out_modrm_offset(s, movop + P_REXW + seg, datalo,
+                                         TCG_REG_ESP, 0);
+                    tcg_out_modrm_offset(s, movop + P_REXW + seg, datahi,
+                                         TCG_REG_ESP, 8);
+                    tcg_out_ld(s, TCG_TYPE_V128, TCG_TMP_VEC, TCG_REG_ESP, 0);
+                } else {
+                    tcg_out_pair_to_vec(s, TCG_TYPE_I64,
+                                        TCG_TMP_VEC, datalo, datahi);
+                }
+                tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_WxVx + seg,
+                                             TCG_TMP_VEC, 0,
+                                             base, index, 0, ofs);
+
+                if (use_pair) {
+                    tcg_out_jxx(s, JCC_JMP, l1, true);
+                    tcg_out_label(s, l2);
+                }
+            }
+            if (use_pair) {
+                tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
+                                         base, index, 0, ofs);
+                tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datahi,
+                                         base, index, 0, ofs + 8);
+            }
+            if (l1) {
+                tcg_out_label(s, l1);
+            }
+        }
+        break;
+
     default:
         g_assert_not_reached();
     }
@@ -2415,6 +2676,10 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType type)
     case TCG_TYPE_I64:
         datahi = (TCG_TARGET_REG_BITS == 32 ? *args++ : 0);
         break;
+    case TCG_TYPE_I128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        datahi = *args++;
+        break;
     default:
         g_assert_not_reached();
     }
@@ -2752,6 +3017,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_ld_i64:
         tcg_out_qemu_ld(s, args, TCG_TYPE_I64);
         break;
+    case INDEX_op_qemu_ld_i128:
+        tcg_out_qemu_ld(s, args, TCG_TYPE_I128);
+        break;
     case INDEX_op_qemu_st_i32:
     case INDEX_op_qemu_st8_i32:
         tcg_out_qemu_st(s, args, TCG_TYPE_I32);
@@ -2759,6 +3027,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_st_i64:
         tcg_out_qemu_st(s, args, TCG_TYPE_I64);
         break;
+    case INDEX_op_qemu_st_i128:
+        tcg_out_qemu_st(s, args, TCG_TYPE_I128);
+        break;
 
     OP_32_64(mulu2):
         tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_MUL, args[3]);
@@ -3449,6 +3720,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
                 : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? C_O0_I3(L, L, L)
                 : C_O0_I4(L, L, L, L));
 
+    case INDEX_op_qemu_ld_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        return C_O2_I1(r, r, L);
+    case INDEX_op_qemu_st_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        return C_O0_I3(L, L, L);
+
     case INDEX_op_brcond2_i32:
         return C_O0_I4(r, r, ri, ri);
 
@@ -4306,6 +4584,7 @@ static void tcg_target_init(TCGContext *s)
 
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+    tcg_regset_set_reg(s->reserved_regs, TCG_TMP_VEC);
 #ifdef _WIN64
     /* These are call saved, and we don't save them, so don't use them. */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM6);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (27 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 28/30] tcg/i386: Support 128-bit load/store with have_atomic16 Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  2023-03-15 16:43   ` Philippe Mathieu-Daudé
  2023-02-16  2:57 ` [PATCH v2 30/30] tcg/i386: Honor 64-bit atomicity in 32-bit mode Richard Henderson
  29 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 6fdf79020f..834978f7a6 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -841,9 +841,9 @@ static inline void tcg_out_modrm_pool(TCGContext *s, int opc, int r)
 }
 
 /* Output an opcode with an expected reference to the constant pool.  */
-static inline void tcg_out_vex_modrm_pool(TCGContext *s, int opc, int r)
+static inline void tcg_out_vex_modrm_pool(TCGContext *s, int opc, int r, int v)
 {
-    tcg_out_vex_opc(s, opc, r, 0, 0, 0);
+    tcg_out_vex_opc(s, opc, r, v, 0, 0);
     /* Absolute for 32-bit, pc-relative for 64-bit.  */
     tcg_out8(s, LOWREGMASK(r) << 3 | 5);
     tcg_out32(s, 0);
@@ -990,18 +990,18 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
 
     if (TCG_TARGET_REG_BITS == 32 && vece < MO_64) {
         if (have_avx2) {
-            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
+            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret, 0);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret, 0);
         }
         new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
     } else {
         if (type == TCG_TYPE_V64) {
-            tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret);
+            tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret, 0);
         } else if (have_avx2) {
-            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
+            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret, 0);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
+            tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret, 0);
         }
         if (TCG_TARGET_REG_BITS == 64) {
             new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
@@ -1024,7 +1024,7 @@ static void tcg_out_movi_vec(TCGContext *s, TCGType type,
     }
 
     int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
-    tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret);
+    tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret, 0);
     if (TCG_TARGET_REG_BITS == 64) {
         new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
     } else {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 30/30] tcg/i386: Honor 64-bit atomicity in 32-bit mode
  2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
                   ` (28 preceding siblings ...)
  2023-02-16  2:57 ` [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool Richard Henderson
@ 2023-02-16  2:57 ` Richard Henderson
  29 siblings, 0 replies; 42+ messages in thread
From: Richard Henderson @ 2023-02-16  2:57 UTC (permalink / raw)
  To: qemu-devel

Use the x87 coprocessor to perform 64-bit loads and stores as single
8-byte accesses.
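
A minimal sketch of the idea, assuming an 8-byte-aligned address; the
helper name and inline-asm form below are illustrative only, not part
of this patch.  The x87 unit moves 8 bytes to or from memory in one
access, so a 64-bit value can be bounced through a scratch stack slot
while the integer registers only ever see 4-byte halves:

    #include <stdint.h>

    /* Hypothetical C equivalent of the emitted FILD/FISTP sequence. */
    static inline uint64_t load_u64_via_x87(const uint64_t *p)
    {
        uint64_t v;
        /* fildll: one 8-byte load into st(0);
           fistpll: one 8-byte store, popping the x87 stack again. */
        __asm__ volatile("fildll %1\n\tfistpll %0" : "=m"(v) : "m"(*p));
        return v;
    }

The store direction is the reverse: write the two 32-bit halves to the
scratch slot, FILD from the slot, then FISTP directly to the guest
address.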

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 119 +++++++++++++++++++++++++++++++++-----
 1 file changed, 106 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 834978f7a6..2ac0f5cf4e 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -472,6 +472,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define OPC_GRP5        (0xff)
 #define OPC_GRP14       (0x73 | P_EXT | P_DATA16)
 
+#define OPC_ESCDF       (0xdf)
+#define ESCDF_FILD_m64  5
+#define ESCDF_FISTP_m64 7
+
 /* Group 1 opcode extensions for 0x80-0x83.
    These are also used as modifiers for OPC_ARITH.  */
 #define ARITH_ADD 0
@@ -2400,21 +2404,65 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
             tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
                                      base, index, 0, ofs);
         } else {
+            TCGLabel *l1 = NULL, *l2 = NULL;
+            bool use_pair = atom < MO_64;
+
             if (use_movbe) {
                 TCGReg t = datalo;
                 datalo = datahi;
                 datahi = t;
             }
-            if (base != datalo) {
-                tcg_out_modrm_sib_offset(s, movop + seg, datalo,
-                                         base, index, 0, ofs);
-                tcg_out_modrm_sib_offset(s, movop + seg, datahi,
-                                         base, index, 0, ofs + 4);
-            } else {
-                tcg_out_modrm_sib_offset(s, movop + seg, datahi,
-                                         base, index, 0, ofs + 4);
-                tcg_out_modrm_sib_offset(s, movop + seg, datalo,
+
+            if (!use_pair) {
+                /*
+                 * Atomicity requires that we use a single 8-byte load.
+                 * For simplicity, and code size, always use the FPU for this.
+                 * Similar insns using SSE/AVX are merely larger.
+                 * Load from memory in one go, then store back to the stack,
+                 * from whence we can load into the correct integer regs.
+                 *
+                 * If we've already checked for 8-byte alignment, or not
+                 * checked for alignment at all, that's all we need.
+                 * If we arrive here with lesser but non-zero alignment,
+                 * then we have determined that subalignment can be
+                 * satisfied with two 4-byte loads.
+                 */
+                if (align > MO_8 && align < MO_64) {
+                    use_pair = true;
+                    l1 = gen_new_label();
+                    l2 = gen_new_label();
+
+                    tcg_out_testi(s, base, align == MO_32 ? 4 : 7);
+                    tcg_out_jxx(s, JCC_JNE, l2, true);
+                }
+
+                tcg_out_modrm_sib_offset(s, OPC_ESCDF + seg, ESCDF_FILD_m64,
                                          base, index, 0, ofs);
+                tcg_out_modrm_offset(s, OPC_ESCDF, ESCDF_FISTP_m64,
+                                     TCG_REG_ESP, 0);
+                tcg_out_modrm_offset(s, movop, datalo, TCG_REG_ESP, 0);
+                tcg_out_modrm_offset(s, movop, datahi, TCG_REG_ESP, 4);
+
+                if (use_pair) {
+                    tcg_out_jxx(s, JCC_JMP, l1, true);
+                    tcg_out_label(s, l2);
+                }
+            }
+            if (use_pair) {
+                if (base != datalo) {
+                    tcg_out_modrm_sib_offset(s, movop + seg, datalo,
+                                             base, index, 0, ofs);
+                    tcg_out_modrm_sib_offset(s, movop + seg, datahi,
+                                             base, index, 0, ofs + 4);
+                } else {
+                    tcg_out_modrm_sib_offset(s, movop + seg, datahi,
+                                             base, index, 0, ofs + 4);
+                    tcg_out_modrm_sib_offset(s, movop + seg, datalo,
+                                             base, index, 0, ofs);
+                }
+            }
+            if (l1) {
+                tcg_out_label(s, l1);
             }
         }
         break;
@@ -2577,20 +2625,65 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     case MO_32:
         tcg_out_modrm_sib_offset(s, movop + seg, datalo, base, index, 0, ofs);
         break;
+
     case MO_64:
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
                                      base, index, 0, ofs);
         } else {
+            TCGLabel *l1 = NULL, *l2 = NULL;
+            bool use_pair = atom < MO_64;
+
             if (use_movbe) {
                 TCGReg t = datalo;
                 datalo = datahi;
                 datahi = t;
             }
-            tcg_out_modrm_sib_offset(s, movop + seg, datalo,
-                                     base, index, 0, ofs);
-            tcg_out_modrm_sib_offset(s, movop + seg, datahi,
-                                     base, index, 0, ofs + 4);
+
+            if (!use_pair) {
+                /*
+                 * Atomicity requires that we use one 8-byte store.
+                 * For simplicity, and code size, always use the FPU for this.
+                 * Similar insns using SSE/AVX are merely larger.
+                 * Assemble the 8-byte quantity in the required endianness
+                 * on the stack, load it into the coproc unit, and store.
+                 *
+                 * If we've already checked for 8-byte alignment, or not
+                 * checked for alignment at all, that's all we need.
+                 * If we arrive here with lesser but non-zero alignment,
+                 * then we have determined that subalignment can be
+                 * satisfied with two 4-byte stores.
+                 */
+                if (align > MO_8 && align < MO_64) {
+                    use_pair = true;
+                    l1 = gen_new_label();
+                    l2 = gen_new_label();
+
+                    tcg_out_testi(s, base, align == MO_32 ? 4 : 7);
+                    tcg_out_jxx(s, JCC_JNE, l2, true);
+                }
+
+                tcg_out_modrm_offset(s, movop, datalo, TCG_REG_ESP, 0);
+                tcg_out_modrm_offset(s, movop, datahi, TCG_REG_ESP, 4);
+                tcg_out_modrm_offset(s, OPC_ESCDF, ESCDF_FILD_m64,
+                                     TCG_REG_ESP, 0);
+                tcg_out_modrm_sib_offset(s, OPC_ESCDF + seg, ESCDF_FISTP_m64,
+                                         base, index, 0, ofs);
+
+                if (use_pair) {
+                    tcg_out_jxx(s, JCC_JMP, l1, true);
+                    tcg_out_label(s, l2);
+                }
+            }
+            if (use_pair) {
+                tcg_out_modrm_sib_offset(s, movop + seg, datalo,
+                                         base, index, 0, ofs);
+                tcg_out_modrm_sib_offset(s, movop + seg, datahi,
+                                         base, index, 0, ofs + 4);
+            }
+            if (l1) {
+                tcg_out_label(s, l1);
+            }
         }
         break;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity
  2023-02-16  2:57 ` [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity Richard Henderson
@ 2023-02-28 17:56   ` Alex Bennée
  2023-03-15 17:13   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Alex Bennée @ 2023-02-28 17:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> These bits may be used to describe the precise atomicity
> requirements of the guest, which may then be used to
> constrain the methods by which it may be emulated by the host.
>
> For instance, the AArch64 LDP (32-bit) instruction changes
> semantics with ARMv8.4 LSE2, from
>
>   MO_64 | MO_ATMAX_4 | MO_ATOM_IFALIGN
>   (64-bits, single-copy atomic only on 4 byte units,
>    nonatomic if not aligned by 4),
>
> to
>
>   MO_64 | MO_ATMAX_SIZE | MO_ATOM_WITHIN16
>   (64-bits, single-copy atomic within a 16 byte block)
>
> The former may be implemented with two 4 byte loads, or
> a single 8 byte load if that happens to be efficient on
> the host.  The latter may not, and may also require a
> helper when misaligned.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context
  2023-02-16  2:57 ` [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context Richard Henderson
@ 2023-02-28 17:57   ` Alex Bennée
  2023-03-15 16:07   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Alex Bennée @ 2023-02-28 17:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Like cpu_in_exclusive_context, but also true if
> there is no other cpu against which we could race.
>
> Use it in tb_flush as a direct replacement.
> Use it in cpu_loop_exit_atomic to ensure that there
> is no loop against cpu_exec_step_atomic.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx
  2023-02-16  2:57 ` [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx Richard Henderson
@ 2023-02-28 17:59   ` Alex Bennée
  0 siblings, 0 replies; 42+ messages in thread
From: Alex Bennée @ 2023-02-28 17:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Instead of playing with offsetof in various places, use
> MMUAccessType to index an array.  This is easily defined
> instead of the previous dummy padding array in the union.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers
  2023-02-16  2:57 ` [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers Richard Henderson
@ 2023-02-28 18:08   ` Alex Bennée
  0 siblings, 0 replies; 42+ messages in thread
From: Alex Bennée @ 2023-02-28 18:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Instead of trying to unify all operations on uint64_t, pull out
> mmu_lookup() to perform the basic tlb hit and resolution.
> Create individual functions to handle access by size.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 07/30] accel/tcg: Honor atomicity of loads
  2023-02-16  2:57 ` [PATCH v2 07/30] accel/tcg: Honor atomicity of loads Richard Henderson
@ 2023-02-28 18:19   ` Alex Bennée
  0 siblings, 0 replies; 42+ messages in thread
From: Alex Bennée @ 2023-02-28 18:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Create ldst_atomicity.c.inc.
>
> Not required for user-only code loads, because we've ensured that
> the page is read-only before beginning to translate code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context
  2023-02-16  2:57 ` [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context Richard Henderson
  2023-02-28 17:57   ` Alex Bennée
@ 2023-03-15 16:07   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 16:07 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> Like cpu_in_exclusive_context, but also true if
> there is no other cpu against which we could race.
> 
> Use it in tb_flush as a direct replacement.
> Use it in cpu_loop_exit_atomic to ensure that there
> is no loop against cpu_exec_step_atomic.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/internal.h        | 5 +++++
>   accel/tcg/cpu-exec-common.c | 3 +++
>   accel/tcg/tb-maint.c        | 2 +-
>   3 files changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
  2023-02-16  2:57 ` [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool Richard Henderson
@ 2023-03-15 16:43   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 16:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/i386/tcg-target.c.inc | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin
  2023-02-16  2:57 ` [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin Richard Henderson
@ 2023-03-15 16:59   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 16:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> These features are present for Apple M1.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/aarch64/tcg-target.c.inc | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux
  2023-02-16  2:57 ` [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux Richard Henderson
@ 2023-03-15 17:00   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 17:00 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> Notice when the host has additional atomic instructions.
> The new variables will also be used in generated code.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/aarch64/tcg-target.h     |  3 +++
>   tcg/aarch64/tcg-target.c.inc | 12 ++++++++++++
>   2 files changed, 15 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 14/30] tcg/i386: Add have_atomic16
  2023-02-16  2:57 ` [PATCH v2 14/30] tcg/i386: Add have_atomic16 Richard Henderson
@ 2023-03-15 17:06   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 17:06 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> Notice when Intel or AMD have guaranteed that vmovdqa is atomic.
> The new variable will also be used in generated code.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/qemu/cpuid.h      | 18 ++++++++++++++++++
>   tcg/i386/tcg-target.h     |  1 +
>   tcg/i386/tcg-target.c.inc | 27 +++++++++++++++++++++++++++
>   3 files changed, 46 insertions(+)
> 
> diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h

> +/*
> + * Signatures for different CPU implementations as returned from Leaf 0.
> + */
> +
> +#ifndef signature_INTEL_ecx
> +/* "Genu" "ineI" "ntel" */
> +#define signature_INTEL_ebx     0x756e6547
> +#define signature_INTEL_edx     0x49656e69
> +#define signature_INTEL_ecx     0x6c65746e
> +#endif
> +
> +#ifndef signature_AMD_ecx
> +/* "Auth" "enti" "cAMD" */
> +#define signature_AMD_ebx       0x68747541
> +#define signature_AMD_edx       0x69746e65
> +#define signature_AMD_ecx       0x444d4163
> +#endif

Hmm, I see the "??? Irritating that we have the same information in
target/i386/." comment from commit 5dd8990841 ("util: Introduce
include/qemu/cpuid.h") :/


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity
  2023-02-16  2:57 ` [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity Richard Henderson
  2023-02-28 17:56   ` Alex Bennée
@ 2023-03-15 17:13   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-03-15 17:13 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 16/2/23 03:57, Richard Henderson wrote:
> These bits may be used to describe the precise atomicity
> requirements of the guest, which may then be used to
> constrain the methods by which it may be emulated by the host.
> 
> For instance, the AArch64 LDP (32-bit) instruction changes
> semantics with ARMv8.4 LSE2, from
> 
>    MO_64 | MO_ATMAX_4 | MO_ATOM_IFALIGN
>    (64-bits, single-copy atomic only on 4 byte units,
>     nonatomic if not aligned by 4),
> 
> to
> 
>    MO_64 | MO_ATMAX_SIZE | MO_ATOM_WITHIN16
>    (64-bits, single-copy atomic within a 16 byte block)
> 
> The former may be implemented with two 4 byte loads, or
> a single 8 byte load if that happens to be efficient on
> the host.  The latter may not, and may also require a
> helper when misaligned.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/exec/memop.h | 36 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-03-15 17:14 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-16  2:57 [PATCH v2 00/30] tcg: Improve atomicity support Richard Henderson
2023-02-16  2:57 ` [PATCH v2 01/30] include/qemu/cpuid: Introduce xgetbv_low Richard Henderson
2023-02-16  2:57 ` [PATCH v2 02/30] include/exec/memop: Add bits describing atomicity Richard Henderson
2023-02-28 17:56   ` Alex Bennée
2023-03-15 17:13   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 03/30] accel/tcg: Add cpu_in_serial_context Richard Henderson
2023-02-28 17:57   ` Alex Bennée
2023-03-15 16:07   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 04/30] accel/tcg: Introduce tlb_read_idx Richard Henderson
2023-02-28 17:59   ` Alex Bennée
2023-02-16  2:57 ` [PATCH v2 05/30] accel/tcg: Reorg system mode load helpers Richard Henderson
2023-02-28 18:08   ` Alex Bennée
2023-02-16  2:57 ` [PATCH v2 06/30] accel/tcg: Reorg system mode store helpers Richard Henderson
2023-02-16  2:57 ` [PATCH v2 07/30] accel/tcg: Honor atomicity of loads Richard Henderson
2023-02-28 18:19   ` Alex Bennée
2023-02-16  2:57 ` [PATCH v2 08/30] accel/tcg: Honor atomicity of stores Richard Henderson
2023-02-16  2:57 ` [PATCH v2 09/30] tcg/tci: Use cpu_{ld,st}_mmu Richard Henderson
2023-02-16  2:57 ` [PATCH v2 10/30] tcg: Unify helper_{be,le}_{ld,st}* Richard Henderson
2023-02-16  2:57 ` [PATCH v2 11/30] accel/tcg: Implement helper_{ld, st}*_mmu for user-only Richard Henderson
2023-02-16  2:57 ` [PATCH v2 12/30] tcg: Add 128-bit guest memory primitives Richard Henderson
2023-02-16  2:57 ` [PATCH v2 13/30] meson: Detect atomic128 support with optimization Richard Henderson
2023-02-16  2:57 ` [PATCH v2 14/30] tcg/i386: Add have_atomic16 Richard Henderson
2023-03-15 17:06   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 15/30] accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc Richard Henderson
2023-02-16  2:57 ` [PATCH v2 16/30] accel/tcg: Add aarch64 specific support in ldst_atomicity Richard Henderson
2023-02-16  2:57 ` [PATCH v2 17/30] tcg/aarch64: Detect have_lse, have_lse2 for linux Richard Henderson
2023-03-15 17:00   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 18/30] tcg/aarch64: Detect have_lse, have_lse2 for darwin Richard Henderson
2023-03-15 16:59   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 19/30] accel/tcg: Add have_lse2 support in ldst_atomicity Richard Henderson
2023-02-16  2:57 ` [PATCH v2 20/30] tcg: Introduce TCG_OPF_TYPE_MASK Richard Henderson
2023-02-16  2:57 ` [PATCH v2 21/30] tcg: Add INDEX_op_qemu_{ld,st}_i128 Richard Henderson
2023-02-16  2:57 ` [PATCH v2 22/30] tcg/i386: Introduce tcg_out_mov2 Richard Henderson
2023-02-16  2:57 ` [PATCH v2 23/30] tcg/i386: Introduce tcg_out_testi Richard Henderson
2023-02-16  2:57 ` [PATCH v2 24/30] tcg/i386: Use full load/store helpers in user-only mode Richard Henderson
2023-02-16  2:57 ` [PATCH v2 25/30] tcg/i386: Replace is64 with type in qemu_ld/st routines Richard Henderson
2023-02-16  2:57 ` [PATCH v2 26/30] tcg/i386: Mark Win64 call-saved vector regs as reserved Richard Henderson
2023-02-16  2:57 ` [PATCH v2 27/30] tcg/i386: Examine MemOp for atomicity and alignment Richard Henderson
2023-02-16  2:57 ` [PATCH v2 28/30] tcg/i386: Support 128-bit load/store with have_atomic16 Richard Henderson
2023-02-16  2:57 ` [PATCH v2 29/30] tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool Richard Henderson
2023-03-15 16:43   ` Philippe Mathieu-Daudé
2023-02-16  2:57 ` [PATCH v2 30/30] tcg/i386: Honor 64-bit atomicity in 32-bit mode Richard Henderson
