* [PULL 00/27] tcg patch queue
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel

The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:

  Merge tag 'pull-target-arm-20230530-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-05-30 08:02:05 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230530

for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:

  tests/decode: Add tests for various named-field cases (2023-05-30 10:55:39 -0700)

----------------------------------------------------------------
Improvements to 128-bit atomics:
  - Separate __int128_t type and arithmetic detection
  - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
  - Accelerate atomics via host/include/
Decodetree:
  - Add named field syntax
  - Move tests to meson

----------------------------------------------------------------
Peter Maydell (5):
      docs: Document decodetree named field syntax
      scripts/decodetree: Pass lvalue-formatter function to str_extract()
      scripts/decodetree: Implement a topological sort
      scripts/decodetree: Implement named field support
      tests/decode: Add tests for various named-field cases

Richard Henderson (22):
      tcg: Fix register move type in tcg_out_ld_helper_ret
      accel/tcg: Fix check for page writeability in load_atomic16_or_exit
      meson: Split test for __int128_t type from __int128_t arithmetic
      qemu/atomic128: Add x86_64 atomic128-ldst.h
      tcg/i386: Support 128-bit load/store
      tcg/aarch64: Rename temporaries
      tcg/aarch64: Reserve TCG_REG_TMP1, TCG_REG_TMP2
      tcg/aarch64: Simplify constraints on qemu_ld/st
      tcg/aarch64: Support 128-bit load/store
      tcg/ppc: Support 128-bit load/store
      tcg/s390x: Support 128-bit load/store
      accel/tcg: Extract load_atom_extract_al16_or_al8 to host header
      accel/tcg: Extract store_atom_insert_al16 to host header
      accel/tcg: Add x86_64 load_atom_extract_al16_or_al8
      accel/tcg: Add aarch64 lse2 load_atom_extract_al16_or_al8
      accel/tcg: Add aarch64 store_atom_insert_al16
      tcg: Remove TCG_TARGET_TLB_DISPLACEMENT_BITS
      decodetree: Add --test-for-error
      decodetree: Fix recursion in prop_format and build_tree
      decodetree: Diagnose empty pattern group
      decodetree: Do not remove output_file from /dev
      tests/decode: Convert tests to meson

 docs/devel/decodetree.rst                         |  33 ++-
 meson.build                                       |  15 +-
 host/include/aarch64/host/load-extract-al16-al8.h |  40 ++++
 host/include/aarch64/host/store-insert-al16.h     |  47 ++++
 host/include/generic/host/load-extract-al16-al8.h |  45 ++++
 host/include/generic/host/store-insert-al16.h     |  50 ++++
 host/include/x86_64/host/atomic128-ldst.h         |  68 ++++++
 host/include/x86_64/host/load-extract-al16-al8.h  |  50 ++++
 include/qemu/int128.h                             |   4 +-
 tcg/aarch64/tcg-target-con-set.h                  |   4 +-
 tcg/aarch64/tcg-target-con-str.h                  |   1 -
 tcg/aarch64/tcg-target.h                          |  12 +-
 tcg/arm/tcg-target.h                              |   1 -
 tcg/i386/tcg-target.h                             |   5 +-
 tcg/mips/tcg-target.h                             |   1 -
 tcg/ppc/tcg-target-con-set.h                      |   2 +
 tcg/ppc/tcg-target-con-str.h                      |   1 +
 tcg/ppc/tcg-target.h                              |   4 +-
 tcg/riscv/tcg-target.h                            |   1 -
 tcg/s390x/tcg-target-con-set.h                    |   2 +
 tcg/s390x/tcg-target.h                            |   3 +-
 tcg/sparc64/tcg-target.h                          |   1 -
 tcg/tci/tcg-target.h                              |   1 -
 tests/decode/err_field10.decode                   |   7 +
 tests/decode/err_field7.decode                    |   7 +
 tests/decode/err_field8.decode                    |   8 +
 tests/decode/err_field9.decode                    |  14 ++
 tests/decode/succ_named_field.decode              |  19 ++
 tcg/tcg.c                                         |   4 +-
 accel/tcg/ldst_atomicity.c.inc                    |  80 +------
 tcg/aarch64/tcg-target.c.inc                      | 243 +++++++++++++++-----
 tcg/i386/tcg-target.c.inc                         | 191 +++++++++++++++-
 tcg/ppc/tcg-target.c.inc                          | 108 ++++++++-
 tcg/s390x/tcg-target.c.inc                        | 107 ++++++++-
 scripts/decodetree.py                             | 265 ++++++++++++++++++++--
 tests/decode/check.sh                             |  24 --
 tests/decode/meson.build                          |  64 ++++++
 tests/meson.build                                 |   5 +-
 38 files changed, 1312 insertions(+), 225 deletions(-)
 create mode 100644 host/include/aarch64/host/load-extract-al16-al8.h
 create mode 100644 host/include/aarch64/host/store-insert-al16.h
 create mode 100644 host/include/generic/host/load-extract-al16-al8.h
 create mode 100644 host/include/generic/host/store-insert-al16.h
 create mode 100644 host/include/x86_64/host/atomic128-ldst.h
 create mode 100644 host/include/x86_64/host/load-extract-al16-al8.h
 create mode 100644 tests/decode/err_field10.decode
 create mode 100644 tests/decode/err_field7.decode
 create mode 100644 tests/decode/err_field8.decode
 create mode 100644 tests/decode/err_field9.decode
 create mode 100644 tests/decode/succ_named_field.decode
 delete mode 100755 tests/decode/check.sh
 create mode 100644 tests/decode/meson.build



* [PULL 01/27] tcg: Fix register move type in tcg_out_ld_helper_ret
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

The first move was incorrectly using TCG_TYPE_I32 while the second
move was correctly using TCG_TYPE_REG.  This prevented a 64-bit host
from moving all 128 bits of the return value.
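
A toy model (not QEMU code) of the effect: with a 32-bit move type,
only the low half of each 64-bit register is defined, so the high
halves of the 128-bit return value were lost.

    #include <stdint.h>

    /* Illustrative only: what the move type selects. */
    static uint64_t move_reg(uint64_t src, _Bool type_is_i32)
    {
        return type_is_i32 ? (uint32_t)src : src;  /* I32 keeps bits 0-31 */
    }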

Fixes: ebebea53ef8 ("tcg: Support TCG_TYPE_I128 in tcg_out_{ld,st}_helper_{args,ret}")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 tcg/tcg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ac30d484f5..2352ca4ade 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -5736,8 +5736,8 @@ static void tcg_out_ld_helper_ret(TCGContext *s, const TCGLabelQemuLdst *ldst,
     mov[0].dst = ldst->datalo_reg;
     mov[0].src =
         tcg_target_call_oarg_reg(TCG_CALL_RET_NORMAL, HOST_BIG_ENDIAN);
-    mov[0].dst_type = TCG_TYPE_I32;
-    mov[0].src_type = TCG_TYPE_I32;
+    mov[0].dst_type = TCG_TYPE_REG;
+    mov[0].src_type = TCG_TYPE_REG;
     mov[0].src_ext = TCG_TARGET_REG_BITS == 32 ? MO_32 : MO_64;
 
     mov[1].dst = ldst->datahi_reg;
-- 
2.34.1




* [PULL 02/27] accel/tcg: Fix check for page writeability in load_atomic16_or_exit
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

PAGE_WRITE is the current writability, as modified by TB protection;
PAGE_WRITE_ORG is the original page writability.
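
A minimal sketch of the distinction (the flag values here are
placeholders for illustration, not QEMU's definitions):

    #include <stdbool.h>

    #define PAGE_WRITE      0x0002  /* current permission; TB protection
                                       may clear this temporarily */
    #define PAGE_WRITE_ORG  0x0010  /* permission as originally mapped */

    /* The atomicity fallback cares about what the guest mapped, not the
     * transient state, hence testing PAGE_WRITE_ORG. */
    static bool guest_mapped_writable(int flags)
    {
        return (flags & PAGE_WRITE_ORG) != 0;
    }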

Fixes: cdfac37be0d ("accel/tcg: Honor atomicity of loads")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 0f6b3f8ab6..35ce6d6368 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -156,7 +156,7 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
      */
-    if (!page_check_range(h2g(pv), 8, PAGE_WRITE)) {
+    if (!page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
         uint64_t *p = __builtin_assume_aligned(pv, 8);
         return *p;
     }
@@ -191,7 +191,7 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
      */
-    if (!page_check_range(h2g(p), 16, PAGE_WRITE)) {
+    if (!page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
         return *p;
     }
 #endif
-- 
2.34.1




* [PULL 03/27] meson: Split test for __int128_t type from __int128_t arithmetic
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

Older versions of clang lack the runtime functions for __int128_t
arithmetic with -fsanitize=undefined (see commit 464e3671f9d5c), so we
cannot use __int128_t for implementing Int128.  But the __int128_t type
itself is present, data movement works, and it can be used for atomic128.

Probe for both CONFIG_INT128_TYPE and CONFIG_INT128, adjust
qemu/int128.h to define Int128Alias if CONFIG_INT128_TYPE,
and adjust the meson probe for atomics to use has_int128_type.
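
A sketch of the two probes, mirroring the meson tests below (assuming a
compiler that provides the type at all):

    /* CONFIG_INT128_TYPE: the type exists and data movement compiles. */
    __int128_t a;
    __uint128_t b;
    int probe_type(void) { b = a; return 0; }

    /* CONFIG_INT128: arithmetic also links, which under
     * -fsanitize=undefined additionally needs runtime helpers that
     * older clang does not provide. */
    int probe_arith(void) { a = a + b; a = a * a; return 0; }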

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 meson.build           | 15 ++++++++++-----
 include/qemu/int128.h |  4 ++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/meson.build b/meson.build
index 2d48aa1e2e..bc76ea96bf 100644
--- a/meson.build
+++ b/meson.build
@@ -2543,7 +2543,13 @@ config_host_data.set('CONFIG_ATOMIC64', cc.links('''
     return 0;
   }'''))
 
-has_int128 = cc.links('''
+has_int128_type = cc.compiles('''
+  __int128_t a;
+  __uint128_t b;
+  int main(void) { b = a; }''')
+config_host_data.set('CONFIG_INT128_TYPE', has_int128_type)
+
+has_int128 = has_int128_type and cc.links('''
   __int128_t a;
   __uint128_t b;
   int main (void) {
@@ -2552,10 +2558,9 @@ has_int128 = cc.links('''
     a = a * a;
     return 0;
   }''')
-
 config_host_data.set('CONFIG_INT128', has_int128)
 
-if has_int128
+if has_int128_type
   # "do we have 128-bit atomics which are handled inline and specifically not
   # via libatomic". The reason we can't use libatomic is documented in the
   # comment starting "GCC is a house divided" in include/qemu/atomic128.h.
@@ -2564,7 +2569,7 @@ if has_int128
   # __alignof(unsigned __int128) for the host.
   atomic_test_128 = '''
     int main(int ac, char **av) {
-      unsigned __int128 *p = __builtin_assume_aligned(av[ac - 1], 16);
+      __uint128_t *p = __builtin_assume_aligned(av[ac - 1], 16);
       p[1] = __atomic_load_n(&p[0], __ATOMIC_RELAXED);
       __atomic_store_n(&p[2], p[3], __ATOMIC_RELAXED);
       __atomic_compare_exchange_n(&p[4], &p[5], p[6], 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
@@ -2586,7 +2591,7 @@ if has_int128
       config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
         int main(void)
         {
-          unsigned __int128 x = 0, y = 0;
+          __uint128_t x = 0, y = 0;
           __sync_val_compare_and_swap_16(&x, y, x);
           return 0;
         }
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 9e46cfaefc..73624e8be7 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -481,7 +481,7 @@ static inline void bswap128s(Int128 *s)
  * a possible structure and the native types.  Ease parameter passing
  * via use of the transparent union extension.
  */
-#ifdef CONFIG_INT128
+#ifdef CONFIG_INT128_TYPE
 typedef union {
     __uint128_t u;
     __int128_t i;
@@ -489,6 +489,6 @@ typedef union {
 } Int128Alias __attribute__((transparent_union));
 #else
 typedef Int128 Int128Alias;
-#endif /* CONFIG_INT128 */
+#endif /* CONFIG_INT128_TYPE */
 
 #endif /* INT128_H */
-- 
2.34.1




* [PULL 04/27] qemu/atomic128: Add x86_64 atomic128-ldst.h
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Alex Bennée

With CPUINFO_ATOMIC_VMOVDQA, we can perform proper atomic
load/store without cmpxchg16b.
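
A hedged usage sketch (load16 and its flag are hypothetical and not part
of the patch; Int128, atomic16_read_ro/_rw and HAVE_ATOMIC128_RO are
assumed in scope from the header added below):

    static inline Int128 load16(Int128 *p, bool possibly_readonly)
    {
        if (possibly_readonly && HAVE_ATOMIC128_RO) {
            return atomic16_read_ro(p);   /* pure VMOVDQA load */
        }
        /* May fall back to LOCK CMPXCHG16B, which needs write access. */
        return atomic16_read_rw(p);
    }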

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/x86_64/host/atomic128-ldst.h | 68 +++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 host/include/x86_64/host/atomic128-ldst.h

diff --git a/host/include/x86_64/host/atomic128-ldst.h b/host/include/x86_64/host/atomic128-ldst.h
new file mode 100644
index 0000000000..adc9332f91
--- /dev/null
+++ b/host/include/x86_64/host/atomic128-ldst.h
@@ -0,0 +1,68 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Load/store for 128-bit atomic operations, x86_64 version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ *
+ * See docs/devel/atomics.rst for discussion about the guarantees each
+ * atomic primitive is meant to provide.
+ */
+
+#ifndef X86_64_ATOMIC128_LDST_H
+#define X86_64_ATOMIC128_LDST_H
+
+#ifdef CONFIG_INT128_TYPE
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+/*
+ * Through clang 16, with -mcx16, __atomic_load_n is incorrectly
+ * expanded to a read-write operation: lock cmpxchg16b.
+ */
+
+#define HAVE_ATOMIC128_RO  likely(cpuinfo & CPUINFO_ATOMIC_VMOVDQA)
+#define HAVE_ATOMIC128_RW  1
+
+static inline Int128 atomic16_read_ro(const Int128 *ptr)
+{
+    Int128Alias r;
+
+    tcg_debug_assert(HAVE_ATOMIC128_RO);
+    asm("vmovdqa %1, %0" : "=x" (r.i) : "m" (*ptr));
+
+    return r.s;
+}
+
+static inline Int128 atomic16_read_rw(Int128 *ptr)
+{
+    __int128_t *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128Alias r;
+
+    if (HAVE_ATOMIC128_RO) {
+        asm("vmovdqa %1, %0" : "=x" (r.i) : "m" (*ptr_align));
+    } else {
+        r.i = __sync_val_compare_and_swap_16(ptr_align, 0, 0);
+    }
+    return r.s;
+}
+
+static inline void atomic16_set(Int128 *ptr, Int128 val)
+{
+    __int128_t *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128Alias new = { .s = val };
+
+    if (HAVE_ATOMIC128_RO) {
+        asm("vmovdqa %1, %0" : "=m"(*ptr_align) : "x" (new.i));
+    } else {
+        __int128_t old;
+        do {
+            old = *ptr_align;
+        } while (!__sync_bool_compare_and_swap_16(ptr_align, old, new.i));
+    }
+}
+#else
+/* Provide QEMU_ERROR stubs. */
+#include "host/include/generic/host/atomic128-ldst.h"
+#endif
+
+#endif /* X86_64_ATOMIC128_LDST_H */
-- 
2.34.1




* [PULL 05/27] tcg/i386: Support 128-bit load/store
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |   4 +-
 tcg/i386/tcg-target.c.inc | 191 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 0106946996..b167f1e8d6 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -118,7 +118,6 @@ typedef enum {
 #define have_avx1         (cpuinfo & CPUINFO_AVX1)
 #define have_avx2         (cpuinfo & CPUINFO_AVX2)
 #define have_movbe        (cpuinfo & CPUINFO_MOVBE)
-#define have_atomic16     (cpuinfo & CPUINFO_ATOMIC_VMOVDQA)
 
 /*
  * There are interesting instructions in AVX512, so long as we have AVX512VL,
@@ -202,7 +201,8 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_st8_i32     1
 #endif
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_qemu_ldst_i128 \
+    (TCG_TARGET_REG_BITS == 64 && (cpuinfo & CPUINFO_ATOMIC_VMOVDQA))
 
 /* We do not support older SSE systems, only beginning with AVX1.  */
 #define TCG_TARGET_HAS_v64              have_avx1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index bfe9d98b7e..ae54e5fbf3 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -91,6 +91,8 @@ static const int tcg_target_reg_alloc_order[] = {
 #endif
 };
 
+#define TCG_TMP_VEC  TCG_REG_XMM5
+
 static const int tcg_target_call_iarg_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
 #if defined(_WIN64)
@@ -319,6 +321,8 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define OPC_PCMPGTW     (0x65 | P_EXT | P_DATA16)
 #define OPC_PCMPGTD     (0x66 | P_EXT | P_DATA16)
 #define OPC_PCMPGTQ     (0x37 | P_EXT38 | P_DATA16)
+#define OPC_PEXTRD      (0x16 | P_EXT3A | P_DATA16)
+#define OPC_PINSRD      (0x22 | P_EXT3A | P_DATA16)
 #define OPC_PMAXSB      (0x3c | P_EXT38 | P_DATA16)
 #define OPC_PMAXSW      (0xee | P_EXT | P_DATA16)
 #define OPC_PMAXSD      (0x3d | P_EXT38 | P_DATA16)
@@ -1753,7 +1757,21 @@ typedef struct {
 
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
-    return have_movbe;
+    TCGAtomAlign aa;
+
+    if (!have_movbe) {
+        return false;
+    }
+    if ((memop & MO_SIZE) < MO_128) {
+        return true;
+    }
+
+    /*
+     * Reject 16-byte memop with 16-byte atomicity, i.e. VMOVDQA,
+     * but do allow a pair of 64-bit operations, i.e. MOVBEQ.
+     */
+    aa = atom_and_align_for_opc(tcg_ctx, memop, MO_ATOM_IFALIGN, true);
+    return aa.atom < MO_128;
 }
 
 /*
@@ -1781,6 +1799,30 @@ static const TCGLdstHelperParam ldst_helper_param = {
 static const TCGLdstHelperParam ldst_helper_param = { };
 #endif
 
+static void tcg_out_vec_to_pair(TCGContext *s, TCGType type,
+                                TCGReg l, TCGReg h, TCGReg v)
+{
+    int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+
+    /* vpmov{d,q} %v, %l */
+    tcg_out_vex_modrm(s, OPC_MOVD_EyVy + rexw, v, 0, l);
+    /* vpextr{d,q} $1, %v, %h */
+    tcg_out_vex_modrm(s, OPC_PEXTRD + rexw, v, 0, h);
+    tcg_out8(s, 1);
+}
+
+static void tcg_out_pair_to_vec(TCGContext *s, TCGType type,
+                                TCGReg v, TCGReg l, TCGReg h)
+{
+    int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+
+    /* vmov{d,q} %l, %v */
+    tcg_out_vex_modrm(s, OPC_MOVD_VyEy + rexw, v, 0, l);
+    /* vpinsr{d,q} $1, %h, %v, %v */
+    tcg_out_vex_modrm(s, OPC_PINSRD + rexw, v, v, h);
+    tcg_out8(s, 1);
+}
+
 /*
  * Generate code for the slow path for a load at the end of block
  */
@@ -1870,6 +1912,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 {
     TCGLabelQemuLdst *ldst = NULL;
     MemOp opc = get_memop(oi);
+    MemOp s_bits = opc & MO_SIZE;
     unsigned a_mask;
 
 #ifdef CONFIG_SOFTMMU
@@ -1880,7 +1923,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     *h = x86_guest_base;
 #endif
     h->base = addrlo;
-    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, false);
+    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, s_bits == MO_128);
     a_mask = (1 << h->aa.align) - 1;
 
 #ifdef CONFIG_SOFTMMU
@@ -1890,7 +1933,6 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     TCGType tlbtype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0, tlbrexw = 0;
     unsigned mem_index = get_mmuidx(oi);
-    unsigned s_bits = opc & MO_SIZE;
     unsigned s_mask = (1 << s_bits) - 1;
     int tlb_mask;
 
@@ -2070,6 +2112,72 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                      h.base, h.index, 0, h.ofs + 4);
         }
         break;
+
+    case MO_128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+
+        /*
+         * Without 16-byte atomicity, use integer regs.
+         * That is where we want the data, and it allows bswaps.
+         */
+        if (h.aa.atom < MO_128) {
+            if (use_movbe) {
+                TCGReg t = datalo;
+                datalo = datahi;
+                datahi = t;
+            }
+            if (h.base == datalo || h.index == datalo) {
+                tcg_out_modrm_sib_offset(s, OPC_LEA + P_REXW, datahi,
+                                         h.base, h.index, 0, h.ofs);
+                tcg_out_modrm_offset(s, movop + P_REXW + h.seg,
+                                     datalo, datahi, 0);
+                tcg_out_modrm_offset(s, movop + P_REXW + h.seg,
+                                     datahi, datahi, 8);
+            } else {
+                tcg_out_modrm_sib_offset(s, movop + P_REXW + h.seg, datalo,
+                                         h.base, h.index, 0, h.ofs);
+                tcg_out_modrm_sib_offset(s, movop + P_REXW + h.seg, datahi,
+                                         h.base, h.index, 0, h.ofs + 8);
+            }
+            break;
+        }
+
+        /*
+         * With 16-byte atomicity, a vector load is required.
+         * If we already have 16-byte alignment, then VMOVDQA always works.
+         * Else if VMOVDQU has atomicity with dynamic alignment, use that.
+         * Else we require a runtime test for alignment for VMOVDQA;
+         * use VMOVDQU on the unaligned nonatomic path for simplicity.
+         */
+        if (h.aa.align >= MO_128) {
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_VxWx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+        } else if (cpuinfo & CPUINFO_ATOMIC_VMOVDQU) {
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQU_VxWx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+        } else {
+            TCGLabel *l1 = gen_new_label();
+            TCGLabel *l2 = gen_new_label();
+
+            tcg_out_testi(s, h.base, 15);
+            tcg_out_jxx(s, JCC_JNE, l1, true);
+
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_VxWx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+            tcg_out_jxx(s, JCC_JMP, l2, true);
+
+            tcg_out_label(s, l1);
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQU_VxWx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+            tcg_out_label(s, l2);
+        }
+        tcg_out_vec_to_pair(s, TCG_TYPE_I64, datalo, datahi, TCG_TMP_VEC);
+        break;
+
     default:
         g_assert_not_reached();
     }
@@ -2140,6 +2248,63 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                      h.base, h.index, 0, h.ofs + 4);
         }
         break;
+
+    case MO_128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+
+        /*
+         * Without 16-byte atomicity, use integer regs.
+         * That is where we have the data, and it allows bswaps.
+         */
+        if (h.aa.atom < MO_128) {
+            if (use_movbe) {
+                TCGReg t = datalo;
+                datalo = datahi;
+                datahi = t;
+            }
+            tcg_out_modrm_sib_offset(s, movop + P_REXW + h.seg, datalo,
+                                     h.base, h.index, 0, h.ofs);
+            tcg_out_modrm_sib_offset(s, movop + P_REXW + h.seg, datahi,
+                                     h.base, h.index, 0, h.ofs + 8);
+            break;
+        }
+
+        /*
+         * With 16-byte atomicity, a vector store is required.
+         * If we already have 16-byte alignment, then VMOVDQA always works.
+         * Else if VMOVDQU has atomicity with dynamic alignment, use that.
+         * Else we require a runtime test for alignment for VMOVDQA;
+         * use VMOVDQU on the unaligned nonatomic path for simplicity.
+         */
+        tcg_out_pair_to_vec(s, TCG_TYPE_I64, TCG_TMP_VEC, datalo, datahi);
+        if (h.aa.align >= MO_128) {
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_WxVx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+        } else if (cpuinfo & CPUINFO_ATOMIC_VMOVDQU) {
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQU_WxVx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+        } else {
+            TCGLabel *l1 = gen_new_label();
+            TCGLabel *l2 = gen_new_label();
+
+            tcg_out_testi(s, h.base, 15);
+            tcg_out_jxx(s, JCC_JNE, l1, true);
+
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQA_WxVx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+            tcg_out_jxx(s, JCC_JMP, l2, true);
+
+            tcg_out_label(s, l1);
+            tcg_out_vex_modrm_sib_offset(s, OPC_MOVDQU_WxVx + h.seg,
+                                         TCG_TMP_VEC, 0,
+                                         h.base, h.index, 0, h.ofs);
+            tcg_out_label(s, l2);
+        }
+        break;
+
     default:
         g_assert_not_reached();
     }
@@ -2470,6 +2635,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_qemu_ld(s, a0, a1, a2, args[3], args[4], TCG_TYPE_I64);
         }
         break;
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        tcg_out_qemu_ld(s, a0, a1, a2, -1, args[3], TCG_TYPE_I128);
+        break;
 
     case INDEX_op_qemu_st_a64_i32:
     case INDEX_op_qemu_st8_a64_i32:
@@ -2496,6 +2666,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_qemu_st(s, a0, a1, a2, args[3], args[4], TCG_TYPE_I64);
         }
         break;
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        tcg_out_qemu_st(s, a0, a1, a2, -1, args[3], TCG_TYPE_I128);
+        break;
 
     OP_32_64(mulu2):
         tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_MUL, args[3]);
@@ -3193,6 +3368,15 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_st_a64_i64:
         return TCG_TARGET_REG_BITS == 64 ? C_O0_I2(L, L) : C_O0_I4(L, L, L, L);
 
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        return C_O2_I1(r, r, L);
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        return C_O0_I3(L, L, L);
+
     case INDEX_op_brcond2_i32:
         return C_O0_I4(r, r, ri, ri);
 
@@ -3962,6 +4146,7 @@ static void tcg_target_init(TCGContext *s)
 
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+    tcg_regset_set_reg(s->reserved_regs, TCG_TMP_VEC);
 #ifdef _WIN64
     /* These are call saved, and we don't save them, so don't use them. */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_XMM6);
-- 
2.34.1




* [PULL 06/27] tcg/aarch64: Rename temporaries
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

We will need to allocate a second general-purpose temporary.
Rename the existing temps to add a distinguishing number.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 50 ++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 84283665e7..8996e29ca9 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -71,8 +71,8 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
     return TCG_REG_X0 + slot;
 }
 
-#define TCG_REG_TMP TCG_REG_X30
-#define TCG_VEC_TMP TCG_REG_V31
+#define TCG_REG_TMP0 TCG_REG_X30
+#define TCG_VEC_TMP0 TCG_REG_V31
 
 #ifndef CONFIG_SOFTMMU
 #define TCG_REG_GUEST_BASE TCG_REG_X28
@@ -984,7 +984,7 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg r, TCGReg base, intptr_t offset)
 {
-    TCGReg temp = TCG_REG_TMP;
+    TCGReg temp = TCG_REG_TMP0;
 
     if (offset < -0xffffff || offset > 0xffffff) {
         tcg_out_movi(s, TCG_TYPE_PTR, temp, offset);
@@ -1136,8 +1136,8 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
     }
 
     /* Worst-case scenario, move offset to temp register, use reg offset.  */
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
-    tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
+    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, offset);
+    tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP0);
 }
 
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
@@ -1353,8 +1353,8 @@ static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *target)
     if (offset == sextract64(offset, 0, 26)) {
         tcg_out_insn(s, 3206, BL, offset);
     } else {
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
-        tcg_out_insn(s, 3207, BLR, TCG_REG_TMP);
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, (intptr_t)target);
+        tcg_out_insn(s, 3207, BLR, TCG_REG_TMP0);
     }
 }
 
@@ -1491,7 +1491,7 @@ static void tcg_out_addsub2(TCGContext *s, TCGType ext, TCGReg rl,
     AArch64Insn insn;
 
     if (rl == ah || (!const_bh && rl == bh)) {
-        rl = TCG_REG_TMP;
+        rl = TCG_REG_TMP0;
     }
 
     if (const_bl) {
@@ -1508,7 +1508,7 @@ static void tcg_out_addsub2(TCGContext *s, TCGType ext, TCGReg rl,
                possibility of adding 0+const in the low part, and the
                immediate add instructions encode XSP not XZR.  Don't try
                anything more elaborate here than loading another zero.  */
-            al = TCG_REG_TMP;
+            al = TCG_REG_TMP0;
             tcg_out_movi(s, ext, al, 0);
         }
         tcg_out_insn_3401(s, insn, ext, rl, al, bl);
@@ -1549,7 +1549,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 {
     TCGReg a1 = a0;
     if (is_ctz) {
-        a1 = TCG_REG_TMP;
+        a1 = TCG_REG_TMP0;
         tcg_out_insn(s, 3507, RBIT, ext, a1, a0);
     }
     if (const_b && b == (ext ? 64 : 32)) {
@@ -1558,7 +1558,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
         AArch64Insn sel = I3506_CSEL;
 
         tcg_out_cmp(s, ext, a0, 0, 1);
-        tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP, a1);
+        tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP0, a1);
 
         if (const_b) {
             if (b == -1) {
@@ -1571,7 +1571,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
                 b = d;
             }
         }
-        tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP, b, TCG_COND_NE);
+        tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP0, b, TCG_COND_NE);
     }
 }
 
@@ -1588,7 +1588,7 @@ bool tcg_target_has_memory_bswap(MemOp memop)
 }
 
 static const TCGLdstHelperParam ldst_helper_param = {
-    .ntmp = 1, .tmp = { TCG_REG_TMP }
+    .ntmp = 1, .tmp = { TCG_REG_TMP0 }
 };
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
@@ -1847,7 +1847,7 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
 
     set_jmp_insn_offset(s, which);
     tcg_out32(s, I3206_B);
-    tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
+    tcg_out_insn(s, 3207, BR, TCG_REG_TMP0);
     set_jmp_reset_offset(s, which);
 }
 
@@ -1866,7 +1866,7 @@ void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
         ptrdiff_t i_offset = i_addr - jmp_rx;
 
         /* Note that we asserted this in range in tcg_out_goto_tb. */
-        insn = deposit32(I3305_LDR | TCG_REG_TMP, 5, 19, i_offset >> 2);
+        insn = deposit32(I3305_LDR | TCG_REG_TMP0, 5, 19, i_offset >> 2);
     }
     qatomic_set((uint32_t *)jmp_rw, insn);
     flush_idcache_range(jmp_rx, jmp_rw, 4);
@@ -2060,13 +2060,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_rem_i64:
     case INDEX_op_rem_i32:
-        tcg_out_insn(s, 3508, SDIV, ext, TCG_REG_TMP, a1, a2);
-        tcg_out_insn(s, 3509, MSUB, ext, a0, TCG_REG_TMP, a2, a1);
+        tcg_out_insn(s, 3508, SDIV, ext, TCG_REG_TMP0, a1, a2);
+        tcg_out_insn(s, 3509, MSUB, ext, a0, TCG_REG_TMP0, a2, a1);
         break;
     case INDEX_op_remu_i64:
     case INDEX_op_remu_i32:
-        tcg_out_insn(s, 3508, UDIV, ext, TCG_REG_TMP, a1, a2);
-        tcg_out_insn(s, 3509, MSUB, ext, a0, TCG_REG_TMP, a2, a1);
+        tcg_out_insn(s, 3508, UDIV, ext, TCG_REG_TMP0, a1, a2);
+        tcg_out_insn(s, 3509, MSUB, ext, a0, TCG_REG_TMP0, a2, a1);
         break;
 
     case INDEX_op_shl_i64:
@@ -2110,8 +2110,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_rotl(s, ext, a0, a1, a2);
         } else {
-            tcg_out_insn(s, 3502, SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2);
-            tcg_out_insn(s, 3508, RORV, ext, a0, a1, TCG_REG_TMP);
+            tcg_out_insn(s, 3502, SUB, 0, TCG_REG_TMP0, TCG_REG_XZR, a2);
+            tcg_out_insn(s, 3508, RORV, ext, a0, a1, TCG_REG_TMP0);
         }
         break;
 
@@ -2517,8 +2517,8 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                             break;
                         }
                     }
-                    tcg_out_dupi_vec(s, type, MO_8, TCG_VEC_TMP, 0);
-                    a2 = TCG_VEC_TMP;
+                    tcg_out_dupi_vec(s, type, MO_8, TCG_VEC_TMP0, 0);
+                    a2 = TCG_VEC_TMP0;
                 }
                 if (is_scalar) {
                     insn = cmp_scalar_insn[cond];
@@ -2900,9 +2900,9 @@ static void tcg_target_init(TCGContext *s)
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
-    tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
+    tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP0);
 }
 
 /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
-- 
2.34.1




* [PULL 07/27] tcg/aarch64: Reserve TCG_REG_TMP1, TCG_REG_TMP2
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 8996e29ca9..5e7ac6fb76 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -40,11 +40,12 @@ static const int tcg_target_reg_alloc_order[] = {
 
     TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
     TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
-    TCG_REG_X16, TCG_REG_X17,
 
     TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
     TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
 
+    /* X16 reserved as temporary */
+    /* X17 reserved as temporary */
     /* X18 reserved by system */
     /* X19 reserved for AREG0 */
     /* X29 reserved as fp */
@@ -71,7 +72,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
     return TCG_REG_X0 + slot;
 }
 
-#define TCG_REG_TMP0 TCG_REG_X30
+#define TCG_REG_TMP0 TCG_REG_X16
+#define TCG_REG_TMP1 TCG_REG_X17
+#define TCG_REG_TMP2 TCG_REG_X30
 #define TCG_VEC_TMP0 TCG_REG_V31
 
 #ifndef CONFIG_SOFTMMU
@@ -2902,6 +2905,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP2);
     tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP0);
 }
 
-- 
2.34.1




* [PULL 08/27] tcg/aarch64: Simplify constraints on qemu_ld/st
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

Adjust the softmmu TLB to use TMP[0-2], not any of the normally available
registers.  Since we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.
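
For illustration, the register masks involved (a standalone sketch; the
mask values come from the aarch64 backend hunks below):

    #include <stdio.h>

    /* The dropped 'l' constraint excluded X0-X3, which the softmmu TLB
     * lookup previously used as scratch.  With the lookup confined to
     * the reserved TMP0-TMP2, plain 'r' (all general regs) is safe. */
    int main(void)
    {
        unsigned all_general = 0xffffffffu;
        unsigned old_qldst = all_general & ~0xfu;   /* minus X0-X3 */
        printf("'l' was %#x; 'r' is %#x\n", old_qldst, all_general);
        return 0;
    }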

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target-con-set.h |  2 --
 tcg/aarch64/tcg-target-con-str.h |  1 -
 tcg/aarch64/tcg-target.c.inc     | 45 ++++++++++++++------------------
 3 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/tcg/aarch64/tcg-target-con-set.h b/tcg/aarch64/tcg-target-con-set.h
index d6c6866878..109f4ab97c 100644
--- a/tcg/aarch64/tcg-target-con-set.h
+++ b/tcg/aarch64/tcg-target-con-set.h
@@ -10,11 +10,9 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(lZ, l)
 C_O0_I2(r, rA)
 C_O0_I2(rZ, r)
 C_O0_I2(w, r)
-C_O1_I1(r, l)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
 C_O1_I1(w, w)
diff --git a/tcg/aarch64/tcg-target-con-str.h b/tcg/aarch64/tcg-target-con-str.h
index 00adb64594..fb1a845b4f 100644
--- a/tcg/aarch64/tcg-target-con-str.h
+++ b/tcg/aarch64/tcg-target-con-str.h
@@ -9,7 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('l', ALL_QLDST_REGS)
 REGS('w', ALL_VECTOR_REGS)
 
 /*
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 5e7ac6fb76..00d9033e2f 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -132,14 +132,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define ALL_GENERAL_REGS  0xffffffffu
 #define ALL_VECTOR_REGS   0xffffffff00000000ull
 
-#ifdef CONFIG_SOFTMMU
-#define ALL_QLDST_REGS \
-    (ALL_GENERAL_REGS & ~((1 << TCG_REG_X0) | (1 << TCG_REG_X1) | \
-                          (1 << TCG_REG_X2) | (1 << TCG_REG_X3)))
-#else
-#define ALL_QLDST_REGS   ALL_GENERAL_REGS
-#endif
-
 /* Match a constant valid for addition (12-bit, optionally shifted).  */
 static inline bool is_aimm(uint64_t val)
 {
@@ -1648,7 +1640,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     unsigned s_bits = opc & MO_SIZE;
     unsigned s_mask = (1u << s_bits) - 1;
     unsigned mem_index = get_mmuidx(oi);
-    TCGReg x3;
+    TCGReg addr_adj;
     TCGType mask_type;
     uint64_t compare_mask;
 
@@ -1660,27 +1652,27 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     mask_type = (s->page_bits + s->tlb_dyn_max_bits > 32
                  ? TCG_TYPE_I64 : TCG_TYPE_I32);
 
-    /* Load env_tlb(env)->f[mmu_idx].{mask,table} into {x0,x1}.  */
+    /* Load env_tlb(env)->f[mmu_idx].{mask,table} into {tmp0,tmp1}. */
     QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
     QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -512);
     QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, mask) != 0);
     QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, table) != 8);
-    tcg_out_insn(s, 3314, LDP, TCG_REG_X0, TCG_REG_X1, TCG_AREG0,
+    tcg_out_insn(s, 3314, LDP, TCG_REG_TMP0, TCG_REG_TMP1, TCG_AREG0,
                  TLB_MASK_TABLE_OFS(mem_index), 1, 0);
 
     /* Extract the TLB index from the address into X0.  */
     tcg_out_insn(s, 3502S, AND_LSR, mask_type == TCG_TYPE_I64,
-                 TCG_REG_X0, TCG_REG_X0, addr_reg,
+                 TCG_REG_TMP0, TCG_REG_TMP0, addr_reg,
                  s->page_bits - CPU_TLB_ENTRY_BITS);
 
-    /* Add the tlb_table pointer, creating the CPUTLBEntry address into X1.  */
-    tcg_out_insn(s, 3502, ADD, 1, TCG_REG_X1, TCG_REG_X1, TCG_REG_X0);
+    /* Add the tlb_table pointer, forming the CPUTLBEntry address in TMP1. */
+    tcg_out_insn(s, 3502, ADD, 1, TCG_REG_TMP1, TCG_REG_TMP1, TCG_REG_TMP0);
 
-    /* Load the tlb comparator into X0, and the fast path addend into X1.  */
-    tcg_out_ld(s, addr_type, TCG_REG_X0, TCG_REG_X1,
+    /* Load the tlb comparator into TMP0, and the fast path addend into TMP1. */
+    tcg_out_ld(s, addr_type, TCG_REG_TMP0, TCG_REG_TMP1,
                is_ld ? offsetof(CPUTLBEntry, addr_read)
                      : offsetof(CPUTLBEntry, addr_write));
-    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_X1, TCG_REG_X1,
+    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_REG_TMP1,
                offsetof(CPUTLBEntry, addend));
 
     /*
@@ -1689,25 +1681,26 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
      * cross pages using the address of the last byte of the access.
      */
     if (a_mask >= s_mask) {
-        x3 = addr_reg;
+        addr_adj = addr_reg;
     } else {
+        addr_adj = TCG_REG_TMP2;
         tcg_out_insn(s, 3401, ADDI, addr_type,
-                     TCG_REG_X3, addr_reg, s_mask - a_mask);
-        x3 = TCG_REG_X3;
+                     addr_adj, addr_reg, s_mask - a_mask);
     }
     compare_mask = (uint64_t)s->page_mask | a_mask;
 
-    /* Store the page mask part of the address into X3.  */
-    tcg_out_logicali(s, I3404_ANDI, addr_type, TCG_REG_X3, x3, compare_mask);
+    /* Store the page mask part of the address into TMP2.  */
+    tcg_out_logicali(s, I3404_ANDI, addr_type, TCG_REG_TMP2,
+                     addr_adj, compare_mask);
 
     /* Perform the address comparison. */
-    tcg_out_cmp(s, addr_type, TCG_REG_X0, TCG_REG_X3, 0);
+    tcg_out_cmp(s, addr_type, TCG_REG_TMP0, TCG_REG_TMP2, 0);
 
     /* If not equal, we jump to the slow path. */
     ldst->label_ptr[0] = s->code_ptr;
     tcg_out_insn(s, 3202, B_C, TCG_COND_NE, 0);
 
-    h->base = TCG_REG_X1,
+    h->base = TCG_REG_TMP1;
     h->index = addr_reg;
     h->index_ext = addr_type;
 #else
@@ -2802,12 +2795,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_ld_a64_i32:
     case INDEX_op_qemu_ld_a32_i64:
     case INDEX_op_qemu_ld_a64_i64:
-        return C_O1_I1(r, l);
+        return C_O1_I1(r, r);
     case INDEX_op_qemu_st_a32_i32:
     case INDEX_op_qemu_st_a64_i32:
     case INDEX_op_qemu_st_a32_i64:
     case INDEX_op_qemu_st_a64_i64:
-        return C_O0_I2(lZ, l);
+        return C_O0_I2(rZ, r);
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-- 
2.34.1




* [PULL 09/27] tcg/aarch64: Support 128-bit load/store
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Peter Maydell

With FEAT_LSE2, LDP/STP suffices.  Without FEAT_LSE2, use LDXP+STXP
when 16-byte atomicity is required, and LDP/STP otherwise.
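
A conceptual sketch of the no-LSE2 load in GCC inline asm (the backend
emits this through its own encoders; this standalone form is only
illustrative and requires p to be 16-byte aligned):

    #include <stdint.h>

    static inline void atomic16_load_ldxp(void *p,
                                          uint64_t *lo, uint64_t *hi)
    {
        uint64_t l, h;
        uint32_t fail;
        /* LDXP alone is not single-copy atomic; the paired STXP writes
         * the same value back and fails if the location was disturbed,
         * so looping until it succeeds yields an atomic 16-byte read. */
        asm volatile("0: ldxp %0, %1, [%3]\n\t"
                     "stxp %w2, %0, %1, [%3]\n\t"
                     "cbnz %w2, 0b"
                     : "=&r"(l), "=&r"(h), "=&r"(fail)
                     : "r"(p)
                     : "memory");
        *lo = l;
        *hi = h;
    }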

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target-con-set.h |   2 +
 tcg/aarch64/tcg-target.h         |  11 ++-
 tcg/aarch64/tcg-target.c.inc     | 141 ++++++++++++++++++++++++++++++-
 3 files changed, 151 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target-con-set.h b/tcg/aarch64/tcg-target-con-set.h
index 109f4ab97c..3fdee26a3d 100644
--- a/tcg/aarch64/tcg-target-con-set.h
+++ b/tcg/aarch64/tcg-target-con-set.h
@@ -13,6 +13,7 @@ C_O0_I1(r)
 C_O0_I2(r, rA)
 C_O0_I2(rZ, r)
 C_O0_I2(w, r)
+C_O0_I3(rZ, rZ, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
 C_O1_I1(w, w)
@@ -31,4 +32,5 @@ C_O1_I2(w, w, wO)
 C_O1_I2(w, w, wZ)
 C_O1_I3(w, w, w, w)
 C_O1_I4(r, r, rA, rZ, rZ)
+C_O2_I1(r, r, r)
 C_O2_I4(r, r, rZ, rZ, rA, rMZ)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index d5f7614880..192a2758c5 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -131,7 +131,16 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+/*
+ * Without FEAT_LSE2, we must use LDXP+STXP to implement atomic 128-bit load,
+ * which requires writable pages.  We must defer to the helper for user-only,
+ * but in system mode all ram is writable for the host.
+ */
+#ifdef CONFIG_USER_ONLY
+#define TCG_TARGET_HAS_qemu_ldst_i128   have_lse2
+#else
+#define TCG_TARGET_HAS_qemu_ldst_i128   1
+#endif
 
 #define TCG_TARGET_HAS_v64              1
 #define TCG_TARGET_HAS_v128             1
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 00d9033e2f..261ad25210 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -385,6 +385,10 @@ typedef enum {
     I3305_LDR_v64   = 0x5c000000,
     I3305_LDR_v128  = 0x9c000000,
 
+    /* Load/store exclusive. */
+    I3306_LDXP      = 0xc8600000,
+    I3306_STXP      = 0xc8200000,
+
     /* Load/store register.  Described here as 3.3.12, but the helper
        that emits them can transform to 3.3.10 or 3.3.13.  */
     I3312_STRB      = 0x38000000 | LDST_ST << 22 | MO_8 << 30,
@@ -449,6 +453,9 @@ typedef enum {
     I3406_ADR       = 0x10000000,
     I3406_ADRP      = 0x90000000,
 
+    /* Add/subtract extended register instructions. */
+    I3501_ADD       = 0x0b200000,
+
     /* Add/subtract shifted register instructions (without a shift).  */
     I3502_ADD       = 0x0b000000,
     I3502_ADDS      = 0x2b000000,
@@ -619,6 +626,12 @@ static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn,
     tcg_out32(s, insn | (imm19 & 0x7ffff) << 5 | rt);
 }
 
+static void tcg_out_insn_3306(TCGContext *s, AArch64Insn insn, TCGReg rs,
+                              TCGReg rt, TCGReg rt2, TCGReg rn)
+{
+    tcg_out32(s, insn | rs << 16 | rt2 << 10 | rn << 5 | rt);
+}
+
 static void tcg_out_insn_3201(TCGContext *s, AArch64Insn insn, TCGType ext,
                               TCGReg rt, int imm19)
 {
@@ -701,6 +714,14 @@ static void tcg_out_insn_3406(TCGContext *s, AArch64Insn insn,
     tcg_out32(s, insn | (disp & 3) << 29 | (disp & 0x1ffffc) << (5 - 2) | rd);
 }
 
+static inline void tcg_out_insn_3501(TCGContext *s, AArch64Insn insn,
+                                     TCGType sf, TCGReg rd, TCGReg rn,
+                                     TCGReg rm, int opt, int imm3)
+{
+    tcg_out32(s, insn | sf << 31 | rm << 16 | opt << 13 |
+              imm3 << 10 | rn << 5 | rd);
+}
+
 /* This function is for both 3.5.2 (Add/Subtract shifted register), for
    the rare occasion when we actually want to supply a shift amount.  */
 static inline void tcg_out_insn_3502S(TCGContext *s, AArch64Insn insn,
@@ -1628,16 +1649,16 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     TCGType addr_type = s->addr_type;
     TCGLabelQemuLdst *ldst = NULL;
     MemOp opc = get_memop(oi);
+    MemOp s_bits = opc & MO_SIZE;
     unsigned a_mask;
 
     h->aa = atom_and_align_for_opc(s, opc,
                                    have_lse2 ? MO_ATOM_WITHIN16
                                              : MO_ATOM_IFALIGN,
-                                   false);
+                                   s_bits == MO_128);
     a_mask = (1 << h->aa.align) - 1;
 
 #ifdef CONFIG_SOFTMMU
-    unsigned s_bits = opc & MO_SIZE;
     unsigned s_mask = (1u << s_bits) - 1;
     unsigned mem_index = get_mmuidx(oi);
     TCGReg addr_adj;
@@ -1818,6 +1839,108 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
     }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+    TCGReg base;
+    bool use_pair;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, is_ld);
+
+    /* Compose the final address, as LDP/STP have no indexing. */
+    if (h.index == TCG_REG_XZR) {
+        base = h.base;
+    } else {
+        base = TCG_REG_TMP2;
+        if (h.index_ext == TCG_TYPE_I32) {
+            /* add base, base, index, uxtw */
+            tcg_out_insn(s, 3501, ADD, TCG_TYPE_I64, base,
+                         h.base, h.index, MO_32, 0);
+        } else {
+            /* add base, base, index */
+            tcg_out_insn(s, 3502, ADD, 1, base, h.base, h.index);
+        }
+    }
+
+    use_pair = h.aa.atom < MO_128 || have_lse2;
+
+    if (!use_pair) {
+        tcg_insn_unit *branch = NULL;
+        TCGReg ll, lh, sl, sh;
+
+        /*
+         * If we have already checked for 16-byte alignment, that's all
+         * we need. Otherwise we have determined that misaligned atomicity
+         * may be handled with two 8-byte loads.
+         */
+        if (h.aa.align < MO_128) {
+            /*
+             * TODO: align should be MO_64, so we only need test bit 3,
+             * which means we could use TBNZ instead of ANDS+B_C.
+             */
+            tcg_out_logicali(s, I3404_ANDSI, 0, TCG_REG_XZR, addr_reg, 15);
+            branch = s->code_ptr;
+            tcg_out_insn(s, 3202, B_C, TCG_COND_NE, 0);
+            use_pair = true;
+        }
+
+        if (is_ld) {
+            /*
+             * 16-byte atomicity without LSE2 requires LDXP+STXP loop:
+             *    ldxp lo, hi, [base]
+             *    stxp t0, lo, hi, [base]
+             *    cbnz t0, .-8
+             * Require no overlap between data{lo,hi} and base.
+             */
+            if (datalo == base || datahi == base) {
+                tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_TMP2, base);
+                base = TCG_REG_TMP2;
+            }
+            ll = sl = datalo;
+            lh = sh = datahi;
+        } else {
+            /*
+             * 16-byte atomicity without LSE2 requires LDXP+STXP loop:
+             * 1: ldxp t0, t1, [base]
+             *    stxp t0, lo, hi, [base]
+             *    cbnz t0, 1b
+             */
+            tcg_debug_assert(base != TCG_REG_TMP0 && base != TCG_REG_TMP1);
+            ll = TCG_REG_TMP0;
+            lh = TCG_REG_TMP1;
+            sl = datalo;
+            sh = datahi;
+        }
+
+        tcg_out_insn(s, 3306, LDXP, TCG_REG_XZR, ll, lh, base);
+        tcg_out_insn(s, 3306, STXP, TCG_REG_TMP0, sl, sh, base);
+        tcg_out_insn(s, 3201, CBNZ, 0, TCG_REG_TMP0, -2);
+
+        if (use_pair) {
+            /* "b .+8", branching across the one insn of use_pair. */
+            tcg_out_insn(s, 3206, B, 2);
+            reloc_pc19(branch, tcg_splitwx_to_rx(s->code_ptr));
+        }
+    }
+
+    if (use_pair) {
+        if (is_ld) {
+            tcg_out_insn(s, 3314, LDP, datalo, datahi, base, 0, 1, 0);
+        } else {
+            tcg_out_insn(s, 3314, STP, datalo, datahi, base, 0, 1, 0);
+        }
+    }
+
+    if (ldst) {
+        ldst->type = TCG_TYPE_I128;
+        ldst->datalo_reg = datalo;
+        ldst->datahi_reg = datahi;
+        ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+    }
+}
+
 static const tcg_insn_unit *tb_ret_addr;
 
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t a0)
@@ -2157,6 +2280,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_st_a64_i64:
         tcg_out_qemu_st(s, REG0(0), a1, a2, ext);
         break;
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        tcg_out_qemu_ldst_i128(s, a0, a1, a2, args[3], true);
+        break;
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        tcg_out_qemu_ldst_i128(s, REG0(0), REG0(1), a2, args[3], false);
+        break;
 
     case INDEX_op_bswap64_i64:
         tcg_out_rev(s, TCG_TYPE_I64, MO_64, a0, a1);
@@ -2796,11 +2927,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_ld_a32_i64:
     case INDEX_op_qemu_ld_a64_i64:
         return C_O1_I1(r, r);
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        return C_O2_I1(r, r, r);
     case INDEX_op_qemu_st_a32_i32:
     case INDEX_op_qemu_st_a64_i32:
     case INDEX_op_qemu_st_a32_i64:
     case INDEX_op_qemu_st_a64_i64:
         return C_O0_I2(rZ, r);
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        return C_O0_I3(rZ, rZ, r);
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-- 
2.34.1




* [PULL 10/27] tcg/ppc: Support 128-bit load/store
From: Richard Henderson @ 2023-05-30 18:59 UTC
  To: qemu-devel; +Cc: Daniel Henrique Barboza

Use LQ/STQ, available with ISA v2.07, when 16-byte atomicity is required.
Note that these instructions do not require 16-byte alignment.
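
LQ/STQ operate on an even/odd register pair: the new 'o' constraint
below (REGS('o', ALL_GENERAL_REGS & 0xAAAAAAAAu)) restricts the low
half to an odd register, and the code asserts that the high half is
the even register directly below it.  A quick sketch (Python, not
part of the patch) of why that mask selects the odd register numbers:

    # Bit n of a TCG constraint mask corresponds to register number n.
    mask = 0xAAAAAAAA
    odd = [r for r in range(32) if (mask >> r) & 1]
    assert odd == list(range(1, 32, 2))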

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target-con-set.h |   2 +
 tcg/ppc/tcg-target-con-str.h |   1 +
 tcg/ppc/tcg-target.h         |   3 +-
 tcg/ppc/tcg-target.c.inc     | 108 +++++++++++++++++++++++++++++++----
 4 files changed, 101 insertions(+), 13 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h
index f206b29205..bbd7b21247 100644
--- a/tcg/ppc/tcg-target-con-set.h
+++ b/tcg/ppc/tcg-target-con-set.h
@@ -14,6 +14,7 @@ C_O0_I2(r, r)
 C_O0_I2(r, ri)
 C_O0_I2(v, r)
 C_O0_I3(r, r, r)
+C_O0_I3(o, m, r)
 C_O0_I4(r, r, ri, ri)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
@@ -34,6 +35,7 @@ C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rZ, rZ)
 C_O1_I4(r, r, r, ri, ri)
 C_O2_I1(r, r, r)
+C_O2_I1(o, m, r)
 C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, rI, rZM, r, r)
 C_O2_I4(r, r, r, r, rI, rZM)
diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index 094613cbcb..20846901de 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -9,6 +9,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('o', ALL_GENERAL_REGS & 0xAAAAAAAAu)  /* odd registers */
 REGS('v', ALL_VECTOR_REGS)
 
 /*
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 0914380bd7..204b70f86a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -149,7 +149,8 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_mulsh_i64        1
 #endif
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_qemu_ldst_i128   \
+    (TCG_TARGET_REG_BITS == 64 && have_isa_2_07)
 
 /*
  * While technically Altivec could support V64, it has no 64-bit store
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index d4269dffcf..d47a9e3478 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -295,25 +295,27 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 
 #define B      OPCD( 18)
 #define BC     OPCD( 16)
+
 #define LBZ    OPCD( 34)
 #define LHZ    OPCD( 40)
 #define LHA    OPCD( 42)
 #define LWZ    OPCD( 32)
 #define LWZUX  XO31( 55)
-#define STB    OPCD( 38)
-#define STH    OPCD( 44)
-#define STW    OPCD( 36)
-
-#define STD    XO62(  0)
-#define STDU   XO62(  1)
-#define STDX   XO31(149)
-
 #define LD     XO58(  0)
 #define LDX    XO31( 21)
 #define LDU    XO58(  1)
 #define LDUX   XO31( 53)
 #define LWA    XO58(  2)
 #define LWAX   XO31(341)
+#define LQ     OPCD( 56)
+
+#define STB    OPCD( 38)
+#define STH    OPCD( 44)
+#define STW    OPCD( 36)
+#define STD    XO62(  0)
+#define STDU   XO62(  1)
+#define STDX   XO31(149)
+#define STQ    XO62(  2)
 
 #define ADDIC  OPCD( 12)
 #define ADDI   OPCD( 14)
@@ -2020,7 +2022,18 @@ typedef struct {
 
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
-    return true;
+    TCGAtomAlign aa;
+
+    if ((memop & MO_SIZE) <= MO_64) {
+        return true;
+    }
+
+    /*
+     * Reject 16-byte memop with 16-byte atomicity,
+     * but do allow a pair of 64-bit operations.
+     */
+    aa = atom_and_align_for_opc(tcg_ctx, memop, MO_ATOM_IFALIGN, true);
+    return aa.atom <= MO_64;
 }
 
 /*
@@ -2035,7 +2048,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 {
     TCGLabelQemuLdst *ldst = NULL;
     MemOp opc = get_memop(oi);
-    MemOp a_bits;
+    MemOp a_bits, s_bits;
 
     /*
      * Book II, Section 1.4, Single-Copy Atomicity, specifies:
@@ -2047,10 +2060,11 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
      * As of 3.0, "the non-atomic access is performed as described in
      * the corresponding list", which matches MO_ATOM_SUBALIGN.
      */
+    s_bits = opc & MO_SIZE;
     h->aa = atom_and_align_for_opc(s, opc,
                                    have_isa_3_00 ? MO_ATOM_SUBALIGN
                                                  : MO_ATOM_IFALIGN,
-                                   false);
+                                   s_bits == MO_128);
     a_bits = h->aa.align;
 
 #ifdef CONFIG_SOFTMMU
@@ -2060,7 +2074,6 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
     int fast_off = TLB_MASK_TABLE_OFS(mem_index);
     int mask_off = fast_off + offsetof(CPUTLBDescFast, mask);
     int table_off = fast_off + offsetof(CPUTLBDescFast, table);
-    unsigned s_bits = opc & MO_SIZE;
 
     ldst = new_ldst_label(s);
     ldst->is_ld = is_ld;
@@ -2303,6 +2316,60 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg datalo, TCGReg datahi,
     }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+    bool need_bswap;
+    uint32_t insn;
+    TCGReg index;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, -1, oi, is_ld);
+
+    /* Compose the final address, as LQ/STQ have no indexing. */
+    index = h.index;
+    if (h.base != 0) {
+        index = TCG_REG_TMP1;
+        tcg_out32(s, ADD | TAB(index, h.base, h.index));
+    }
+    need_bswap = get_memop(oi) & MO_BSWAP;
+
+    if (h.aa.atom == MO_128) {
+        tcg_debug_assert(!need_bswap);
+        tcg_debug_assert(datalo & 1);
+        tcg_debug_assert(datahi == datalo - 1);
+        insn = is_ld ? LQ : STQ;
+        tcg_out32(s, insn | TAI(datahi, index, 0));
+    } else {
+        TCGReg d1, d2;
+
+        if (HOST_BIG_ENDIAN ^ need_bswap) {
+            d1 = datahi, d2 = datalo;
+        } else {
+            d1 = datalo, d2 = datahi;
+        }
+
+        if (need_bswap) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 8);
+            insn = is_ld ? LDBRX : STDBRX;
+            tcg_out32(s, insn | TAB(d1, 0, index));
+            tcg_out32(s, insn | TAB(d2, index, TCG_REG_R0));
+        } else {
+            insn = is_ld ? LD : STD;
+            tcg_out32(s, insn | TAI(d1, index, 0));
+            tcg_out32(s, insn | TAI(d2, index, 8));
+        }
+    }
+
+    if (ldst) {
+        ldst->type = TCG_TYPE_I128;
+        ldst->datalo_reg = datalo;
+        ldst->datahi_reg = datahi;
+        ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+    }
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     int i;
@@ -2860,6 +2927,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                             args[4], TCG_TYPE_I64);
         }
         break;
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], true);
+        break;
 
     case INDEX_op_qemu_st_a64_i32:
         if (TCG_TARGET_REG_BITS == 32) {
@@ -2889,6 +2961,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                             args[4], TCG_TYPE_I64);
         }
         break;
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], false);
+        break;
 
     case INDEX_op_setcond_i32:
         tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
@@ -3722,6 +3799,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_st_a64_i64:
         return TCG_TARGET_REG_BITS == 64 ? C_O0_I2(r, r) : C_O0_I4(r, r, r, r);
 
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        return C_O2_I1(o, m, r);
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        return C_O0_I3(o, m, r);
+
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
     case INDEX_op_mul_vec:
-- 
2.34.1




* [PULL 11/27] tcg/s390x: Support 128-bit load/store
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (9 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 10/27] tcg/ppc: " Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-07-10  8:58   ` TCG broken on s390x hosts (was: [PULL 11/27] tcg/s390x: Support 128-bit load/store) Thomas Huth
  2023-05-30 18:59 ` [PULL 12/27] accel/tcg: Extract load_atom_extract_al16_or_al8 to host header Richard Henderson
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Use LPQ/STPQ when 16-byte atomicity is required.
Note that these instructions require 16-byte alignment.
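
When 16-byte atomicity is required but 16-byte alignment has not
already been established, the code emits a runtime alignment test
(TMLL on the low four address bits) and branches to a pair of 8-byte
accesses for the misaligned case.  A sketch of that decision in
Python (names invented for illustration):

    def plan_atomic16(addr: int, align16_checked: bool) -> str:
        if align16_checked or addr % 16 == 0:
            return "single LPQ/STPQ"       # 16-byte atomic access
        return "two 8-byte accesses"       # misaligned: pairs suffice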

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |   2 +
 tcg/s390x/tcg-target.h         |   2 +-
 tcg/s390x/tcg-target.c.inc     | 107 ++++++++++++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index ecc079bb6d..cbad91b2b5 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -14,6 +14,7 @@ C_O0_I2(r, r)
 C_O0_I2(r, ri)
 C_O0_I2(r, rA)
 C_O0_I2(v, r)
+C_O0_I3(o, m, r)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
@@ -36,6 +37,7 @@ C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rI, r)
 C_O1_I4(r, r, rA, rI, r)
+C_O2_I1(o, m, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 170007bea5..ec96952172 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -140,7 +140,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_muluh_i64      0
 #define TCG_TARGET_HAS_mulsh_i64      0
 
-#define TCG_TARGET_HAS_qemu_ldst_i128 0
+#define TCG_TARGET_HAS_qemu_ldst_i128 1
 
 #define TCG_TARGET_HAS_v64            HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v128           HAVE_FACILITY(VECTOR)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index dfaa34c264..503126cd66 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -243,6 +243,7 @@ typedef enum S390Opcode {
     RXY_LLGF    = 0xe316,
     RXY_LLGH    = 0xe391,
     RXY_LMG     = 0xeb04,
+    RXY_LPQ     = 0xe38f,
     RXY_LRV     = 0xe31e,
     RXY_LRVG    = 0xe30f,
     RXY_LRVH    = 0xe31f,
@@ -253,6 +254,7 @@ typedef enum S390Opcode {
     RXY_STG     = 0xe324,
     RXY_STHY    = 0xe370,
     RXY_STMG    = 0xeb24,
+    RXY_STPQ    = 0xe38e,
     RXY_STRV    = 0xe33e,
     RXY_STRVG   = 0xe32f,
     RXY_STRVH   = 0xe33f,
@@ -1577,7 +1579,18 @@ typedef struct {
 
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
-    return true;
+    TCGAtomAlign aa;
+
+    if ((memop & MO_SIZE) <= MO_64) {
+        return true;
+    }
+
+    /*
+     * Reject 16-byte memop with 16-byte atomicity,
+     * but do allow a pair of 64-bit operations.
+     */
+    aa = atom_and_align_for_opc(tcg_ctx, memop, MO_ATOM_IFALIGN, true);
+    return aa.atom <= MO_64;
 }
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, MemOp opc, TCGReg data,
@@ -1734,13 +1747,13 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 {
     TCGLabelQemuLdst *ldst = NULL;
     MemOp opc = get_memop(oi);
+    MemOp s_bits = opc & MO_SIZE;
     unsigned a_mask;
 
-    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, false);
+    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, s_bits == MO_128);
     a_mask = (1 << h->aa.align) - 1;
 
 #ifdef CONFIG_SOFTMMU
-    unsigned s_bits = opc & MO_SIZE;
     unsigned s_mask = (1 << s_bits) - 1;
     int mem_index = get_mmuidx(oi);
     int fast_off = TLB_MASK_TABLE_OFS(mem_index);
@@ -1865,6 +1878,80 @@ static void tcg_out_qemu_st(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
     }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabel *l1 = NULL, *l2 = NULL;
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+    bool need_bswap;
+    bool use_pair;
+    S390Opcode insn;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, is_ld);
+
+    use_pair = h.aa.atom < MO_128;
+    need_bswap = get_memop(oi) & MO_BSWAP;
+
+    if (!use_pair) {
+        /*
+         * Atomicity requires we use LPQ.  If we've already checked for
+         * 16-byte alignment, that's all we need.  If we arrive with
+         * lesser alignment, we have determined that less than 16-byte
+         * alignment can be satisfied with two 8-byte loads.
+         */
+        if (h.aa.align < MO_128) {
+            use_pair = true;
+            l1 = gen_new_label();
+            l2 = gen_new_label();
+
+            tcg_out_insn(s, RI, TMLL, addr_reg, 15);
+            tgen_branch(s, 7, l1); /* CC in {1,2,3} */
+        }
+
+        tcg_debug_assert(!need_bswap);
+        tcg_debug_assert(datalo & 1);
+        tcg_debug_assert(datahi == datalo - 1);
+        insn = is_ld ? RXY_LPQ : RXY_STPQ;
+        tcg_out_insn_RXY(s, insn, datahi, h.base, h.index, h.disp);
+
+        if (use_pair) {
+            tgen_branch(s, S390_CC_ALWAYS, l2);
+            tcg_out_label(s, l1);
+        }
+    }
+    if (use_pair) {
+        TCGReg d1, d2;
+
+        if (need_bswap) {
+            d1 = datalo, d2 = datahi;
+            insn = is_ld ? RXY_LRVG : RXY_STRVG;
+        } else {
+            d1 = datahi, d2 = datalo;
+            insn = is_ld ? RXY_LG : RXY_STG;
+        }
+
+        if (h.base == d1 || h.index == d1) {
+            tcg_out_insn(s, RXY, LAY, TCG_TMP0, h.base, h.index, h.disp);
+            h.base = TCG_TMP0;
+            h.index = TCG_REG_NONE;
+            h.disp = 0;
+        }
+        tcg_out_insn_RXY(s, insn, d1, h.base, h.index, h.disp);
+        tcg_out_insn_RXY(s, insn, d2, h.base, h.index, h.disp + 8);
+    }
+    if (l2) {
+        tcg_out_label(s, l2);
+    }
+
+    if (ldst) {
+        ldst->type = TCG_TYPE_I128;
+        ldst->datalo_reg = datalo;
+        ldst->datahi_reg = datahi;
+        ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+    }
+}
+
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t a0)
 {
     /* Reuse the zeroing that exists for goto_ptr.  */
@@ -2226,6 +2313,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_st_a64_i64:
         tcg_out_qemu_st(s, args[0], args[1], args[2], TCG_TYPE_I64);
         break;
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], true);
+        break;
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], false);
+        break;
 
     case INDEX_op_ld16s_i64:
         tcg_out_mem(s, 0, RXY_LGH, args[0], args[1], TCG_REG_NONE, args[2]);
@@ -3107,6 +3202,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_st_a32_i32:
     case INDEX_op_qemu_st_a64_i32:
         return C_O0_I2(r, r);
+    case INDEX_op_qemu_ld_a32_i128:
+    case INDEX_op_qemu_ld_a64_i128:
+        return C_O2_I1(o, m, r);
+    case INDEX_op_qemu_st_a32_i128:
+    case INDEX_op_qemu_st_a64_i128:
+        return C_O0_I3(o, m, r);
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-- 
2.34.1




* [PULL 12/27] accel/tcg: Extract load_atom_extract_al16_or_al8 to host header
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (10 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 11/27] tcg/s390x: " Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 13/27] accel/tcg: Extract store_atom_insert_al16 " Richard Henderson
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
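The moved helper reads an aligned 16-byte window and shifts out the
requested bytes.  A data-only sketch of that extraction in Python
(ignoring the alignment and atomicity machinery the real code exists
for):

    def extract_le(window: int, offset: int, s: int) -> int:
        """Take s bytes at a byte offset from a little-endian wide value."""
        assert s <= 8 and offset + s <= 16
        return (window >> (offset * 8)) & ((1 << (8 * s)) - 1)

The HOST_BIG_ENDIAN arm computes the shift from the top of the window
instead: (16 - s - offset) * 8.
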
 .../generic/host/load-extract-al16-al8.h      | 45 +++++++++++++++++++
 accel/tcg/ldst_atomicity.c.inc                | 36 +--------------
 2 files changed, 47 insertions(+), 34 deletions(-)
 create mode 100644 host/include/generic/host/load-extract-al16-al8.h

diff --git a/host/include/generic/host/load-extract-al16-al8.h b/host/include/generic/host/load-extract-al16-al8.h
new file mode 100644
index 0000000000..d95556130f
--- /dev/null
+++ b/host/include/generic/host/load-extract-al16-al8.h
@@ -0,0 +1,45 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic extract 64 from 128-bit, generic version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef HOST_LOAD_EXTRACT_AL16_AL8_H
+#define HOST_LOAD_EXTRACT_AL16_AL8_H
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @pv, when pv % s != 0.  If [pv, pv+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t ATTRIBUTE_ATOMIC128_OPT
+load_atom_extract_al16_or_al8(void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    int o = pi & 7;
+    int shr = (HOST_BIG_ENDIAN ? 16 - s - o : o) * 8;
+    Int128 r;
+
+    pv = (void *)(pi & ~7);
+    if (pi & 8) {
+        uint64_t *p8 = __builtin_assume_aligned(pv, 16, 8);
+        uint64_t a = qatomic_read__nocheck(p8);
+        uint64_t b = qatomic_read__nocheck(p8 + 1);
+
+        if (HOST_BIG_ENDIAN) {
+            r = int128_make128(b, a);
+        } else {
+            r = int128_make128(a, b);
+        }
+    } else {
+        r = atomic16_read_ro(pv);
+    }
+    return int128_getlo(int128_urshift(r, shr));
+}
+
+#endif /* HOST_LOAD_EXTRACT_AL16_AL8_H */
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 35ce6d6368..6063395e11 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -9,6 +9,8 @@
  * See the COPYING file in the top-level directory.
  */
 
+#include "host/load-extract-al16-al8.h"
+
 #ifdef CONFIG_ATOMIC64
 # define HAVE_al8          true
 #else
@@ -311,40 +313,6 @@ static uint64_t load_atom_extract_al16_or_exit(CPUArchState *env, uintptr_t ra,
     return int128_getlo(r);
 }
 
-/**
- * load_atom_extract_al16_or_al8:
- * @p: host address
- * @s: object size in bytes, @s <= 8.
- *
- * Load @s bytes from @p, when p % s != 0.  If [p, p+s-1] does not
- * cross an 16-byte boundary then the access must be 16-byte atomic,
- * otherwise the access must be 8-byte atomic.
- */
-static inline uint64_t ATTRIBUTE_ATOMIC128_OPT
-load_atom_extract_al16_or_al8(void *pv, int s)
-{
-    uintptr_t pi = (uintptr_t)pv;
-    int o = pi & 7;
-    int shr = (HOST_BIG_ENDIAN ? 16 - s - o : o) * 8;
-    Int128 r;
-
-    pv = (void *)(pi & ~7);
-    if (pi & 8) {
-        uint64_t *p8 = __builtin_assume_aligned(pv, 16, 8);
-        uint64_t a = qatomic_read__nocheck(p8);
-        uint64_t b = qatomic_read__nocheck(p8 + 1);
-
-        if (HOST_BIG_ENDIAN) {
-            r = int128_make128(b, a);
-        } else {
-            r = int128_make128(a, b);
-        }
-    } else {
-        r = atomic16_read_ro(pv);
-    }
-    return int128_getlo(int128_urshift(r, shr));
-}
-
 /**
  * load_atom_4_by_2:
  * @pv: host address
-- 
2.34.1




* [PULL 13/27] accel/tcg: Extract store_atom_insert_al16 to host header
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (11 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 12/27] accel/tcg: Extract load_atom_extract_al16_or_al8 to host header Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 14/27] accel/tcg: Add x86_64 load_atom_extract_al16_or_al8 Richard Henderson
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
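The helper being moved implements a masked insert via a
compare-and-swap loop.  The value computation alone, as a Python
sketch (the atomicity is the point of the real code and is elided
here):

    def masked_insert(old: int, val: int, msk: int) -> int:
        # new = (old & ~msk) | val, over the full 128-bit value
        return (old & ~msk) | val
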
 host/include/generic/host/store-insert-al16.h | 50 +++++++++++++++++++
 accel/tcg/ldst_atomicity.c.inc                | 40 +--------------
 2 files changed, 51 insertions(+), 39 deletions(-)
 create mode 100644 host/include/generic/host/store-insert-al16.h

diff --git a/host/include/generic/host/store-insert-al16.h b/host/include/generic/host/store-insert-al16.h
new file mode 100644
index 0000000000..4a1662183d
--- /dev/null
+++ b/host/include/generic/host/store-insert-al16.h
@@ -0,0 +1,50 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic store insert into 128-bit, generic version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef HOST_STORE_INSERT_AL16_H
+#define HOST_STORE_INSERT_AL16_H
+
+/**
+ * store_atom_insert_al16:
+ * @ps: host address
+ * @val: shifted value to store
+ * @msk: mask for value to store
+ *
+ * Atomically store @val to @ps masked by @msk.
+ */
+static inline void ATTRIBUTE_ATOMIC128_OPT
+store_atom_insert_al16(Int128 *ps, Int128 val, Int128 msk)
+{
+#if defined(CONFIG_ATOMIC128)
+    __uint128_t *pu;
+    Int128Alias old, new;
+
+    /* With CONFIG_ATOMIC128, we can avoid the memory barriers. */
+    pu = __builtin_assume_aligned(ps, 16);
+    old.u = *pu;
+    msk = int128_not(msk);
+    do {
+        new.s = int128_and(old.s, msk);
+        new.s = int128_or(new.s, val);
+    } while (!__atomic_compare_exchange_n(pu, &old.u, new.u, true,
+                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
+#else
+    Int128 old, new, cmp;
+
+    ps = __builtin_assume_aligned(ps, 16);
+    old = *ps;
+    msk = int128_not(msk);
+    do {
+        cmp = old;
+        new = int128_and(old, msk);
+        new = int128_or(new, val);
+        old = atomic16_cmpxchg(ps, cmp, new);
+    } while (int128_ne(cmp, old));
+#endif
+}
+
+#endif /* HOST_STORE_INSERT_AL16_H */
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 6063395e11..2514899408 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -10,6 +10,7 @@
  */
 
 #include "host/load-extract-al16-al8.h"
+#include "host/store-insert-al16.h"
 
 #ifdef CONFIG_ATOMIC64
 # define HAVE_al8          true
@@ -681,45 +682,6 @@ static void store_atom_insert_al8(uint64_t *p, uint64_t val, uint64_t msk)
                                           __ATOMIC_RELAXED, __ATOMIC_RELAXED));
 }
 
-/**
- * store_atom_insert_al16:
- * @p: host address
- * @val: shifted value to store
- * @msk: mask for value to store
- *
- * Atomically store @val to @p masked by @msk.
- */
-static void ATTRIBUTE_ATOMIC128_OPT
-store_atom_insert_al16(Int128 *ps, Int128Alias val, Int128Alias msk)
-{
-#if defined(CONFIG_ATOMIC128)
-    __uint128_t *pu, old, new;
-
-    /* With CONFIG_ATOMIC128, we can avoid the memory barriers. */
-    pu = __builtin_assume_aligned(ps, 16);
-    old = *pu;
-    do {
-        new = (old & ~msk.u) | val.u;
-    } while (!__atomic_compare_exchange_n(pu, &old, new, true,
-                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
-#elif defined(CONFIG_CMPXCHG128)
-    __uint128_t *pu, old, new;
-
-    /*
-     * Without CONFIG_ATOMIC128, __atomic_compare_exchange_n will always
-     * defer to libatomic, so we must use __sync_*_compare_and_swap_16
-     * and accept the sequential consistency that comes with it.
-     */
-    pu = __builtin_assume_aligned(ps, 16);
-    do {
-        old = *pu;
-        new = (old & ~msk.u) | val.u;
-    } while (!__sync_bool_compare_and_swap_16(pu, old, new));
-#else
-    qemu_build_not_reached();
-#endif
-}
-
 /**
  * store_bytes_leN:
  * @pv: host address
-- 
2.34.1




* [PULL 14/27] accel/tcg: Add x86_64 load_atom_extract_al16_or_al8
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (12 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 13/27] accel/tcg: Extract store_atom_insert_al16 " Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 15/27] accel/tcg: Add aarch64 lse2 load_atom_extract_al16_or_al8 Richard Henderson
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 .../x86_64/host/load-extract-al16-al8.h       | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 host/include/x86_64/host/load-extract-al16-al8.h

diff --git a/host/include/x86_64/host/load-extract-al16-al8.h b/host/include/x86_64/host/load-extract-al16-al8.h
new file mode 100644
index 0000000000..31b6fe8c45
--- /dev/null
+++ b/host/include/x86_64/host/load-extract-al16-al8.h
@@ -0,0 +1,50 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic extract 64 from 128-bit, x86_64 version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef X86_64_LOAD_EXTRACT_AL16_AL8_H
+#define X86_64_LOAD_EXTRACT_AL16_AL8_H
+
+#ifdef CONFIG_INT128_TYPE
+#include "host/cpuinfo.h"
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @pv, when pv % s != 0.  If [pv, pv+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t ATTRIBUTE_ATOMIC128_OPT
+load_atom_extract_al16_or_al8(void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    __int128_t *ptr_align = (__int128_t *)(pi & ~7);
+    int shr = (pi & 7) * 8;
+    Int128Alias r;
+
+    /*
+     * ptr_align % 16 is now only 0 or 8.
+     * If the host supports atomic loads with VMOVDQU, then always use that,
+     * making the branch highly predictable.  Otherwise we must use VMOVDQA
+     * when ptr_align % 16 == 0 for 16-byte atomicity.
+     */
+    if ((cpuinfo & CPUINFO_ATOMIC_VMOVDQU) || (pi & 8)) {
+        asm("vmovdqu %1, %0" : "=x" (r.i) : "m" (*ptr_align));
+    } else {
+        asm("vmovdqa %1, %0" : "=x" (r.i) : "m" (*ptr_align));
+    }
+    return int128_getlo(int128_urshift(r.s, shr));
+}
+#else
+/* Fallback definition that must be optimized away, or error.  */
+uint64_t QEMU_ERROR("unsupported atomic")
+    load_atom_extract_al16_or_al8(void *pv, int s);
+#endif
+
+#endif /* X86_64_LOAD_EXTRACT_AL16_AL8_H */
-- 
2.34.1




* [PULL 15/27] accel/tcg: Add aarch64 lse2 load_atom_extract_al16_or_al8
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (13 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 14/27] accel/tcg: Add x86_64 load_atom_extract_al16_or_al8 Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 16/27] accel/tcg: Add aarch64 store_atom_insert_al16 Richard Henderson
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
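The return expression merges the two 64-bit halves read by LDP.  As a
sketch of that arithmetic in Python (data manipulation only):

    def combine(l: int, h: int, shr: int) -> int:
        # shr is a nonzero multiple of 8 here, since the helper is only
        # used for misaligned pv; (-shr & 63) in C equals 64 - shr.
        assert shr % 8 == 0 and 0 < shr < 64
        mask64 = (1 << 64) - 1
        return ((l >> shr) | (h << (64 - shr))) & mask64
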
 .../aarch64/host/load-extract-al16-al8.h      | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 host/include/aarch64/host/load-extract-al16-al8.h

diff --git a/host/include/aarch64/host/load-extract-al16-al8.h b/host/include/aarch64/host/load-extract-al16-al8.h
new file mode 100644
index 0000000000..bd677c5e26
--- /dev/null
+++ b/host/include/aarch64/host/load-extract-al16-al8.h
@@ -0,0 +1,40 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic extract 64 from 128-bit, AArch64 version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef AARCH64_LOAD_EXTRACT_AL16_AL8_H
+#define AARCH64_LOAD_EXTRACT_AL16_AL8_H
+
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @pv, when pv % s != 0.  If [pv, pv+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t load_atom_extract_al16_or_al8(void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    __int128_t *ptr_align = (__int128_t *)(pi & ~7);
+    int shr = (pi & 7) * 8;
+    uint64_t l, h;
+
+    /*
+     * With FEAT_LSE2, LDP is single-copy atomic if 16-byte aligned
+     * and single-copy atomic on the parts if 8-byte aligned.
+     * All we need do is align the pointer mod 8.
+     */
+    tcg_debug_assert(HAVE_ATOMIC128_RO);
+    asm("ldp %0, %1, %2" : "=r"(l), "=r"(h) : "m"(*ptr_align));
+    return (l >> shr) | (h << (-shr & 63));
+}
+
+#endif /* AARCH64_LOAD_EXTRACT_AL16_AL8_H */
-- 
2.34.1




* [PULL 16/27] accel/tcg: Add aarch64 store_atom_insert_al16
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (14 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 15/27] accel/tcg: Add aarch64 lse2 load_atom_extract_al16_or_al8 Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 17/27] tcg: Remove TCG_TARGET_TLB_DISPLACEMENT_BITS Richard Henderson
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/aarch64/host/store-insert-al16.h | 47 +++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 host/include/aarch64/host/store-insert-al16.h

diff --git a/host/include/aarch64/host/store-insert-al16.h b/host/include/aarch64/host/store-insert-al16.h
new file mode 100644
index 0000000000..1943155bc6
--- /dev/null
+++ b/host/include/aarch64/host/store-insert-al16.h
@@ -0,0 +1,47 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic store insert into 128-bit, AArch64 version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef AARCH64_STORE_INSERT_AL16_H
+#define AARCH64_STORE_INSERT_AL16_H
+
+/**
+ * store_atom_insert_al16:
+ * @ps: host address
+ * @val: shifted value to store
+ * @msk: mask for value to store
+ *
+ * Atomically store @val to @ps masked by @msk.
+ */
+static inline void ATTRIBUTE_ATOMIC128_OPT
+store_atom_insert_al16(Int128 *ps, Int128 val, Int128 msk)
+{
+    /*
+     * GCC only implements __sync* primitives for int128 on aarch64.
+     * We can do better without the barriers, and integrating the
+     * arithmetic into the load-exclusive/store-conditional pair.
+     */
+    uint64_t tl, th, vl, vh, ml, mh;
+    uint32_t fail;
+
+    qemu_build_assert(!HOST_BIG_ENDIAN);
+    vl = int128_getlo(val);
+    vh = int128_gethi(val);
+    ml = int128_getlo(msk);
+    mh = int128_gethi(msk);
+
+    asm("0: ldxp %[l], %[h], %[mem]\n\t"
+        "bic %[l], %[l], %[ml]\n\t"
+        "bic %[h], %[h], %[mh]\n\t"
+        "orr %[l], %[l], %[vl]\n\t"
+        "orr %[h], %[h], %[vh]\n\t"
+        "stxp %w[f], %[l], %[h], %[mem]\n\t"
+        "cbnz %w[f], 0b\n"
+        : [mem] "+Q"(*ps), [f] "=&r"(fail), [l] "=&r"(tl), [h] "=&r"(th)
+        : [vl] "r"(vl), [vh] "r"(vh), [ml] "r"(ml), [mh] "r"(mh));
+}
+
+#endif /* AARCH64_STORE_INSERT_AL16_H */
-- 
2.34.1




* [PULL 17/27] tcg: Remove TCG_TARGET_TLB_DISPLACEMENT_BITS
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (15 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 16/27] accel/tcg: Add aarch64 store_atom_insert_al16 Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 18/27] decodetree: Add --test-for-error Richard Henderson
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

The last use was removed by e77c89fb086a.

Fixes: e77c89fb086a ("cputlb: Remove static tlb sizing")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h | 1 -
 tcg/arm/tcg-target.h     | 1 -
 tcg/i386/tcg-target.h    | 1 -
 tcg/mips/tcg-target.h    | 1 -
 tcg/ppc/tcg-target.h     | 1 -
 tcg/riscv/tcg-target.h   | 1 -
 tcg/s390x/tcg-target.h   | 1 -
 tcg/sparc64/tcg-target.h | 1 -
 tcg/tci/tcg-target.h     | 1 -
 9 files changed, 9 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 192a2758c5..ce64de06e5 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -16,7 +16,6 @@
 #include "host/cpuinfo.h"
 
 #define TCG_TARGET_INSN_UNIT_SIZE  4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 24
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 typedef enum {
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 65efc538f4..c649db72a6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -31,7 +31,6 @@ extern int arm_arch;
 #define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 #define MAX_CODE_GEN_BUFFER_SIZE  UINT32_MAX
 
 typedef enum {
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b167f1e8d6..1468f8ef25 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -28,7 +28,6 @@
 #include "host/cpuinfo.h"
 
 #define TCG_TARGET_INSN_UNIT_SIZE  1
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 31
 
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS  64
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 8fbb6c6507..e4806f6ff5 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -36,7 +36,6 @@
 #endif
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 #define TCG_TARGET_NB_REGS 32
 
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 204b70f86a..40f20b0c1a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -34,7 +34,6 @@
 
 #define TCG_TARGET_NB_REGS 64
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 
 typedef enum {
     TCG_REG_R0,  TCG_REG_R1,  TCG_REG_R2,  TCG_REG_R3,
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 62fe61af7b..54fdff0caa 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -35,7 +35,6 @@
 #define TCG_TARGET_REG_BITS 64
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
 #define TCG_TARGET_NB_REGS 32
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index ec96952172..9a405003b9 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -26,7 +26,6 @@
 #define S390_TCG_TARGET_H
 
 #define TCG_TARGET_INSN_UNIT_SIZE 2
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 19
 
 /* We have a +- 4GB range on the branches; leave some slop.  */
 #define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 31c5537379..d454278811 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -26,7 +26,6 @@
 #define SPARC_TCG_TARGET_H
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
 #define TCG_TARGET_NB_REGS 32
 #define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 28dc6d5cfc..60a6ed65ce 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -42,7 +42,6 @@
 
 #define TCG_TARGET_INTERPRETER 1
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 #if UINTPTR_MAX == UINT32_MAX
-- 
2.34.1




* [PULL 18/27] decodetree: Add --test-for-error
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (16 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 17/27] tcg: Remove TCG_TARGET_TLB_DISPLACEMENT_BITS Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 19/27] decodetree: Fix recursion in prop_format and build_tree Richard Henderson
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel

Invert the exit code, for use with the testsuite: with --test-for-error
the script exits 0 when the input is rejected (the expected outcome for
the err_*.decode tests) and 1 if the input unexpectedly parses.
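
A usage sketch, mirroring how the meson tests added later in this
series invoke it (file name taken from that test list):

    python3 scripts/decodetree.py --test-for-error -o /dev/null \
        err_width1.decode
    # exit status 0: the input was rejected, as the test expects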

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index a03dc6b5e3..3f9f6876f7 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -35,6 +35,7 @@
 formats = {}
 allpatterns = []
 anyextern = False
+testforerror = False
 
 translate_prefix = 'trans'
 translate_scope = 'static '
@@ -71,7 +72,7 @@ def error_with_file(file, lineno, *args):
     if output_file and output_fd:
         output_fd.close()
         os.remove(output_file)
-    exit(1)
+    exit(0 if testforerror else 1)
 # end error_with_file
 
 
@@ -1286,11 +1287,12 @@ def main():
     global bitop_width
     global variablewidth
     global anyextern
+    global testforerror
 
     decode_scope = 'static '
 
     long_opts = ['decode=', 'translate=', 'output=', 'insnwidth=',
-                 'static-decode=', 'varinsnwidth=']
+                 'static-decode=', 'varinsnwidth=', 'test-for-error']
     try:
         (opts, args) = getopt.gnu_getopt(sys.argv[1:], 'o:vw:', long_opts)
     except getopt.GetoptError as err:
@@ -1319,6 +1321,8 @@ def main():
                 bitop_width = 64
             elif insnwidth != 32:
                 error(0, 'cannot handle insns of width', insnwidth)
+        elif o == '--test-for-error':
+            testforerror = True
         else:
             assert False, 'unhandled option'
 
@@ -1417,6 +1421,7 @@ def main():
 
     if output_file:
         output_fd.close()
+    exit(1 if testforerror else 0)
 # end main
 
 
-- 
2.34.1




* [PULL 19/27] decodetree: Fix recursion in prop_format and build_tree
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (17 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 18/27] decodetree: Add --test-for-error Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 20/27] decodetree: Diagnose empty pattern group Richard Henderson
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel

Fix two copy-paste errors when walking the parse tree: prop_format()
recursed into its children via build_tree() instead of prop_format(),
and build_tree() called super().prop_format() instead of
super().build_tree().

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 3f9f6876f7..e2640cc79b 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -474,7 +474,7 @@ def build_tree(self):
 
     def prop_format(self):
         for p in self.pats:
-            p.build_tree()
+            p.prop_format()
 
     def prop_width(self):
         width = None
@@ -624,7 +624,7 @@ def __build_tree(pats, outerbits, outermask):
         return t
 
     def build_tree(self):
-        super().prop_format()
+        super().build_tree()
         self.tree = self.__build_tree(self.pats, self.fixedbits,
                                       self.fixedmask)
 
-- 
2.34.1




* [PULL 20/27] decodetree: Diagnose empty pattern group
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (18 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 19/27] decodetree: Fix recursion in prop_format and build_tree Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 21/27] decodetree: Do not remove output_file from /dev Richard Henderson
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel

Test err_pattern_group_empty.decode failed with exception:

Traceback (most recent call last):
  File "./scripts/decodetree.py", line 1424, in <module> main()
  File "./scripts/decodetree.py", line 1342, in main toppat.build_tree()
  File "./scripts/decodetree.py", line 627, in build_tree
    self.tree = self.__build_tree(self.pats, self.fixedbits,
  File "./scripts/decodetree.py", line 607, in __build_tree
    fb = i.fixedbits & innermask
TypeError: unsupported operand type(s) for &: 'NoneType' and 'int'
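
The failing input is a pattern group with no members; a sketch of
what such a file looks like (the actual test input lives in
tests/decode/err_pattern_group_empty.decode):

    # An empty group between braces: now diagnosed, not crashed on.
    {
    }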

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index e2640cc79b..e4ef0a03cc 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -506,6 +506,12 @@ def output_code(self, i, extracted, outerbits, outermask):
                 output(ind, '}\n')
             else:
                 p.output_code(i, extracted, p.fixedbits, p.fixedmask)
+
+    def build_tree(self):
+        if not self.pats:
+            error_with_file(self.file, self.lineno, 'empty pattern group')
+        super().build_tree()
+
 #end IncMultiPattern
 
 
-- 
2.34.1




* [PULL 21/27] decodetree: Do not remove output_file from /dev
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (19 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 20/27] decodetree: Diagnose empty pattern group Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 22/27] tests/decode: Convert tests to meson Richard Henderson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel

Nor report any PermissionError from os.remove().
The primary purpose is testing with -o /dev/null.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index e4ef0a03cc..a9a0cd0fa3 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -71,7 +71,12 @@ def error_with_file(file, lineno, *args):
 
     if output_file and output_fd:
         output_fd.close()
-        os.remove(output_file)
+        # Do not try to remove e.g. -o /dev/null
+        if not output_file.startswith("/dev"):
+            try:
+                os.remove(output_file)
+            except PermissionError:
+                pass
     exit(0 if testforerror else 1)
 # end error_with_file
 
-- 
2.34.1




* [PULL 22/27] tests/decode: Convert tests to meson
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (20 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 21/27] decodetree: Do not remove output_file from /dev Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 23/27] docs: Document decodetree named field syntax Richard Henderson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
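Once converted, the tests run as an ordinary meson suite, e.g. (build
directory name assumed):

    meson test -C build --suite decodetree
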
 tests/decode/check.sh    | 24 ----------------
 tests/decode/meson.build | 59 ++++++++++++++++++++++++++++++++++++++++
 tests/meson.build        |  5 +---
 3 files changed, 60 insertions(+), 28 deletions(-)
 delete mode 100755 tests/decode/check.sh
 create mode 100644 tests/decode/meson.build

diff --git a/tests/decode/check.sh b/tests/decode/check.sh
deleted file mode 100755
index 95445a0115..0000000000
--- a/tests/decode/check.sh
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/bin/sh
-# This work is licensed under the terms of the GNU LGPL, version 2 or later.
-# See the COPYING.LIB file in the top-level directory.
-
-PYTHON=$1
-DECODETREE=$2
-E=0
-
-# All of these tests should produce errors
-for i in err_*.decode; do
-    if $PYTHON $DECODETREE $i > /dev/null 2> /dev/null; then
-        # Pass, aka failed to fail.
-        echo FAIL: $i 1>&2
-        E=1
-    fi
-done
-
-for i in succ_*.decode; do
-    if ! $PYTHON $DECODETREE $i > /dev/null 2> /dev/null; then
-        echo FAIL:$i 1>&2
-    fi
-done
-
-exit $E
diff --git a/tests/decode/meson.build b/tests/decode/meson.build
new file mode 100644
index 0000000000..b4850562f9
--- /dev/null
+++ b/tests/decode/meson.build
@@ -0,0 +1,59 @@
+err_tests = [
+    'err_argset1.decode',
+    'err_argset2.decode',
+    'err_field1.decode',
+    'err_field2.decode',
+    'err_field3.decode',
+    'err_field4.decode',
+    'err_field5.decode',
+    'err_field6.decode',
+    'err_init1.decode',
+    'err_init2.decode',
+    'err_init3.decode',
+    'err_init4.decode',
+    'err_overlap1.decode',
+    'err_overlap2.decode',
+    'err_overlap3.decode',
+    'err_overlap4.decode',
+    'err_overlap5.decode',
+    'err_overlap6.decode',
+    'err_overlap7.decode',
+    'err_overlap8.decode',
+    'err_overlap9.decode',
+    'err_pattern_group_empty.decode',
+    'err_pattern_group_ident1.decode',
+    'err_pattern_group_ident2.decode',
+    'err_pattern_group_nest1.decode',
+    'err_pattern_group_nest2.decode',
+    'err_pattern_group_nest3.decode',
+    'err_pattern_group_overlap1.decode',
+    'err_width1.decode',
+    'err_width2.decode',
+    'err_width3.decode',
+    'err_width4.decode',
+]
+
+succ_tests = [
+    'succ_argset_type1.decode',
+    'succ_function.decode',
+    'succ_ident1.decode',
+    'succ_pattern_group_nest1.decode',
+    'succ_pattern_group_nest2.decode',
+    'succ_pattern_group_nest3.decode',
+    'succ_pattern_group_nest4.decode',
+]
+
+suite = 'decodetree'
+decodetree = find_program(meson.project_source_root() / 'scripts/decodetree.py')
+
+foreach t: err_tests
+    test(fs.replace_suffix(t, ''),
+         decodetree, args: ['-o', '/dev/null', '--test-for-error', files(t)],
+         suite: suite)
+endforeach
+
+foreach t: succ_tests
+    test(fs.replace_suffix(t, ''),
+         decodetree, args: ['-o', '/dev/null', files(t)],
+         suite: suite)
+endforeach
diff --git a/tests/meson.build b/tests/meson.build
index 8e318ec513..083f2990bd 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -74,10 +74,7 @@ if have_tools and have_vhost_user and 'CONFIG_LINUX' in config_host
              dependencies: [qemuutil, vhost_user])
 endif
 
-test('decodetree', sh,
-     args: [ files('decode/check.sh'), config_host['PYTHON'], files('../scripts/decodetree.py') ],
-     workdir: meson.current_source_dir() / 'decode',
-     suite: 'decodetree')
+subdir('decode')
 
 if 'CONFIG_TCG' in config_all
   subdir('fp')
-- 
2.34.1




* [PULL 23/27] docs: Document decodetree named field syntax
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (21 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 22/27] tests/decode: Convert tests to meson Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 24/27] scripts/decodetree: Pass lvalue-formatter function to str_extract() Richard Henderson
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

Document the named field syntax that we want to implement for the
decodetree script.  This allows a field to be defined in terms of
some other field that the instruction pattern has already set, for
example:

   %sz_imm 10:3 sz:3 !function=expand_sz_imm

to allow a function to be passed both an immediate field from the
instruction and also a sz value which might have been specified by
the instruction pattern directly (sz=1, etc) rather than being a
simple field within the instruction.

Note that the restriction forbidding a format from referring to the
pattern while the pattern simultaneously refers to the format is a
limitation of the decoder generator, not because such a combination
is inherently a silly thing to do.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230523120447.728365-3-peter.maydell@linaro.org>
---
 docs/devel/decodetree.rst | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/docs/devel/decodetree.rst b/docs/devel/decodetree.rst
index 49ea50c2a7..e3392aa705 100644
--- a/docs/devel/decodetree.rst
+++ b/docs/devel/decodetree.rst
@@ -23,22 +23,42 @@ Fields
 
 Syntax::
 
-  field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
+  field_def     := '%' identifier ( field )* ( !function=identifier )?
+  field         := unnamed_field | named_field
   unnamed_field := number ':' ( 's' ) number
+  named_field   := identifier ':' ( 's' ) number
 
 For *unnamed_field*, the first number is the least-significant bit position
 of the field and the second number is the length of the field.  If the 's' is
-present, the field is considered signed.  If multiple ``unnamed_fields`` are
-present, they are concatenated.  In this way one can define disjoint fields.
+present, the field is considered signed.
+
+A *named_field* refers to some other field in the instruction pattern
+or format. Regardless of the length of the other field where it is
+defined, it will be inserted into this field with the specified
+signedness and bit width.
+
+Field definitions that involve loops (i.e. where a field is defined
+directly or indirectly in terms of itself) are errors.
+
+A format can include fields that refer to named fields that are
+defined in the instruction pattern(s) that use the format.
+Conversely, an instruction pattern can include fields that refer to
+named fields that are defined in the format it uses. However, you
+cannot currently do both at once (i.e. pattern P uses format F; F has
+a field A that refers to a named field B that is defined in P, and P
+has a field C that refers to a named field D that is defined in F).
+
+If multiple ``fields`` are present, they are concatenated.
+In this way one can define disjoint fields.
 
 If ``!function`` is specified, the concatenated result is passed through the
 named function, taking and returning an integral value.
 
-One may use ``!function`` with zero ``unnamed_fields``.  This case is called
+One may use ``!function`` with zero ``fields``.  This case is called
 a *parameter*, and the named function is only passed the ``DisasContext``
 and returns an integral value extracted from there.
 
-A field with no ``unnamed_fields`` and no ``!function`` is in error.
+A field with no ``fields`` and no ``!function`` is in error.
 
 Field examples:
 
@@ -56,6 +76,9 @@ Field examples:
 | %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
 |   !function=expand_shimm8 |               extract(i, 13, 1))            |
 +---------------------------+---------------------------------------------+
+| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
+|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
++---------------------------+---------------------------------------------+
 
 Argument Sets
 =============
-- 
2.34.1




* [PULL 24/27] scripts/decodetree: Pass lvalue-formatter function to str_extract()
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (22 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 23/27] docs: Document decodetree named field syntax Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 25/27] scripts/decodetree: Implement a topological sort Richard Henderson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

To support referring to other named fields in field definitions, we
need to pass the str_extract() method a function which tells it how
to emit the code for a previously initialized named field.  (In
Pattern::output_code() the other field will be "u.f_foo.field", and
in Format::output_extract() it is "a->field".)

Refactor the two callsites that currently do "output code to
initialize each field", and have them pass a lambda that defines how
to format the lvalue in each case.  This is then used both in
emitting the LHS of the assignment and also passed down to
str_extract() as a new argument (unused at the moment, but will be
used in the following patch).
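
A sketch of the two formatters, reduced to plain Python (the 'foo'
argument-set name is illustrative):

    arg = 'foo'
    fmt_lvalue = lambda n: 'a->' + n                 # Format::output_extract
    pat_lvalue = lambda n: 'u.f_' + arg + '.' + n    # Pattern::output_code
    assert fmt_lvalue('sz') == 'a->sz'
    assert pat_lvalue('sz') == 'u.f_foo.sz'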

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230523120447.728365-4-peter.maydell@linaro.org>
---
 scripts/decodetree.py | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index a9a0cd0fa3..73d569c128 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -211,7 +211,7 @@ def __str__(self):
             s = ''
         return str(self.pos) + ':' + s + str(self.len)
 
-    def str_extract(self):
+    def str_extract(self, lvalue_formatter):
         global bitop_width
         s = 's' if self.sign else ''
         return f'{s}extract{bitop_width}(insn, {self.pos}, {self.len})'
@@ -234,12 +234,12 @@ def __init__(self, subs, mask):
     def __str__(self):
         return str(self.subs)
 
-    def str_extract(self):
+    def str_extract(self, lvalue_formatter):
         global bitop_width
         ret = '0'
         pos = 0
         for f in reversed(self.subs):
-            ext = f.str_extract()
+            ext = f.str_extract(lvalue_formatter)
             if pos == 0:
                 ret = ext
             else:
@@ -270,7 +270,7 @@ def __init__(self, value):
     def __str__(self):
         return str(self.value)
 
-    def str_extract(self):
+    def str_extract(self, lvalue_formatter):
         return str(self.value)
 
     def __cmp__(self, other):
@@ -289,8 +289,9 @@ def __init__(self, func, base):
     def __str__(self):
         return self.func + '(' + str(self.base) + ')'
 
-    def str_extract(self):
-        return self.func + '(ctx, ' + self.base.str_extract() + ')'
+    def str_extract(self, lvalue_formatter):
+        return (self.func + '(ctx, '
+                + self.base.str_extract(lvalue_formatter) + ')')
 
     def __eq__(self, other):
         return self.func == other.func and self.base == other.base
@@ -310,7 +311,7 @@ def __init__(self, func):
     def __str__(self):
         return self.func
 
-    def str_extract(self):
+    def str_extract(self, lvalue_formatter):
         return self.func + '(ctx)'
 
     def __eq__(self, other):
@@ -363,6 +364,11 @@ def __str__(self):
 
     def str1(self, i):
         return str_indent(i) + self.__str__()
+
+    def output_fields(self, indent, lvalue_formatter):
+        for n, f in self.fields.items():
+            output(indent, lvalue_formatter(n), ' = ',
+                   f.str_extract(lvalue_formatter), ';\n')
 # end General
 
 
@@ -376,8 +382,7 @@ def extract_name(self):
     def output_extract(self):
         output('static void ', self.extract_name(), '(DisasContext *ctx, ',
                self.base.struct_name(), ' *a, ', insntype, ' insn)\n{\n')
-        for n, f in self.fields.items():
-            output('    a->', n, ' = ', f.str_extract(), ';\n')
+        self.output_fields(str_indent(4), lambda n: 'a->' + n)
         output('}\n\n')
 # end Format
 
@@ -401,8 +406,7 @@ def output_code(self, i, extracted, outerbits, outermask):
         if not extracted:
             output(ind, self.base.extract_name(),
                    '(ctx, &u.f_', arg, ', insn);\n')
-        for n, f in self.fields.items():
-            output(ind, 'u.f_', arg, '.', n, ' = ', f.str_extract(), ';\n')
+        self.output_fields(ind, lambda n: 'u.f_' + arg + '.' + n)
         output(ind, 'if (', translate_prefix, '_', self.name,
                '(ctx, &u.f_', arg, ')) return true;\n')
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PULL 25/27] scripts/decodetree: Implement a topological sort
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (23 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 24/27] scripts/decodetree: Pass lvalue-formatter function to str_extract() Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 26/27] scripts/decodetree: Implement named field support Richard Henderson
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

To support named fields, we will need to be able to do a topological
sort (so that we ensure that we output the assignment to field A
before the assignment to field B if field B refers to field A by
name). The good news is that there is a tsort in the Python standard
library; the bad news is that it was only added in Python 3.9.

To bridge the gap between our current minimum supported Python
version and 3.9, provide a local implementation that has the
same API as the stdlib version for the parts we care about.
In future when QEMU's minimum Python version requirement reaches
3.9 we can delete this code and replace it with an 'import' line.

The core of this implementation is based on
https://code.activestate.com/recipes/578272-topological-sort/
which is MIT-licensed.
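
As a minimal usage sketch of the subset we mimic (illustrative
only; the graph maps a node to its *predecessors*, so 'b' and 'c'
must be emitted before 'a'):

    ts = TopologicalSorter({'a': ['b', 'c'], 'b': [], 'c': []})
    order = ts.static_order()   # e.g. ['b', 'c', 'a']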

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230523120447.728365-5-peter.maydell@linaro.org>
---
 scripts/decodetree.py | 74 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 73d569c128..db019a25c6 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -54,6 +54,80 @@
 re_fmt_ident = '@[a-zA-Z0-9_]*'
 re_pat_ident = '[a-zA-Z0-9_]*'
 
+# Local implementation of a topological sort. We use the same API that
+# the Python graphlib does, so that when QEMU moves forward to a
+# baseline of Python 3.9 or newer this code can all be dropped and
+# replaced with:
+#    from graphlib import TopologicalSorter, CycleError
+#
+# https://docs.python.org/3.9/library/graphlib.html#graphlib.TopologicalSorter
+#
+# We only implement the parts of TopologicalSorter we care about:
+#  ts = TopologicalSorter(graph=None)
+#    create the sorter. graph is a dictionary whose keys are
+#    nodes and whose values are lists of the predecessors of that node.
+#    (That is, if graph contains "A" -> ["B", "C"] then we must output
+#    B and C before A.)
+#  ts.static_order()
+#    returns a list of all the nodes in sorted order, or raises CycleError
+#  CycleError
+#    exception raised if there are cycles in the graph. The second
+#    element in the args attribute is a list of nodes which form a
+#    cycle; the first and last element are the same, eg [a, b, c, a]
+#    (Our implementation doesn't give the order correctly.)
+#
+# For our purposes we can assume that the data set is always small
+# (typically 10 nodes or less, actual links in the graph very rare),
+# so we don't need to worry about efficiency of implementation.
+#
+# The core of this implementation is from
+# https://code.activestate.com/recipes/578272-topological-sort/
+# (but updated to Python 3), and is under the MIT license.
+
+class CycleError(ValueError):
+    """Subclass of ValueError raised if cycles exist in the graph"""
+    pass
+
+class TopologicalSorter:
+    """Topologically sort a graph"""
+    def __init__(self, graph=None):
+        self.graph = graph
+
+    def static_order(self):
+        # We do the sort right here, unlike the stdlib version
+        from functools import reduce
+        data = {}
+        r = []
+
+        if not self.graph:
+            return []
+
+        # This code wants the values in the dict to be specifically sets
+        for k, v in self.graph.items():
+            data[k] = set(v)
+
+        # Find all items that don't depend on anything.
+        extra_items_in_deps = (reduce(set.union, data.values())
+                               - set(data.keys()))
+        # Add empty dependencies where needed
+        data.update({item:{} for item in extra_items_in_deps})
+        while True:
+            ordered = set(item for item, dep in data.items() if not dep)
+            if not ordered:
+                break
+            r.extend(ordered)
+            data = {item: (dep - ordered)
+                    for item, dep in data.items()
+                        if item not in ordered}
+        if data:
+            # This doesn't give as nice results as the stdlib, which
+            # gives you the cycle by listing the nodes in order. Here
+            # we only know the nodes in the cycle but not their order.
+            raise CycleError(f'nodes are in a cycle', list(data.keys()))
+
+        return r
+# end TopologicalSorter
+
 def error_with_file(file, lineno, *args):
     """Print an error message from file:line and args and exit."""
     global output_file
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PULL 26/27] scripts/decodetree: Implement named field support
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (24 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 25/27] scripts/decodetree: Implement a topological sort Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-30 18:59 ` [PULL 27/27] tests/decode: Add tests for various named-field cases Richard Henderson
  2023-05-31  1:08 ` [PULL 00/27] tcg patch queue Richard Henderson
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

Implement support for named fields, i.e.  where one field is defined
in terms of another, rather than directly in terms of bits extracted
from the instruction.

The new method referenced_fields() on all the Field classes returns a
list of fields that this field references.  This just passes through,
except for the new NamedField class.
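
For instance (illustrative, using the constructor arguments from
the diff below), a NamedField reports the one field it names,
while every other field class returns an empty list:

    NamedField('sz', False, 3).referenced_fields()   # -> ['sz']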

We can then use referenced_fields() to:
 * construct a list of 'dangling references' for a format or
   pattern, which is the fields that the format/pattern uses but
   doesn't define itself
 * do a topological sort, so that we output "field = value"
   assignments in an order that means that we assign a field before
   we reference it in a subsequent assignment (a small sketch
   follows this list)
 * check when we output the code for a pattern whether we need to
   fill in the format fields before or after the pattern fields, and
   do other error checking
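
As a small sketch of that ordering step (hypothetical field names;
the graph maps a field to the fields it references, using the
TopologicalSorter from the previous patch):

    graph = {'imm_sz': ['sz'], 'sz': []}
    TopologicalSorter(graph).static_order()   # -> ['sz', 'imm_sz']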

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230523120447.728365-6-peter.maydell@linaro.org>
---
 scripts/decodetree.py | 145 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 139 insertions(+), 6 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index db019a25c6..13db585d04 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -290,6 +290,9 @@ def str_extract(self, lvalue_formatter):
         s = 's' if self.sign else ''
         return f'{s}extract{bitop_width}(insn, {self.pos}, {self.len})'
 
+    def referenced_fields(self):
+        return []
+
     def __eq__(self, other):
         return self.sign == other.sign and self.mask == other.mask
 
@@ -321,6 +324,12 @@ def str_extract(self, lvalue_formatter):
             pos += f.len
         return ret
 
+    def referenced_fields(self):
+        l = []
+        for f in self.subs:
+            l.extend(f.referenced_fields())
+        return l
+
     def __ne__(self, other):
         if len(self.subs) != len(other.subs):
             return True
@@ -347,6 +356,9 @@ def __str__(self):
     def str_extract(self, lvalue_formatter):
         return str(self.value)
 
+    def referenced_fields(self):
+        return []
+
     def __cmp__(self, other):
         return self.value - other.value
 # end ConstField
@@ -367,6 +379,9 @@ def str_extract(self, lvalue_formatter):
         return (self.func + '(ctx, '
                 + self.base.str_extract(lvalue_formatter) + ')')
 
+    def referenced_fields(self):
+        return self.base.referenced_fields()
+
     def __eq__(self, other):
         return self.func == other.func and self.base == other.base
 
@@ -388,6 +403,9 @@ def __str__(self):
     def str_extract(self, lvalue_formatter):
         return self.func + '(ctx)'
 
+    def referenced_fields(self):
+        return []
+
     def __eq__(self, other):
         return self.func == other.func
 
@@ -395,6 +413,32 @@ def __ne__(self, other):
         return not self.__eq__(other)
 # end ParameterField
 
+class NamedField:
+    """Class representing a field already named in the pattern"""
+    def __init__(self, name, sign, len):
+        self.mask = 0
+        self.sign = sign
+        self.len = len
+        self.name = name
+
+    def __str__(self):
+        return self.name
+
+    def str_extract(self, lvalue_formatter):
+        global bitop_width
+        s = 's' if self.sign else ''
+        lvalue = lvalue_formatter(self.name)
+        return f'{s}extract{bitop_width}({lvalue}, 0, {self.len})'
+
+    def referenced_fields(self):
+        return [self.name]
+
+    def __eq__(self, other):
+        return self.name == other.name
+
+    def __ne__(self, other):
+        return not self.__eq__(other)
+# end NamedField
 
 class Arguments:
     """Class representing the extracted fields of a format"""
@@ -418,7 +462,6 @@ def output_def(self):
             output('} ', self.struct_name(), ';\n\n')
 # end Arguments
 
-
 class General:
     """Common code between instruction formats and instruction patterns"""
     def __init__(self, name, lineno, base, fixb, fixm, udfm, fldm, flds, w):
@@ -432,6 +475,7 @@ def __init__(self, name, lineno, base, fixb, fixm, udfm, fldm, flds, w):
         self.fieldmask = fldm
         self.fields = flds
         self.width = w
+        self.dangling = None
 
     def __str__(self):
         return self.name + ' ' + str_match_bits(self.fixedbits, self.fixedmask)
@@ -439,10 +483,51 @@ def __str__(self):
     def str1(self, i):
         return str_indent(i) + self.__str__()
 
+    def dangling_references(self):
+        # Return a list of all named references which aren't satisfied
+        # directly by this format/pattern. This will be either:
+        #  * a format referring to a field which is specified by the
+        #    pattern(s) using it
+        #  * a pattern referring to a field which is specified by the
+        #    format it uses
+        #  * a user error (referring to a field that doesn't exist at all)
+        if self.dangling is None:
+            # Compute this once and cache the answer
+            dangling = []
+            for n, f in self.fields.items():
+                for r in f.referenced_fields():
+                    if r not in self.fields:
+                        dangling.append(r)
+            self.dangling = dangling
+        return self.dangling
+
     def output_fields(self, indent, lvalue_formatter):
+        # We use a topological sort to ensure that any use of NamedField
+        # comes after the initialization of the field it is referencing.
+        graph = {}
         for n, f in self.fields.items():
-            output(indent, lvalue_formatter(n), ' = ',
-                   f.str_extract(lvalue_formatter), ';\n')
+            refs = f.referenced_fields()
+            graph[n] = refs
+
+        try:
+            ts = TopologicalSorter(graph)
+            for n in ts.static_order():
+                # We only want to emit assignments for the keys
+                # in our fields list, not for anything that ends up
+                # in the tsort graph only because it was referenced as
+                # a NamedField.
+                try:
+                    f = self.fields[n]
+                    output(indent, lvalue_formatter(n), ' = ',
+                           f.str_extract(lvalue_formatter), ';\n')
+                except KeyError:
+                    pass
+        except CycleError as e:
+            # The second element of args is a list of nodes which form
+            # a cycle (there might be others too, but only one is reported).
+            # Pretty-print it to tell the user.
+            cycle = ' => '.join(e.args[1])
+            error(self.lineno, 'field definitions form a cycle: ' + cycle)
 # end General
 
 
@@ -477,10 +562,36 @@ def output_code(self, i, extracted, outerbits, outermask):
         ind = str_indent(i)
         arg = self.base.base.name
         output(ind, '/* ', self.file, ':', str(self.lineno), ' */\n')
+        # We might have named references in the format that refer to fields
+        # in the pattern, or named references in the pattern that refer
+        # to fields in the format. This affects whether we extract the fields
+        # for the format before or after the ones for the pattern.
+        # For simplicity we don't allow cross references in both directions.
+        # This is also where we catch the syntax error of referring to
+        # a nonexistent field.
+        fmt_refs = self.base.dangling_references()
+        for r in fmt_refs:
+            if r not in self.fields:
+                error(self.lineno, f'format refers to undefined field {r}')
+        pat_refs = self.dangling_references()
+        for r in pat_refs:
+            if r not in self.base.fields:
+                error(self.lineno, f'pattern refers to undefined field {r}')
+        if pat_refs and fmt_refs:
+            error(self.lineno, ('pattern that uses fields defined in format '
+                                'cannot use format that uses fields defined '
+                                'in pattern'))
+        if fmt_refs:
+            # pattern fields first
+            self.output_fields(ind, lambda n: 'u.f_' + arg + '.' + n)
+            assert not extracted, "dangling fmt refs but it was already extracted"
         if not extracted:
             output(ind, self.base.extract_name(),
                    '(ctx, &u.f_', arg, ', insn);\n')
-        self.output_fields(ind, lambda n: 'u.f_' + arg + '.' + n)
+        if not fmt_refs:
+            # pattern fields last
+            self.output_fields(ind, lambda n: 'u.f_' + arg + '.' + n)
+
         output(ind, 'if (', translate_prefix, '_', self.name,
                '(ctx, &u.f_', arg, ')) return true;\n')
 
@@ -626,8 +737,10 @@ def output_code(self, i, extracted, outerbits, outermask):
         ind = str_indent(i)
 
         # If we identified all nodes below have the same format,
-        # extract the fields now.
-        if not extracted and self.base:
+        # extract the fields now. But don't do it if the format relies
+        # on named fields from the insn pattern, as those won't have
+        # been initialised at this point.
+        if not extracted and self.base and not self.base.dangling_references():
             output(ind, self.base.extract_name(),
                    '(ctx, &u.f_', self.base.base.name, ', insn);\n')
             extracted = True
@@ -749,6 +862,7 @@ def parse_field(lineno, name, toks):
     """Parse one instruction field from TOKS at LINENO"""
     global fields
     global insnwidth
+    global re_C_ident
 
     # A "simple" field will have only one entry;
     # a "multifield" will have several.
@@ -763,6 +877,25 @@ def parse_field(lineno, name, toks):
             func = func[1]
             continue
 
+        if re.fullmatch(re_C_ident + ':s[0-9]+', t):
+            # Signed named field
+            subtoks = t.split(':')
+            n = subtoks[0]
+            le = int(subtoks[1])
+            f = NamedField(n, True, le)
+            subs.append(f)
+            width += le
+            continue
+        if re.fullmatch(re_C_ident + ':[0-9]+', t):
+            # Unsigned named field
+            subtoks = t.split(':')
+            n = subtoks[0]
+            le = int(subtoks[1])
+            f = NamedField(n, False, le)
+            subs.append(f)
+            width += le
+            continue
+
         if re.fullmatch('[0-9]+:s[0-9]+', t):
             # Signed field extract
             subtoks = t.split(':s')
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PULL 27/27] tests/decode: Add tests for various named-field cases
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (25 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 26/27] scripts/decodetree: Implement named field support Richard Henderson
@ 2023-05-30 18:59 ` Richard Henderson
  2023-05-31  1:08 ` [PULL 00/27] tcg patch queue Richard Henderson
  27 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-30 18:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

Add some tests for various cases of named-field use, both ones that
should work and ones that should be diagnosed as errors.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230523120447.728365-7-peter.maydell@linaro.org>
---
 tests/decode/err_field10.decode      |  7 +++++++
 tests/decode/err_field7.decode       |  7 +++++++
 tests/decode/err_field8.decode       |  8 ++++++++
 tests/decode/err_field9.decode       | 14 ++++++++++++++
 tests/decode/succ_named_field.decode | 19 +++++++++++++++++++
 tests/decode/meson.build             |  5 +++++
 6 files changed, 60 insertions(+)
 create mode 100644 tests/decode/err_field10.decode
 create mode 100644 tests/decode/err_field7.decode
 create mode 100644 tests/decode/err_field8.decode
 create mode 100644 tests/decode/err_field9.decode
 create mode 100644 tests/decode/succ_named_field.decode

diff --git a/tests/decode/err_field10.decode b/tests/decode/err_field10.decode
new file mode 100644
index 0000000000..3e672b7459
--- /dev/null
+++ b/tests/decode/err_field10.decode
@@ -0,0 +1,7 @@
+# This work is licensed under the terms of the GNU LGPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+# Diagnose formats which refer to undefined fields
+%field1        field2:3
+@fmt ........ ........ ........ ........ %field1
+insn 00000000 00000000 00000000 00000000 @fmt
diff --git a/tests/decode/err_field7.decode b/tests/decode/err_field7.decode
new file mode 100644
index 0000000000..51fad7ccea
--- /dev/null
+++ b/tests/decode/err_field7.decode
@@ -0,0 +1,7 @@
+# This work is licensed under the terms of the GNU LGPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+# Diagnose fields whose definitions form a loop
+%field1        field2:3
+%field2        field1:4
+insn 00000000 00000000 00000000 00000000 %field1 %field2
diff --git a/tests/decode/err_field8.decode b/tests/decode/err_field8.decode
new file mode 100644
index 0000000000..cc47c08a45
--- /dev/null
+++ b/tests/decode/err_field8.decode
@@ -0,0 +1,8 @@
+# This work is licensed under the terms of the GNU LGPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+# Diagnose patterns which refer to undefined fields
+&f1 f1 a
+%field1        field2:3
+@fmt ........ ........ ........ .... a:4 &f1
+insn 00000000 00000000 00000000 0000 .... @fmt f1=%field1
diff --git a/tests/decode/err_field9.decode b/tests/decode/err_field9.decode
new file mode 100644
index 0000000000..e7361d521b
--- /dev/null
+++ b/tests/decode/err_field9.decode
@@ -0,0 +1,14 @@
+# This work is licensed under the terms of the GNU LGPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+# Diagnose fields where the format refers to a field defined in the
+# pattern and the pattern refers to a field defined in the format.
+# This is theoretically not impossible to implement, but is not
+# supported by the script at this time.
+&abcd a b c d
+%refa        a:3
+%refc        c:4
+# Format defines 'c' and sets 'b' to an indirect ref to 'a'
+@fmt ........ ........ ........ c:8 &abcd b=%refa
+# Pattern defines 'a' and sets 'd' to an indirect ref to 'c'
+insn 00000000 00000000 00000000 ........ @fmt d=%refc a=6
diff --git a/tests/decode/succ_named_field.decode b/tests/decode/succ_named_field.decode
new file mode 100644
index 0000000000..e64b3f9356
--- /dev/null
+++ b/tests/decode/succ_named_field.decode
@@ -0,0 +1,19 @@
+# This work is licensed under the terms of the GNU LGPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+# field using a named_field
+%imm_sz	8:8 sz:3
+insn 00000000 00000000 ........ 00000000 imm_sz=%imm_sz sz=1
+
+# Ditto, via a format. Here a field in the format
+# references a named field defined in the insn pattern:
+&imm_a imm alpha
+%foo 0:16 alpha:4
+@foo 00000001 ........ ........ ........ &imm_a imm=%foo
+i1   ........ 00000000 ........ ........ @foo alpha=1
+i2   ........ 00000001 ........ ........ @foo alpha=2
+
+# Here the named field is defined in the format and referenced
+# from the insn pattern:
+@bar 00000010 ........ ........ ........ &imm_a alpha=4
+i3   ........ 00000000 ........ ........ @bar imm=%foo
diff --git a/tests/decode/meson.build b/tests/decode/meson.build
index b4850562f9..38a0629d67 100644
--- a/tests/decode/meson.build
+++ b/tests/decode/meson.build
@@ -7,6 +7,10 @@ err_tests = [
     'err_field4.decode',
     'err_field5.decode',
     'err_field6.decode',
+    'err_field7.decode',
+    'err_field8.decode',
+    'err_field9.decode',
+    'err_field10.decode',
     'err_init1.decode',
     'err_init2.decode',
     'err_init3.decode',
@@ -37,6 +41,7 @@ succ_tests = [
     'succ_argset_type1.decode',
     'succ_function.decode',
     'succ_ident1.decode',
+    'succ_named_field.decode',
     'succ_pattern_group_nest1.decode',
     'succ_pattern_group_nest2.decode',
     'succ_pattern_group_nest3.decode',
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PULL 00/27] tcg patch queue
  2023-05-30 18:59 [PULL 00/27] tcg patch queue Richard Henderson
                   ` (26 preceding siblings ...)
  2023-05-30 18:59 ` [PULL 27/27] tests/decode: Add tests for various named-field cases Richard Henderson
@ 2023-05-31  1:08 ` Richard Henderson
  2023-05-31 16:12   ` Thomas Huth
  27 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2023-05-31  1:08 UTC (permalink / raw)
  To: qemu-devel

On 5/30/23 11:59, Richard Henderson wrote:
> The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:
> 
>    Merge tag 'pull-target-arm-20230530-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-05-30 08:02:05 -0700)
> 
> are available in the Git repository at:
> 
>    https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230530
> 
> for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:
> 
>    tests/decode: Add tests for various named-field cases (2023-05-30 10:55:39 -0700)
> 
> ----------------------------------------------------------------
> Improvements to 128-bit atomics:
>    - Separate __int128_t type and arithmetic detection
>    - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
>    - Accelerate atomics via host/include/
> Decodetree:
>    - Add named field syntax
>    - Move tests to meson

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.


r~



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PULL 00/27] tcg patch queue
  2023-05-31  1:08 ` [PULL 00/27] tcg patch queue Richard Henderson
@ 2023-05-31 16:12   ` Thomas Huth
       [not found]     ` <227e27e0-4035-8e17-2259-3098340f716e@linaro.org>
  2023-06-01 10:52     ` Mark Cave-Ayland
  0 siblings, 2 replies; 35+ messages in thread
From: Thomas Huth @ 2023-05-31 16:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 31/05/2023 03.08, Richard Henderson wrote:
> On 5/30/23 11:59, Richard Henderson wrote:
>> The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:
>>
>>    Merge tag 'pull-target-arm-20230530-1' 
>> of https://git.linaro.org/people/pmaydell/qemu-arm into staging 
>> (2023-05-30 08:02:05 -0700)
>>
>> are available in the Git repository at:
>>
>>    https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230530
>>
>> for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:
>>
>>    tests/decode: Add tests for various named-field cases (2023-05-30 
>> 10:55:39 -0700)
>>
>> ----------------------------------------------------------------
>> Improvements to 128-bit atomics:
>>    - Separate __int128_t type and arithmetic detection
>>    - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
>>    - Accelerate atomics via host/include/
>> Decodetree:
>>    - Add named field syntax
>>    - Move tests to meson
> 
> Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
> appropriate.

Too bad that we've run out of CI minutes for the Windows jobs ... FYI, this
is now causing failures in the msys2 jobs:

  https://gitlab.com/thuth/qemu/-/jobs/4385862382#L4821
  https://gitlab.com/thuth/qemu/-/jobs/4385862378#L4632

  Thomas




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PULL 00/27] tcg patch queue
       [not found]     ` <227e27e0-4035-8e17-2259-3098340f716e@linaro.org>
@ 2023-05-31 22:20       ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2023-05-31 22:20 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel

On 5/31/23 11:07, Richard Henderson wrote:
> On 5/31/23 09:12, Thomas Huth wrote:
>> On 31/05/2023 03.08, Richard Henderson wrote:
>>> On 5/30/23 11:59, Richard Henderson wrote:
>>>> The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:
>>>>
>>>>    Merge tag 'pull-target-arm-20230530-1' 
>>>> of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-05-30 08:02:05 
>>>> -0700)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>    https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230530
>>>>
>>>> for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:
>>>>
>>>>    tests/decode: Add tests for various named-field cases (2023-05-30 10:55:39 -0700)
>>>>
>>>> ----------------------------------------------------------------
>>>> Improvements to 128-bit atomics:
>>>>    - Separate __int128_t type and arithmetic detection
>>>>    - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
>>>>    - Accelerate atomics via host/include/
>>>> Decodetree:
>>>>    - Add named field syntax
>>>>    - Move tests to meson
>>>
>>> Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.
>>
>> Too bad that we've run out of CI minutes for the Windows jobs ... FYI, this is
>> now causing failures in the msys2 jobs:
>>
>>   https://gitlab.com/thuth/qemu/-/jobs/4385862382#L4821
>>   https://gitlab.com/thuth/qemu/-/jobs/4385862378#L4632
> 
> Grr.  It doesn't even say what failed, just wrong exit code.
> I wonder if the -o /dev/null is getting in the way...

Whee.  From meson-logs/testlog.txt:

FileNotFoundError: [Errno 2] No such file or directory: '/dev/null'
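
For reference, Python's os.devnull names the platform's null
device ('nul' on Windows, '/dev/null' elsewhere), so a portable
null sink looks like this (a sketch, not necessarily the fix):

    import os

    # Opens 'nul' on Windows and '/dev/null' on POSIX, avoiding
    # the hard-coded '/dev/null' path that msys2 cannot open.
    with open(os.devnull, 'w') as f:
        f.write('discarded output\n')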


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PULL 00/27] tcg patch queue
  2023-05-31 16:12   ` Thomas Huth
       [not found]     ` <227e27e0-4035-8e17-2259-3098340f716e@linaro.org>
@ 2023-06-01 10:52     ` Mark Cave-Ayland
  1 sibling, 0 replies; 35+ messages in thread
From: Mark Cave-Ayland @ 2023-06-01 10:52 UTC (permalink / raw)
  To: Thomas Huth, Richard Henderson, qemu-devel

On 31/05/2023 17:12, Thomas Huth wrote:

> On 31/05/2023 03.08, Richard Henderson wrote:
>> On 5/30/23 11:59, Richard Henderson wrote:
>>> The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:
>>>
>>>    Merge tag 'pull-target-arm-20230530-1' 
>>> of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-05-30 
>>> 08:02:05 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>>    https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230530
>>>
>>> for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:
>>>
>>>    tests/decode: Add tests for various named-field cases (2023-05-30 10:55:39 -0700)
>>>
>>> ----------------------------------------------------------------
>>> Improvements to 128-bit atomics:
>>>    - Separate __int128_t type and arithmetic detection
>>>    - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
>>>    - Accelerate atomics via host/include/
>>> Decodetree:
>>>    - Add named field syntax
>>>    - Move tests to meson
>>
>> Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.
> 
> Too bad that we've run out of CI minutes for the Windows jobs ... FYI, this is
> now causing failures in the msys2 jobs:
> 
>   https://gitlab.com/thuth/qemu/-/jobs/4385862382#L4821
>   https://gitlab.com/thuth/qemu/-/jobs/4385862378#L4632

Given that we now run Kubernetes on Azure, should we consider setting up a Windows 
Kubernetes node for the msys2 jobs? When I last looked at this for a client ~2 years 
ago it was still experimental, but in theory it should be possible to set up a 
Windows node as part of the cluster and install the Windows GitLab runner in a 
container on it...


ATB,

Mark.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* TCG broken on s390x hosts (was: [PULL 11/27] tcg/s390x: Support 128-bit load/store)
  2023-05-30 18:59 ` [PULL 11/27] tcg/s390x: " Richard Henderson
@ 2023-07-10  8:58   ` Thomas Huth
  2023-07-10  9:31     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Huth @ 2023-07-10  8:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell, qemu-s390x

On 30/05/2023 20.59, Richard Henderson wrote:
> Use LPQ/STPQ when 16-byte atomicity is required.
> Note that these instructions require 16-byte alignment.
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/s390x/tcg-target-con-set.h |   2 +
>   tcg/s390x/tcg-target.h         |   2 +-
>   tcg/s390x/tcg-target.c.inc     | 107 ++++++++++++++++++++++++++++++++-
>   3 files changed, 107 insertions(+), 4 deletions(-)

  Hi Richard!

I think this patch broke something on s390x hosts. I currently
cannot run the BootLinuxS390X.test_s390_ccw_virtio_tcg avocado
test on an s390x host - it times out:

  make check-venv
  ./tests/venv/bin/avocado run -t arch:s390x \
   tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg

... gets INTERRUPTED after the timeout expired.

It used to work fine in the past, so I bisected the problem
and it seems like 4caad79f8d60a5df20ceed1c396724af745c9512
is the first bad commit. If I revert it on top of the master
branch, the test works fine again. Could you please have
a look?

  Thanks,
   Thomas



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: TCG broken on s390x hosts (was: [PULL 11/27] tcg/s390x: Support 128-bit load/store)
  2023-07-10  8:58   ` TCG broken on s390x hosts (was: [PULL 11/27] tcg/s390x: Support 128-bit load/store) Thomas Huth
@ 2023-07-10  9:31     ` Richard Henderson
  2023-07-10 11:10       ` Thomas Huth
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2023-07-10  9:31 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel; +Cc: Peter Maydell, qemu-s390x

On 7/10/23 09:58, Thomas Huth wrote:
> On 30/05/2023 20.59, Richard Henderson wrote:
>> Use LPQ/STPQ when 16-byte atomicity is required.
>> Note that these instructions require 16-byte alignment.
>>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/s390x/tcg-target-con-set.h |   2 +
>>   tcg/s390x/tcg-target.h         |   2 +-
>>   tcg/s390x/tcg-target.c.inc     | 107 ++++++++++++++++++++++++++++++++-
>>   3 files changed, 107 insertions(+), 4 deletions(-)
> 
>   Hi Richard!
> 
> I think this patch broke something on s390x hosts. I currently
> cannot run the BootLinuxS390X.test_s390_ccw_virtio_tcg avocado
> test on an s390x host - it times out:
> 
>   make check-venv
>   ./tests/venv/bin/avocado run -t arch:s390x \
>    tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg
> 
> ... gets INTERRUPTED after the timeout expired.
> 
> It used to work fine in the past, so I bisected the problem
> and it seems like 4caad79f8d60a5df20ceed1c396724af745c9512
> is the first bad commit. If I revert it on top of the master
> branch, the test works fine again. Could you please have
> a look?
> 

I have a patch already:

https://patchew.org/QEMU/20230707102955.5607-1-richard.henderson@linaro.org/


r~



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: TCG broken on s390x hosts (was: [PULL 11/27] tcg/s390x: Support 128-bit load/store)
  2023-07-10  9:31     ` Richard Henderson
@ 2023-07-10 11:10       ` Thomas Huth
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Huth @ 2023-07-10 11:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell, qemu-s390x

On 10/07/2023 11.31, Richard Henderson wrote:
> On 7/10/23 09:58, Thomas Huth wrote:
>> On 30/05/2023 20.59, Richard Henderson wrote:
>>> Use LPQ/STPQ when 16-byte atomicity is required.
>>> Note that these instructions require 16-byte alignment.
>>>
>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>   tcg/s390x/tcg-target-con-set.h |   2 +
>>>   tcg/s390x/tcg-target.h         |   2 +-
>>>   tcg/s390x/tcg-target.c.inc     | 107 ++++++++++++++++++++++++++++++++-
>>>   3 files changed, 107 insertions(+), 4 deletions(-)
>>
>>   Hi Richard!
>>
>> I think this patch broke something on s390x hosts. I currently
>> cannot run the BootLinuxS390X.test_s390_ccw_virtio_tcg avocado
>> test on a s390x host - it times out:
>>
>>   make check-venv
>>   ./tests/venv/bin/avocado run -t arch:s390x \
>>    tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg
>>
>> ... gets INTERRUPTED after the timeout expired.
>>
>> It used to work fine in the past, so I bisected the problem
>> and it seems like 4caad79f8d60a5df20ceed1c396724af745c9512
>> is the first bad commit. If I revert it on top of the master
>> branch, the test works fine again. Could you please have
>> a look?
>>
> 
> I have a patch already:
> 
> https://patchew.org/QEMU/20230707102955.5607-1-richard.henderson@linaro.org/

Hmm, either I'm doing something wrong, or this patch does not fix the issue 
yet - I applied it and the test still times out ...

  Thomas



^ permalink raw reply	[flat|nested] 35+ messages in thread
