All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics
@ 2016-07-01 17:04 Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor Richard Henderson
                   ` (27 more replies)
  0 siblings, 28 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

I spent a couple evenings this week tweaking Emilio's patch set.

The first major change is to "qemu/int128.h", so that we can use
that type in the context of a 16-byte cmpxchg.  I have yet to teach
TCG code generation about this type, so it's really only usable
from other helper functions.  But that's still an improvement over
having to return two uint64_t by reference.

The second major change is to funnel atomic operation generation
through functions in tcg-op.c.  There we can test whether or not
we're generating code in a parallel context and require atomic
operations.  This also centralizes the helper functions so that we
don't have to have the same sets in every target.

The third major change is providing a mechanism by which we can
trap on atomic operations that we do not support, exit the cpu loop,
stop the world, and then re-execute the instruction in a serial context.
This is obviously something that will need to be filled in further
as MTTCG progresses.

This minimally tested, but it is good enough to boot Fedora 24 x86-64,
even with the softmmu single-step stubbed out.  Perhaps unsurprisingly,
Fedora does not attempt an unaligned atomic operation.

Comments?


r~


Emilio G. Cota (18):
  atomics: add atomic_xor
  atomics: add atomic_op_fetch variants
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate XCHG using atomic helper
  target-i386: remove helper_lock()
  tests: add atomic_add-bench
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  linux-user: remove handling of ARM's EXCP_STREX
  linux-user: remove handling of aarch64's EXCP_STREX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}

Richard Henderson (9):
  exec: Avoid direct references to Int128 parts
  int128: Use __int128 if available
  int128: Add int128_make128
  int128: Use complex numbers if advisable
  tcg: Add EXCP_ATOMIC
  HACK: Always enable parallel_cpus
  tcg: Add atomic helpers
  tcg: Add atomic128 helpers
  target-arm: Rearrange aa32 load and store functions

 Makefile.objs              |   1 -
 Makefile.target            |   1 +
 atomic_template.h          | 220 ++++++++++++++++++++++++++
 configure                  |  29 +++-
 cpu-exec-common.c          |   6 +
 cpu-exec.c                 |  23 +++
 cpus.c                     |   6 +
 cputlb.c                   |   9 +-
 exec.c                     |   4 +-
 include/exec/cpu-all.h     |   1 +
 include/exec/exec-all.h    |   1 +
 include/qemu-common.h      |   1 +
 include/qemu/atomic.h      |  21 +++
 include/qemu/int128.h      | 261 ++++++++++++++++++++++++++-----
 linux-user/main.c          | 277 +++++++--------------------------
 softmmu_template.h         | 264 +++++++++++++++++++++++++++++---
 target-arm/cpu.h           |  17 +--
 target-arm/helper-a64.c    | 135 ++++++++++++++++
 target-arm/helper-a64.h    |   2 +
 target-arm/internals.h     |   4 +-
 target-arm/translate-a64.c | 106 ++++++-------
 target-arm/translate.c     | 342 +++++++++++++++--------------------------
 target-arm/translate.h     |   4 -
 target-i386/helper.h       |   2 -
 target-i386/mem_helper.c   | 127 ++++++++-------
 target-i386/translate.c    | 374 ++++++++++++++++++++++++++++-----------------
 tcg-runtime.c              |  27 ++--
 tcg/tcg-op.c               | 324 +++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h               |  44 ++++++
 tcg/tcg-runtime.h          |  75 +++++++++
 tcg/tcg.h                  |  76 +++++++++
 tests/.gitignore           |   1 +
 tests/Makefile.include     |   4 +-
 tests/atomic_add-bench.c   | 180 ++++++++++++++++++++++
 translate-all.c            |   1 +
 35 files changed, 2189 insertions(+), 781 deletions(-)
 create mode 100644 atomic_template.h
 create mode 100644 tests/atomic_add-bench.c

-- 
2.5.5

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-08-11 17:19   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants Richard Henderson
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

This paves the way for upcoming work.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-8-git-send-email-cota@braap.org>
---
 include/qemu/atomic.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 7a59096..a5531da 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -161,6 +161,7 @@
 #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
@@ -169,6 +170,7 @@
 #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
 #define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
 #define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_xor(ptr, n) ((void) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST))
 
 #else /* __ATOMIC_RELAXED */
 
@@ -355,6 +357,7 @@
 #define atomic_fetch_sub       __sync_fetch_and_sub
 #define atomic_fetch_and       __sync_fetch_and_and
 #define atomic_fetch_or        __sync_fetch_and_or
+#define atomic_fetch_xor       __sync_fetch_and_xor
 #define atomic_cmpxchg         __sync_val_compare_and_swap
 
 /* And even shorter names that return void.  */
@@ -364,6 +367,7 @@
 #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
 #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
 #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
+#define atomic_xor(ptr, n)     ((void) __sync_fetch_and_xor(ptr, n))
 
 #endif /* __ATOMIC_RELAXED */
 #endif /* __QEMU_ATOMIC_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-08-11 17:20   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 03/27] exec: Avoid direct references to Int128 parts Richard Henderson
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

This paves the way for upcoming work.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-9-git-send-email-cota@braap.org>
---
 include/qemu/atomic.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index a5531da..d75bfde 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -163,6 +163,14 @@
 #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 
+#define atomic_inc_fetch(ptr)    __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_dec_fetch(ptr)    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
+
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
 #define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
@@ -358,6 +366,15 @@
 #define atomic_fetch_and       __sync_fetch_and_and
 #define atomic_fetch_or        __sync_fetch_and_or
 #define atomic_fetch_xor       __sync_fetch_and_xor
+
+#define atomic_inc_fetch(ptr)  __sync_add_and_fetch(ptr, 1)
+#define atomic_dec_fetch(ptr)  __sync_add_and_fetch(ptr, -1)
+#define atomic_add_fetch       __sync_add_and_fetch
+#define atomic_sub_fetch       __sync_sub_and_fetch
+#define atomic_and_fetch       __sync_and_and_fetch
+#define atomic_or_fetch        __sync_or_and_fetch
+#define atomic_xor_fetch       __sync_xor_and_fetch
+
 #define atomic_cmpxchg         __sync_val_compare_and_swap
 
 /* And even shorter names that return void.  */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 03/27] exec: Avoid direct references to Int128 parts
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available Richard Henderson
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 exec.c                |  4 ++--
 include/qemu/int128.h | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 0122ef7..806e2fe 100644
--- a/exec.c
+++ b/exec.c
@@ -318,9 +318,9 @@ static inline bool section_covers_addr(const MemoryRegionSection *section,
     /* Memory topology clips a memory region to [0, 2^64); size.hi > 0 means
      * the section must cover the entire address space.
      */
-    return section->size.hi ||
+    return int128_gethi(section->size) ||
            range_covers_byte(section->offset_within_address_space,
-                             section->size.lo, addr);
+                             int128_getlo(section->size), addr);
 }
 
 static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr,
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index c598881..52aaf99 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -20,6 +20,16 @@ static inline uint64_t int128_get64(Int128 a)
     return a.lo;
 }
 
+static inline uint64_t int128_getlo(Int128 a)
+{
+    return a.lo;
+}
+
+static inline int64_t int128_gethi(Int128 a)
+{
+    return a.hi;
+}
+
 static inline Int128 int128_zero(void)
 {
     return int128_make64(0);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (2 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 03/27] exec: Avoid direct references to Int128 parts Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-08-11 10:45   ` Alex Bennée
  2016-08-25 19:09   ` [Qemu-devel] [PATCH] fixup! " Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable Richard Henderson
                   ` (23 subsequent siblings)
  27 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/int128.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 134 insertions(+), 1 deletion(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 52aaf99..08f1db1 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -1,6 +1,138 @@
 #ifndef INT128_H
 #define INT128_H
 
+#ifdef CONFIG_INT128
+
+typedef __int128 Int128;
+
+static inline Int128 int128_make64(uint64_t a)
+{
+    return a;
+}
+
+static inline uint64_t int128_get64(Int128 a)
+{
+    uint64_t r = a;
+    assert(r == a);
+    return r;
+}
+
+static inline uint64_t int128_getlo(Int128 a)
+{
+    return a;
+}
+
+static inline int64_t int128_gethi(Int128 a)
+{
+    return a >> 64;
+}
+
+static inline Int128 int128_zero(void)
+{
+    return 0;
+}
+
+static inline Int128 int128_one(void)
+{
+    return 1;
+}
+
+static inline Int128 int128_2_64(void)
+{
+    return (Int128)1 << 64;
+}
+
+static inline Int128 int128_exts64(int64_t a)
+{
+    return a;
+}
+
+static inline Int128 int128_and(Int128 a, Int128 b)
+{
+    return a & b;
+}
+
+static inline Int128 int128_rshift(Int128 a, int n)
+{
+    return a >> n;
+}
+
+static inline Int128 int128_add(Int128 a, Int128 b)
+{
+    return a + b;
+}
+
+static inline Int128 int128_neg(Int128 a)
+{
+    return -a;
+}
+
+static inline Int128 int128_sub(Int128 a, Int128 b)
+{
+    return a - b;
+}
+
+static inline bool int128_nonneg(Int128 a)
+{
+    return a >= 0;
+}
+
+static inline bool int128_eq(Int128 a, Int128 b)
+{
+    return a == b;
+}
+
+static inline bool int128_ne(Int128 a, Int128 b)
+{
+    return a != b;
+}
+
+static inline bool int128_ge(Int128 a, Int128 b)
+{
+    return a >= b;
+}
+
+static inline bool int128_lt(Int128 a, Int128 b)
+{
+    return a < b;
+}
+
+static inline bool int128_le(Int128 a, Int128 b)
+{
+    return a <= b;
+}
+
+static inline bool int128_gt(Int128 a, Int128 b)
+{
+    return a > b;
+}
+
+static inline bool int128_nz(Int128 a)
+{
+    return a != 0;
+}
+
+static inline Int128 int128_min(Int128 a, Int128 b)
+{
+    return a < b ? a : b;
+}
+
+static inline Int128 int128_max(Int128 a, Int128 b)
+{
+    return a > b ? a : b;
+}
+
+static inline void int128_addto(Int128 *a, Int128 b)
+{
+    *a += b;
+}
+
+static inline void int128_subfrom(Int128 *a, Int128 b)
+{
+    *a -= b;
+}
+
+#else /* !CONFIG_INT128 */
 
 typedef struct Int128 Int128;
 
@@ -153,4 +285,5 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
     *a = int128_sub(*a, b);
 }
 
-#endif
+#endif /* CONFIG_INT128 */
+#endif /* INT128_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (3 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-04 11:51   ` Paolo Bonzini
  2016-07-04 12:07   ` Peter Maydell
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC Richard Henderson
                   ` (22 subsequent siblings)
  27 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

If __int128 is not supported, prefer a base type that is
returned in registers rather than memory.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/int128.h | 110 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 67440fa..ab67275 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -139,27 +139,37 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
 
 #else /* !CONFIG_INT128 */
 
-typedef struct Int128 Int128;
+/* Here we are catering to the ABI of the host.  If the host returns
+   64-bit complex in registers, but the 128-bit structure in memory,
+   then choose the complex representation.  */
+#if defined(__GNUC__) \
+    && (defined(__powerpc__) || defined(__sparc__)) \
+    && !defined(CONFIG_TCG_INTERPRETER)
+typedef _Complex unsigned long long Int128;
 
-struct Int128 {
-    uint64_t lo;
-    int64_t hi;
-};
+static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
+{
+    return lo + 1i * hi;
+}
 
-static inline Int128 int128_make64(uint64_t a)
+static inline uint64_t int128_getlo(Int128 a)
 {
-    return (Int128) { a, 0 };
+    return __real__ a;
 }
 
-static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
+static inline int64_t int128_gethi(Int128 a)
 {
-    return (Int128) { lo, hi };
+    return __imag__ a;
 }
+#else
+typedef struct Int128 {
+    uint64_t lo;
+    int64_t hi;
+} Int128;
 
-static inline uint64_t int128_get64(Int128 a)
+static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
-    assert(!a.hi);
-    return a.lo;
+    return (Int128) { lo, hi };
 }
 
 static inline uint64_t int128_getlo(Int128 a)
@@ -171,78 +181,92 @@ static inline int64_t int128_gethi(Int128 a)
 {
     return a.hi;
 }
+#endif /* complex 128 */
+
+static inline Int128 int128_make64(uint64_t a)
+{
+    return int128_make128(a, 0);
+}
+
+static inline uint64_t int128_get64(Int128 a)
+{
+    assert(int128_gethi(a) == 0);
+    return int128_getlo(a);
+}
 
 static inline Int128 int128_zero(void)
 {
-    return int128_make64(0);
+    return int128_make128(0, 0);
 }
 
 static inline Int128 int128_one(void)
 {
-    return int128_make64(1);
+    return int128_make128(1, 0);
 }
 
 static inline Int128 int128_2_64(void)
 {
-    return (Int128) { 0, 1 };
+    return int128_make128(0, 1);
 }
 
 static inline Int128 int128_exts64(int64_t a)
 {
-    return (Int128) { .lo = a, .hi = (a < 0) ? -1 : 0 };
+    return int128_make128(a, a < 0 ? -1 : 0);
 }
 
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
-    return (Int128) { a.lo & b.lo, a.hi & b.hi };
+    uint64_t al = int128_getlo(a), bl = int128_getlo(b);
+    uint64_t ah = int128_gethi(a), bh = int128_gethi(b);
+
+    return int128_make128(al & bl, ah & bh);
 }
 
 static inline Int128 int128_rshift(Int128 a, int n)
 {
-    int64_t h;
-    if (!n) {
-        return a;
-    }
-    h = a.hi >> (n & 63);
-    if (n >= 64) {
+    uint64_t al = int128_getlo(a), ah = int128_gethi(a);
+    int64_t h = ((int64_t)ah) >> (n & 63);
+    if (n & 64) {
         return int128_make128(h, h >> 63);
     } else {
-        return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h);
+        return int128_make128((al >> n) | (ah << (64 - n)), h);
     }
 }
 
 static inline Int128 int128_add(Int128 a, Int128 b)
 {
-    uint64_t lo = a.lo + b.lo;
+    uint64_t al = int128_getlo(a), bl = int128_getlo(b);
+    uint64_t ah = int128_gethi(a), bh = int128_gethi(b);
+    uint64_t lo = al + bl;
 
-    /* a.lo <= a.lo + b.lo < a.lo + k (k is the base, 2^64).  Hence,
-     * a.lo + b.lo >= k implies 0 <= lo = a.lo + b.lo - k < a.lo.
-     * Similarly, a.lo + b.lo < k implies a.lo <= lo = a.lo + b.lo < k.
-     *
-     * So the carry is lo < a.lo.
-     */
-    return int128_make128(lo, (uint64_t)a.hi + b.hi + (lo < a.lo));
+    return int128_make128(lo, ah + bh + (lo < al));
 }
 
-static inline Int128 int128_neg(Int128 a)
+static inline Int128 int128_sub(Int128 a, Int128 b)
 {
-    uint64_t lo = -a.lo;
-    return int128_make128(lo, ~(uint64_t)a.hi + !lo);
+    uint64_t al = int128_getlo(a), bl = int128_getlo(b);
+    uint64_t ah = int128_gethi(a), bh = int128_gethi(b);
+
+    return int128_make128(al - bl, ah - bh - (al < bl));
 }
 
-static inline Int128 int128_sub(Int128 a, Int128 b)
+static inline Int128 int128_neg(Int128 a)
 {
-    return int128_make128(a.lo - b.lo, (uint64_t)a.hi - b.hi - (a.lo < b.lo));
+    uint64_t al = int128_getlo(a), ah = int128_gethi(a);
+    return int128_make128(-al, ~ah + !al);
 }
 
 static inline bool int128_nonneg(Int128 a)
 {
-    return a.hi >= 0;
+    return int128_gethi(a) >= 0;
 }
 
 static inline bool int128_eq(Int128 a, Int128 b)
 {
-    return a.lo == b.lo && a.hi == b.hi;
+    uint64_t al = int128_getlo(a), bl = int128_getlo(b);
+    uint64_t ah = int128_gethi(a), bh = int128_gethi(b);
+
+    return al == bl && ah == bh;
 }
 
 static inline bool int128_ne(Int128 a, Int128 b)
@@ -252,7 +276,10 @@ static inline bool int128_ne(Int128 a, Int128 b)
 
 static inline bool int128_ge(Int128 a, Int128 b)
 {
-    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
+    uint64_t al = int128_getlo(a), bl = int128_getlo(b);
+    int64_t  ah = int128_gethi(a), bh = int128_gethi(b);
+
+    return ah > bh || (ah == bh && al >= bl);
 }
 
 static inline bool int128_lt(Int128 a, Int128 b)
@@ -272,7 +299,8 @@ static inline bool int128_gt(Int128 a, Int128 b)
 
 static inline bool int128_nz(Int128 a)
 {
-    return a.lo || a.hi;
+    uint64_t al = int128_getlo(a), ah = int128_gethi(a);
+    return (al | ah) != 0;
 }
 
 static inline Int128 int128_min(Int128 a, Int128 b)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (4 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-09-08  8:38   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus Richard Henderson
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cpu-exec-common.c       |  6 +++++
 cpu-exec.c              | 23 +++++++++++++++++++
 cpus.c                  |  6 +++++
 include/exec/cpu-all.h  |  1 +
 include/exec/exec-all.h |  1 +
 include/qemu-common.h   |  1 +
 linux-user/main.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
 tcg/tcg.h               |  1 +
 translate-all.c         |  1 +
 9 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0cb4ae6..767d9c6 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -77,3 +77,9 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     }
     siglongjmp(cpu->jmp_env, 1);
 }
+
+void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
+{
+    cpu->exception_index = EXCP_ATOMIC;
+    cpu_loop_exit_restore(cpu, pc);
+}
diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d..041f8b7 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -225,6 +225,29 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
+void cpu_exec_step(CPUState *cpu)
+{
+    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
+    TranslationBlock *tb;
+    target_ulong cs_base, pc;
+    uint32_t flags;
+    bool old_tb_flushed;
+
+    old_tb_flushed = cpu->tb_flushed;
+    cpu->tb_flushed = false;
+
+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    tb = tb_gen_code(cpu, pc, cs_base, flags,
+                     1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
+    tb->orig_tb = NULL;
+    cpu->tb_flushed |= old_tb_flushed;
+    /* execute the generated code */
+    trace_exec_tb_nocache(tb, pc);
+    cpu_tb_exec(cpu, tb);
+    tb_phys_invalidate(tb, -1);
+    tb_free(tb);
+}
+
 struct tb_desc {
     target_ulong pc;
     target_ulong cs_base;
diff --git a/cpus.c b/cpus.c
index 84c3520..a01bbbd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1575,6 +1575,12 @@ static void tcg_exec_all(void)
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(cpu);
                 break;
+            } else if (r == EXCP_ATOMIC) {
+                /* ??? When we begin running cpus in parallel, we should
+                   stop all cpus, clear parallel_cpus, and interpret a
+                   single insn with cpu_exec_step.  In the meantime,
+                   we should never get here.  */
+                abort();
             }
         } else if (cpu->stop || cpu->stopped) {
             if (cpu->unplug) {
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 8007abb..d2aac4b 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -31,6 +31,7 @@
 #define EXCP_DEBUG      0x10002 /* cpu stopped after a breakpoint or singlestep */
 #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
 #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
+#define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */
 
 /* some important defines:
  *
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c1f59fa..ec72c5a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -59,6 +59,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 void cpu_exec_init(CPUState *cpu, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
 void QEMU_NORETURN cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc);
+void QEMU_NORETURN cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc);
 
 #if !defined(CONFIG_USER_ONLY)
 void cpu_reloading_memory_map(void);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 1f2cb94..77e379d 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -76,6 +76,7 @@ void tcg_exec_init(unsigned long tb_size);
 bool tcg_enabled(void);
 
 void cpu_exec_init_all(void);
+void cpu_exec_step(CPUState *cpu);
 
 /**
  * Sends a (part of) iovec down a socket, yielding when the socket is full, or
diff --git a/linux-user/main.c b/linux-user/main.c
index 78d8d04..54df300 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -179,13 +179,25 @@ static inline void start_exclusive(void)
 }
 
 /* Finish an exclusive operation.  */
-static inline void __attribute__((unused)) end_exclusive(void)
+static inline void end_exclusive(void)
 {
     pending_cpus = 0;
     pthread_cond_broadcast(&exclusive_resume);
     pthread_mutex_unlock(&exclusive_lock);
 }
 
+static void step_atomic(CPUState *cpu)
+{
+    start_exclusive();
+
+    /* Since we got here, we know that parallel_cpus must be true.  */
+    parallel_cpus = false;
+    cpu_exec_step(cpu);
+    parallel_cpus = true;
+
+    end_exclusive();
+}
+
 /* Wait for exclusive ops to finish, and begin cpu execution.  */
 static inline void cpu_exec_start(CPUState *cpu)
 {
@@ -437,6 +449,9 @@ void cpu_loop(CPUX86State *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             pc = env->segs[R_CS].base + env->eip;
             EXCP_DUMP(env, "qemu: 0x%08lx: unhandled CPU exception 0x%x - aborting\n",
@@ -929,6 +944,9 @@ void cpu_loop(CPUARMState *env)
         case EXCP_YIELD:
             /* nothing to do here for user-mode, just resume guest code */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
         error:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
@@ -1128,6 +1146,9 @@ void cpu_loop(CPUARMState *env)
         case EXCP_YIELD:
             /* nothing to do here for user-mode, just resume guest code */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
             abort();
@@ -1217,6 +1238,9 @@ void cpu_loop(CPUUniCore32State *env)
                 }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             goto error;
         }
@@ -1489,6 +1513,9 @@ void cpu_loop (CPUSPARCState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -2037,6 +2064,9 @@ void cpu_loop(CPUPPCState *env)
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             cpu_abort(cs, "Unknown exception 0x%d. Aborting\n", trapnr);
             break;
@@ -2710,6 +2740,9 @@ done_syscall:
                 }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
 error:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
@@ -2796,6 +2829,9 @@ void cpu_loop(CPUOpenRISCState *env)
         case EXCP_NR:
             qemu_log_mask(CPU_LOG_INT, "\nNR\n");
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
                      trapnr);
@@ -2871,6 +2907,9 @@ void cpu_loop(CPUSH4State *env)
             queue_signal(env, info.si_signo, &info);
 	    break;
 
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -2936,6 +2975,9 @@ void cpu_loop(CPUCRISState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3050,6 +3092,9 @@ void cpu_loop(CPUMBState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3151,6 +3196,9 @@ void cpu_loop(CPUM68KState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
             abort();
@@ -3386,6 +3434,9 @@ void cpu_loop(CPUAlphaState *env)
         case EXCP_INTERRUPT:
             /* Just indicate that signals should be handled asap.  */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3513,6 +3564,9 @@ void cpu_loop(CPUS390XState *env)
             queue_signal(env, info.si_signo, &info);
             break;
 
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3765,6 +3819,9 @@ void cpu_loop(CPUTLGState *env)
         case TILEGX_EXCP_REG_UDN_ACCESS:
             gen_sigill_reg(env);
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             fprintf(stderr, "trapnr is %d[0x%x].\n", trapnr, trapnr);
             g_assert_not_reached();
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 66ae0c7..ab67537 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -691,6 +691,7 @@ struct TCGContext {
 };
 
 extern TCGContext tcg_ctx;
+extern bool parallel_cpus;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
diff --git a/translate-all.c b/translate-all.c
index eaa95e4..99ae7f9 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -119,6 +119,7 @@ static void *l1_map[V_L1_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+bool parallel_cpus;
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (5 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-09-08  8:39   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers Richard Henderson
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

This is really just a placeholder for an actual
command-line switch for mttcg.
---
 translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/translate-all.c b/translate-all.c
index 99ae7f9..a10fa06 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -119,7 +119,7 @@ static void *l1_map[V_L1_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
-bool parallel_cpus;
+bool parallel_cpus = 1;
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (6 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-09-08 13:43   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers Richard Henderson
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 Makefile.objs      |   1 -
 Makefile.target    |   1 +
 atomic_template.h  | 220 ++++++++++++++++++++++++++++++++++++
 cputlb.c           |   3 +-
 softmmu_template.h | 178 ++++++++++++++++++++++++++++-
 tcg-runtime.c      |  27 +++--
 tcg/tcg-op.c       | 324 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h       |  44 ++++++++
 tcg/tcg-runtime.h  |  75 +++++++++++++
 tcg/tcg.h          |  53 +++++++++
 10 files changed, 909 insertions(+), 17 deletions(-)
 create mode 100644 atomic_template.h

diff --git a/Makefile.objs b/Makefile.objs
index 7f1f0a3..f40bdfd 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -88,7 +88,6 @@ endif
 
 #######################################################################
 # Target-independent parts used in system and user emulation
-common-obj-y += tcg-runtime.o
 common-obj-y += hw/
 common-obj-y += qom/
 common-obj-y += disas/
diff --git a/Makefile.target b/Makefile.target
index d720b3e..0ca9ed6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -94,6 +94,7 @@ obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
 obj-y += target-$(TARGET_BASE_ARCH)/
 obj-y += disas.o
+obj-y += tcg-runtime.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 
diff --git a/atomic_template.h b/atomic_template.h
new file mode 100644
index 0000000..a755853
--- /dev/null
+++ b/atomic_template.h
@@ -0,0 +1,220 @@
+/*
+ * Atomic helper templates
+ * Included from tcg-runtime.c.
+ *
+ * Copyright (c) 2016 Red Hat, Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define DATA_SIZE (1 << SHIFT)
+
+#if DATA_SIZE == 8
+#define SUFFIX     q
+#define DATA_TYPE  uint64_t
+#define ABI_TYPE   uint64_t
+#define BSWAP      bswap64
+#elif DATA_SIZE == 4
+#define SUFFIX     l
+#define DATA_TYPE  uint32_t
+#define ABI_TYPE   uint32_t
+#define BSWAP      bswap32
+#elif DATA_SIZE == 2
+#define SUFFIX     w
+#define DATA_TYPE  uint16_t
+#define ABI_TYPE   uint32_t
+#define BSWAP      bswap16
+#elif DATA_SIZE == 1
+#define SUFFIX     b
+#define DATA_TYPE  uint8_t
+#define ABI_TYPE   uint32_t
+#else
+#error unsupported data size
+#endif
+
+#ifdef CONFIG_USER_ONLY
+
+#if DATA_SIZE == 1
+# define HE_SUFFIX
+#elif defined(HOST_WORDS_BIGENDIAN)
+# define HE_SUFFIX  _be
+# define RE_SUFFIX  _le
+#else
+# define HE_SUFFIX  _le
+# define RE_SUFFIX  _be
+#endif
+
+ABI_TYPE HELPER(glue(glue(atomic_cmpxchg, SUFFIX), HE_SUFFIX))
+    (target_ulong addr, ABI_TYPE cmpv, ABI_TYPE newv)
+{
+    DATA_TYPE *haddr = g2h(addr);
+    return atomic_cmpxchg(haddr, cmpv, newv);
+}
+
+#define GEN_ATOMIC_HELPER_HE(NAME)                                  \
+ABI_TYPE HELPER(glue(glue(atomic_##NAME, SUFFIX), HE_SUFFIX))       \
+    (target_ulong addr, ABI_TYPE val)                               \
+{                                                                   \
+    DATA_TYPE *haddr = g2h(addr);                                   \
+    return atomic_##NAME(haddr, val);                               \
+}                                                                   \
+
+GEN_ATOMIC_HELPER_HE(fetch_add)
+GEN_ATOMIC_HELPER_HE(fetch_and)
+GEN_ATOMIC_HELPER_HE(fetch_or)
+GEN_ATOMIC_HELPER_HE(fetch_xor)
+GEN_ATOMIC_HELPER_HE(add_fetch)
+GEN_ATOMIC_HELPER_HE(and_fetch)
+GEN_ATOMIC_HELPER_HE(or_fetch)
+GEN_ATOMIC_HELPER_HE(xor_fetch)
+GEN_ATOMIC_HELPER_HE(xchg)
+
+#undef GEN_ATOMIC_HELPER_HE
+
+#if DATA_SIZE > 1
+
+ABI_TYPE HELPER(glue(glue(atomic_cmpxchg, SUFFIX), RE_SUFFIX))
+    (target_ulong addr, ABI_TYPE cmpv, ABI_TYPE newv)
+{
+    DATA_TYPE *haddr = g2h(addr);
+    return BSWAP(atomic_cmpxchg(haddr, BSWAP(cmpv), BSWAP(newv)));
+}
+
+#define GEN_ATOMIC_HELPER_RE(NAME)                                  \
+ABI_TYPE HELPER(glue(glue(atomic_##NAME, SUFFIX), RE_SUFFIX))       \
+    (target_ulong addr, ABI_TYPE val)                               \
+{                                                                   \
+    DATA_TYPE *haddr = g2h(addr);                                   \
+    return BSWAP(atomic_##NAME(haddr, BSWAP(val)));                 \
+}
+
+GEN_ATOMIC_HELPER_RE(fetch_and)
+GEN_ATOMIC_HELPER_RE(fetch_or)
+GEN_ATOMIC_HELPER_RE(fetch_xor)
+GEN_ATOMIC_HELPER_RE(and_fetch)
+GEN_ATOMIC_HELPER_RE(or_fetch)
+GEN_ATOMIC_HELPER_RE(xor_fetch)
+GEN_ATOMIC_HELPER_RE(xchg)
+
+/* Note that for addition, we need to use a separate cmpxchg loop instead
+   of bswaps for the reverse-host-endian helpers.  */
+ABI_TYPE HELPER(glue(glue(atomic_fetch_add, SUFFIX), RE_SUFFIX))
+    (target_ulong addr, ABI_TYPE val)
+{
+    DATA_TYPE ldo, ldn, ret, sto;
+    DATA_TYPE *haddr = g2h(addr);
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo);
+        sto = BSWAP(ret + val);
+        ldn = atomic_cmpxchg(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+
+ABI_TYPE HELPER(glue(glue(atomic_add_fetch, SUFFIX), RE_SUFFIX))
+    (target_ulong addr, ABI_TYPE val)
+{
+    DATA_TYPE ldo, ldn, ret, sto;
+    DATA_TYPE *haddr = g2h(addr);
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo) + val;
+        sto = BSWAP(ret);
+        ldn = atomic_cmpxchg(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+
+#undef GEN_ATOMIC_HELPER_RE
+#endif /* DATA_SIZE > 1 */
+
+#undef HE_SUFFIX
+#undef RE_SUFFIX
+
+#else /* !CONFIG_USER_ONLY */
+
+#if DATA_SIZE == 1
+#define LE_SUFFIX
+#else
+#define LE_SUFFIX  _le
+#endif
+
+ABI_TYPE HELPER(glue(glue(atomic_cmpxchg, SUFFIX), LE_SUFFIX))
+    (CPUArchState *env, target_ulong addr,
+     ABI_TYPE cmpv, ABI_TYPE newv, uint32_t oi)
+{
+    return glue(glue(glue(helper_atomic_cmpxchg, SUFFIX), LE_SUFFIX), _mmu)
+        (env, addr, cmpv, newv, oi, GETPC());
+}
+
+#define GEN_ATOMIC_HELPER(NAME, S)                                      \
+ABI_TYPE HELPER(glue(glue(atomic_##NAME, SUFFIX), S))                   \
+    (CPUArchState *env, target_ulong addr, ABI_TYPE val, uint32_t oi)   \
+{                                                                       \
+    return glue(glue(glue(helper_atomic_##NAME, SUFFIX), S), _mmu)      \
+        (env, addr, val, oi, GETPC());                                  \
+}
+
+GEN_ATOMIC_HELPER(fetch_add, LE_SUFFIX)
+GEN_ATOMIC_HELPER(fetch_and, LE_SUFFIX)
+GEN_ATOMIC_HELPER(fetch_or, LE_SUFFIX)
+GEN_ATOMIC_HELPER(fetch_xor, LE_SUFFIX)
+GEN_ATOMIC_HELPER(add_fetch, LE_SUFFIX)
+GEN_ATOMIC_HELPER(and_fetch, LE_SUFFIX)
+GEN_ATOMIC_HELPER(or_fetch, LE_SUFFIX)
+GEN_ATOMIC_HELPER(xor_fetch, LE_SUFFIX)
+GEN_ATOMIC_HELPER(xchg, LE_SUFFIX)
+
+#if DATA_SIZE > 1
+
+ABI_TYPE HELPER(glue(glue(atomic_cmpxchg, SUFFIX), _be))
+    (CPUArchState *env, target_ulong addr,
+     ABI_TYPE cmpv, ABI_TYPE newv, uint32_t oi)
+{
+    return glue(glue(helper_atomic_cmpxchg, SUFFIX), _be_mmu)
+        (env, addr, cmpv, newv, oi, GETPC());
+}
+
+GEN_ATOMIC_HELPER(fetch_add, _be)
+GEN_ATOMIC_HELPER(fetch_and, _be)
+GEN_ATOMIC_HELPER(fetch_or, _be)
+GEN_ATOMIC_HELPER(fetch_xor, _be)
+GEN_ATOMIC_HELPER(add_fetch, _be)
+GEN_ATOMIC_HELPER(and_fetch, _be)
+GEN_ATOMIC_HELPER(or_fetch, _be)
+GEN_ATOMIC_HELPER(xor_fetch, _be)
+GEN_ATOMIC_HELPER(xchg, _be)
+
+#endif /* DATA_SIZE > 1 */
+
+#undef GEN_ATOMIC_HELPER
+#undef LE_SUFFIX
+
+#endif /* CONFIG_USER_ONLY */
+
+#undef BSWAP
+#undef ABI_TYPE
+#undef DATA_TYPE
+#undef SUFFIX
+#undef DATA_SIZE
+#undef SHIFT
diff --git a/cputlb.c b/cputlb.c
index 079e497..5272456 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -23,15 +23,14 @@
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
 #include "exec/cpu_ldst.h"
-
 #include "exec/cputlb.h"
-
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 #include "qemu/error-report.h"
 #include "exec/log.h"
+#include "exec/helper-proto.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
diff --git a/softmmu_template.h b/softmmu_template.h
index 4d378ca..76712b9 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -51,7 +51,6 @@
 #error unsupported data size
 #endif
 
-
 /* For the benefit of TCG generated code, we want to avoid the complication
    of ABI-specific return type promotion and always return a value extended
    to the register size of the host.  This is tcg_target_long, except in the
@@ -508,11 +507,184 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
     }
 }
 #endif
+
+#if DATA_SIZE == 1
+# define HE_SUFFIX  _mmu
+#elif defined(HOST_WORDS_BIGENDIAN)
+# define HE_SUFFIX  _be_mmu
+# define RE_SUFFIX  _le_mmu
+#else
+# define HE_SUFFIX  _le_mmu
+# define RE_SUFFIX  _be_mmu
+#endif
+
+#define ATOMIC_MMU_BODY                                                 \
+    DATA_TYPE *haddr;                                                   \
+    do {                                                                \
+        unsigned mmu_idx = get_mmuidx(oi);                              \
+        int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);    \
+        CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];            \
+        target_ulong tlb_addr = tlbe->addr_write;                       \
+        int a_bits = get_alignment_bits(get_memop(oi));                 \
+                                                                        \
+        /* Adjust the given return address.  */                         \
+        retaddr -= GETPC_ADJ;                                           \
+                                                                        \
+        /* Enforce guest required alignment.  */                        \
+        if (unlikely(a_bits > 0 && (addr & ((1 << a_bits) - 1)))) {     \
+            /* ??? Maybe indicate atomic op to cpu_unaligned_access */  \
+            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE, \
+                                 mmu_idx, retaddr);                     \
+        }                                                               \
+        /* Enforce qemu required alignment.  */                         \
+        if (unlikely(addr & ((1 << SHIFT) - 1))) {                      \
+            /* We get here if guest alignment was not requested,        \
+               or was not enforced by cpu_unaligned_access above.       \
+               We might widen the access and emulate, but for now       \
+               mark an exception and exit the cpu loop.  */             \
+            cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);            \
+        }                                                               \
+                                                                        \
+        /* Check TLB entry and enforce page permissions.  */            \
+        if ((addr & TARGET_PAGE_MASK)                                   \
+            != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {    \
+            if (!VICTIM_TLB_HIT(addr_write)) {                          \
+                tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE,        \
+                         mmu_idx, retaddr);                             \
+            }                                                           \
+            tlb_addr = tlbe->addr_write;                                \
+        } else if (unlikely(tlbe->addr_read != tlb_addr)) {             \
+            /* Let the guest notice RMW on a write-only page.  */       \
+            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_LOAD,             \
+                     mmu_idx, retaddr);                                 \
+        }                                                               \
+                                                                        \
+        /* Notice an IO access.  */                                     \
+        if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {                   \
+            /* There's really nothing that can be done to               \
+               support this apart from stop-the-world.  */              \
+            cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);            \
+        }                                                               \
+        haddr = (DATA_TYPE *)((uintptr_t)addr + tlbe->addend);          \
+    } while (0)
+
+DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    ATOMIC_MMU_BODY;
+    return atomic_cmpxchg(haddr, cmpv, newv);
+}
+
+#define GEN_ATOMIC_HELPER(NAME)                                         \
+DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)     \
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
+     TCGMemOpIdx oi, uintptr_t retaddr)                                 \
+{                                                                       \
+    ATOMIC_MMU_BODY;                                                    \
+    return glue(atomic_, NAME)(haddr, val);                             \
+}
+
+GEN_ATOMIC_HELPER(fetch_add)
+GEN_ATOMIC_HELPER(fetch_and)
+GEN_ATOMIC_HELPER(fetch_or)
+GEN_ATOMIC_HELPER(fetch_xor)
+
+GEN_ATOMIC_HELPER(add_fetch)
+GEN_ATOMIC_HELPER(and_fetch)
+GEN_ATOMIC_HELPER(or_fetch)
+GEN_ATOMIC_HELPER(xor_fetch)
+
+GEN_ATOMIC_HELPER(xchg)
+
+#undef GEN_ATOMIC_HELPER
+
+#if DATA_SIZE > 1
+DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE retv;
+    cmpv = BSWAP(cmpv);
+    newv = BSWAP(newv);
+    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
+            (env, addr, cmpv, newv, oi, retaddr));
+    return BSWAP(retv);
+}
+
+#define GEN_ATOMIC_HELPER(NAME)                                         \
+DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), RE_SUFFIX)     \
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
+     TCGMemOpIdx oi, uintptr_t retaddr)                                 \
+{                                                                       \
+    DATA_TYPE ret;                                                      \
+    val = BSWAP(val);                                                   \
+    ret = (glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)    \
+           (env, addr, val, oi, retaddr));                              \
+    return BSWAP(ret);                                                  \
+}
+
+GEN_ATOMIC_HELPER(fetch_and)
+GEN_ATOMIC_HELPER(fetch_or)
+GEN_ATOMIC_HELPER(fetch_xor)
+
+GEN_ATOMIC_HELPER(and_fetch)
+GEN_ATOMIC_HELPER(or_fetch)
+GEN_ATOMIC_HELPER(xor_fetch)
+
+GEN_ATOMIC_HELPER(xchg)
+
+#undef GEN_ATOMIC_HELPER
+
+/* Note that for addition, we need to use a separate cmpxchg loop instead
+   of bswaps around the host-endian helpers.  */
+DATA_TYPE glue(glue(helper_atomic_fetch_add, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE ldo, ldn, ret, sto;
+    ATOMIC_MMU_BODY;
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo);
+        sto = BSWAP(ret + val);
+        ldn = atomic_cmpxchg(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+
+DATA_TYPE glue(glue(helper_atomic_add_fetch, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE ldo, ldn, ret, sto;
+    ATOMIC_MMU_BODY;
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo) + val;
+        sto = BSWAP(ret);
+        ldn = atomic_cmpxchg(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+#endif /* DATA_SIZE > 1 */
+
+#undef ATOMIC_MMU_BODY
+
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
 #undef READ_ACCESS_TYPE
 #undef SHIFT
 #undef DATA_TYPE
+#undef DATAX_TYPE
 #undef SUFFIX
 #undef LSUFFIX
 #undef DATA_SIZE
@@ -524,8 +696,6 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
 #undef BSWAP
 #undef TGT_BE
 #undef TGT_LE
-#undef CPU_BE
-#undef CPU_LE
 #undef helper_le_ld_name
 #undef helper_be_ld_name
 #undef helper_le_lds_name
@@ -534,3 +704,5 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
 #undef helper_be_st_name
 #undef helper_te_ld_name
 #undef helper_te_st_name
+#undef HE_SUFFIX
+#undef RE_SUFFIX
diff --git a/tcg-runtime.c b/tcg-runtime.c
index ea2ad64..63f78fc 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -23,17 +23,10 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/host-utils.h"
-
-/* This file is compiled once, and thus we can't include the standard
-   "exec/helper-proto.h", which has includes that are target specific.  */
-
-#include "exec/helper-head.h"
-
-#define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2) \
-  dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2));
-
-#include "tcg-runtime.h"
-
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
 
 /* 32-bit helpers */
 
@@ -107,3 +100,15 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
     muls64(&l, &h, arg1, arg2);
     return h;
 }
+
+#define SHIFT 0
+#include "atomic_template.h"
+
+#define SHIFT 1
+#include "atomic_template.h"
+
+#define SHIFT 2
+#include "atomic_template.h"
+
+#define SHIFT 3
+#include "atomic_template.h"
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 293b854..bc72c17 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1958,3 +1958,327 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
                                addr, trace_mem_get_info(memop, 1));
     gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
 }
+
+static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, TCGMemOp opc)
+{
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
+        tcg_gen_ext8s_i32(ret, val);
+        break;
+    case MO_UB:
+        tcg_gen_ext8u_i32(ret, val);
+        break;
+    case MO_SW:
+        tcg_gen_ext16s_i32(ret, val);
+        break;
+    case MO_UW:
+        tcg_gen_ext16u_i32(ret, val);
+        break;
+    default:
+        tcg_gen_mov_i32(ret, val);
+        break;
+    }
+}
+
+static void tcg_gen_ext_i64(TCGv_i64 ret, TCGv_i64 val, TCGMemOp opc)
+{
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
+        tcg_gen_ext8s_i64(ret, val);
+        break;
+    case MO_UB:
+        tcg_gen_ext8u_i64(ret, val);
+        break;
+    case MO_SW:
+        tcg_gen_ext16s_i64(ret, val);
+        break;
+    case MO_UW:
+        tcg_gen_ext16u_i64(ret, val);
+        break;
+    case MO_SL:
+        tcg_gen_ext32s_i64(ret, val);
+        break;
+    case MO_UL:
+        tcg_gen_ext32u_i64(ret, val);
+        break;
+    default:
+        tcg_gen_mov_i64(ret, val);
+        break;
+    }
+}
+
+#ifdef CONFIG_USER_ONLY
+typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv, TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv, TCGv_i64, TCGv_i64);
+typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv, TCGv_i32);
+typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv, TCGv_i64);
+#else
+typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
+                                  TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
+                                  TCGv_i64, TCGv_i64, TCGv_i32);
+typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64, TCGv_i32);
+#endif
+
+static void * const table_cmpxchg[16] = {
+    [MO_8] = gen_helper_atomic_cmpxchgb,
+    [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
+    [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
+    [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
+    [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
+    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
+    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
+};
+
+void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
+                                TCGv_i32 newv, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    if (!parallel_cpus) {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        TCGv_i32 t2 = tcg_temp_new_i32();
+
+        tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
+
+        tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+        tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
+        tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+        tcg_temp_free_i32(t2);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i32(retv, t1, memop);
+        } else {
+            tcg_gen_mov_i32(retv, t1);
+        }
+        tcg_temp_free_i32(t1);
+    } else {
+        gen_atomic_cx_i32 gen;
+
+        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_USER_ONLY
+        gen(retv, addr, cmpv, newv);
+#else
+        {
+            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+            tcg_temp_free_i32(oi);
+        }
+#endif
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i32(retv, retv, memop);
+        }
+    }
+}
+
+void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
+                                TCGv_i64 newv, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    if (!parallel_cpus) {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
+
+        tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
+
+        tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+        tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
+        tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+        tcg_temp_free_i64(t2);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(retv, t1, memop);
+        } else {
+            tcg_gen_mov_i64(retv, t1);
+        }
+        tcg_temp_free_i64(t1);
+    } else if ((memop & MO_SIZE) == MO_64) {
+        gen_atomic_cx_i64 gen;
+
+        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_USER_ONLY
+        gen(retv, addr, cmpv, newv);
+#else
+        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
+        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+        tcg_temp_free_i32(oi);
+#endif
+    } else {
+        TCGv_i32 c32 = tcg_temp_new_i32();
+        TCGv_i32 n32 = tcg_temp_new_i32();
+        TCGv_i32 r32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(c32, cmpv);
+        tcg_gen_extrl_i64_i32(n32, newv);
+        tcg_gen_atomic_cmpxchg_i32(r32, addr, c32, n32, idx, memop & ~MO_SIGN);
+        tcg_temp_free_i32(c32);
+        tcg_temp_free_i32(n32);
+
+        tcg_gen_extu_i32_i64(retv, r32);
+        tcg_temp_free_i32(r32);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(retv, retv, memop);
+        }
+    }
+}
+
+static void do_nonatomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
+                                TCGArg idx, TCGMemOp memop, bool new_val,
+                                void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+    gen(t2, t1, val);
+    tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+
+    tcg_gen_ext_i32(ret, (new_val ? t2 : t1), memop);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
+static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
+                             TCGArg idx, TCGMemOp memop, void * const table[])
+{
+    gen_atomic_op_i32 gen;
+
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    gen = table[memop & (MO_SIZE | MO_BSWAP)];
+    tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_USER_ONLY
+    gen(ret, addr, val);
+#else
+    {
+        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+        gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+        tcg_temp_free_i32(oi);
+    }
+#endif
+
+    if (memop & MO_SIGN) {
+        tcg_gen_ext_i32(ret, ret, memop);
+    }
+}
+
+static void do_nonatomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
+                                TCGArg idx, TCGMemOp memop, bool new_val,
+                                void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+    gen(t2, t1, val);
+    tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+
+    tcg_gen_ext_i64(ret, (new_val ? t2 : t1), memop);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
+                             TCGArg idx, TCGMemOp memop, void * const table[])
+{
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    if ((memop & MO_SIZE) == MO_64) {
+        gen_atomic_op_i64 gen;
+
+        gen = table[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_USER_ONLY
+        gen(ret, addr, val);
+#else
+        {
+            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+            gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+            tcg_temp_free_i32(oi);
+        }
+#endif
+    } else {
+        TCGv_i32 v32 = tcg_temp_new_i32();
+        TCGv_i32 r32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(v32, val);
+        do_atomic_op_i32(r32, addr, v32, idx, memop & ~MO_SIGN, table);
+        tcg_temp_free_i32(v32);
+
+        tcg_gen_extu_i32_i64(ret, r32);
+        tcg_temp_free_i32(r32);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(ret, ret, memop);
+        }
+    }
+}
+
+#define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
+static void * const table_##NAME[16] = {                                \
+    [MO_8] = gen_helper_atomic_##NAME##b,                               \
+    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
+    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
+    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
+    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
+    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
+    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
+};                                                                      \
+void tcg_gen_atomic_##NAME##_i32                                        \
+    (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
+{                                                                       \
+    if (parallel_cpus) {                                                \
+        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i32);                        \
+    }                                                                   \
+}                                                                       \
+void tcg_gen_atomic_##NAME##_i64                                        \
+    (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
+{                                                                       \
+    if (parallel_cpus) {                                                \
+        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i64);                        \
+    }                                                                   \
+}
+
+GEN_ATOMIC_HELPER(fetch_add, add, 0)
+GEN_ATOMIC_HELPER(fetch_and, and, 0)
+GEN_ATOMIC_HELPER(fetch_or, or, 0)
+GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
+
+GEN_ATOMIC_HELPER(add_fetch, add, 1)
+GEN_ATOMIC_HELPER(and_fetch, and, 1)
+GEN_ATOMIC_HELPER(or_fetch, or, 1)
+GEN_ATOMIC_HELPER(xor_fetch, xor, 1)
+
+static void tcg_gen_mov2_i32(TCGv_i32 r, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_mov_i32(r, b);
+}
+
+static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_mov_i64(r, b);
+}
+
+GEN_ATOMIC_HELPER(xchg, mov2, 0)
+
+#undef GEN_ATOMIC_HELPER
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index f217e80..2a845ae 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -852,6 +852,30 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
     tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
 }
 
+void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
+                                TCGArg, TCGMemOp);
+void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
+                                TCGArg, TCGMemOp);
+
+void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_add_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_add_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_and_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_and_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_or_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_or_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_xor_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_xor_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_add_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_add_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_and_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_and_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_or_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+
 #if TARGET_LONG_BITS == 64
 #define tcg_gen_movi_tl tcg_gen_movi_i64
 #define tcg_gen_mov_tl tcg_gen_mov_i64
@@ -930,6 +954,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sub2_tl tcg_gen_sub2_i64
 #define tcg_gen_mulu2_tl tcg_gen_mulu2_i64
 #define tcg_gen_muls2_tl tcg_gen_muls2_i64
+#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i64
+#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i64
+#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i64
+#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i64
+#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i64
+#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i64
+#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i64
+#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64
+#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64
+#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64
 #else
 #define tcg_gen_movi_tl tcg_gen_movi_i32
 #define tcg_gen_mov_tl tcg_gen_mov_i32
@@ -1007,6 +1041,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sub2_tl tcg_gen_sub2_i32
 #define tcg_gen_mulu2_tl tcg_gen_mulu2_i32
 #define tcg_gen_muls2_tl tcg_gen_muls2_i32
+#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i32
+#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i32
+#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i32
+#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i32
+#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i32
+#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i32
+#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i32
+#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32
+#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32
+#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32
 #endif
 
 #if UINTPTR_MAX == UINT32_MAX
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 23a0c37..b3accf4 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -14,3 +14,78 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 
 DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+
+#ifdef CONFIG_USER_ONLY
+
+DEF_HELPER_FLAGS_3(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, tl, i32, i32)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, tl, i32, i32)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, tl, i32, i32)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, tl, i64, i64)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, tl, i32, i32)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, tl, i32, i32)
+DEF_HELPER_FLAGS_3(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, tl, i64, i64)
+
+#define GEN_ATOMIC_HELPERS(NAME)                        \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), b),    \
+                       TCG_CALL_NO_WG, i32, tl, i32)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), w_le), \
+                       TCG_CALL_NO_WG, i32, tl, i32)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), w_be), \
+                       TCG_CALL_NO_WG, i32, tl, i32)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), l_le), \
+                       TCG_CALL_NO_WG, i32, tl, i32)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), l_be), \
+                       TCG_CALL_NO_WG, i32, tl, i32)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), q_le), \
+                       TCG_CALL_NO_WG, i64, tl, i64)    \
+    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), q_be), \
+                       TCG_CALL_NO_WG, i64, tl, i64)
+
+#else
+
+DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env,
+                   tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env,
+                   tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env,
+                   tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env,
+                   tl, i64, i64, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env,
+                   tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env,
+                   tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env,
+                   tl, i64, i64, i32)
+
+#define GEN_ATOMIC_HELPERS(NAME)                                \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),            \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_le),         \
+                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),         \
+                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)
+
+#endif
+
+GEN_ATOMIC_HELPERS(fetch_add)
+GEN_ATOMIC_HELPERS(fetch_and)
+GEN_ATOMIC_HELPERS(fetch_or)
+GEN_ATOMIC_HELPERS(fetch_xor)
+
+GEN_ATOMIC_HELPERS(add_fetch)
+GEN_ATOMIC_HELPERS(and_fetch)
+GEN_ATOMIC_HELPERS(or_fetch)
+GEN_ATOMIC_HELPERS(xor_fetch)
+
+GEN_ATOMIC_HELPERS(xchg)
+
+#undef GEN_ATOMIC_HELPERS
diff --git a/tcg/tcg.h b/tcg/tcg.h
index ab67537..4e60498 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1163,6 +1163,59 @@ uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
 # define helper_ret_ldq_cmmu  helper_le_ldq_cmmu
 #endif
 
+uint8_t helper_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
+                                   uint8_t cmpv, uint8_t newv,
+                                   TCGMemOpIdx oi, uintptr_t retaddr);
+uint16_t helper_atomic_cmpxchgw_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint16_t cmpv, uint16_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgl_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_atomic_cmpxchgq_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint64_t cmpv, uint64_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint16_t helper_atomic_cmpxchgw_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint16_t cmpv, uint16_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgl_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_atomic_cmpxchgq_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint64_t cmpv, uint64_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+
+#define GEN_ATOMIC_HELPER(NAME, TYPE, SUFFIX)         \
+TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
+    (CPUArchState *env, target_ulong addr, TYPE val,  \
+     TCGMemOpIdx oi, uintptr_t retaddr);
+
+#define GEN_ATOMIC_HELPER_ALL(NAME)          \
+    GEN_ATOMIC_HELPER(NAME, uint8_t, b)      \
+    GEN_ATOMIC_HELPER(NAME, uint16_t, w_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint16_t, w_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
+
+GEN_ATOMIC_HELPER_ALL(fetch_add)
+GEN_ATOMIC_HELPER_ALL(fetch_sub)
+GEN_ATOMIC_HELPER_ALL(fetch_and)
+GEN_ATOMIC_HELPER_ALL(fetch_or)
+GEN_ATOMIC_HELPER_ALL(fetch_xor)
+
+GEN_ATOMIC_HELPER_ALL(add_fetch)
+GEN_ATOMIC_HELPER_ALL(sub_fetch)
+GEN_ATOMIC_HELPER_ALL(and_fetch)
+GEN_ATOMIC_HELPER_ALL(or_fetch)
+GEN_ATOMIC_HELPER_ALL(xor_fetch)
+
+GEN_ATOMIC_HELPER_ALL(xchg)
+
+#undef GEN_ATOMIC_HELPER_ALL
+#undef GEN_ATOMIC_HELPER
+
 #endif /* CONFIG_SOFTMMU */
 
 #endif /* TCG_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (7 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-08  3:00   ` Emilio G. Cota
  2016-08-11 10:02   ` Alex Bennée
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
                   ` (18 subsequent siblings)
  27 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure             |  29 ++++++++++++-
 cputlb.c              |   6 +++
 include/qemu/int128.h |   6 +++
 softmmu_template.h    | 110 +++++++++++++++++++++++++++++++++++++-------------
 tcg/tcg.h             |  22 ++++++++++
 5 files changed, 144 insertions(+), 29 deletions(-)

diff --git a/configure b/configure
index 59ea124..586abd6 100755
--- a/configure
+++ b/configure
@@ -1201,7 +1201,10 @@ case "$cpu" in
            cc_i386='$(CC) -m32'
            ;;
     x86_64)
-           CPU_CFLAGS="-m64"
+           # ??? Only extremely old AMD cpus do not have cmpxchg16b.
+           # If we truly care, we should simply detect this case at
+           # runtime and generate the fallback to serial emulation.
+           CPU_CFLAGS="-m64 -mcx16"
            LDFLAGS="-m64 $LDFLAGS"
            cc_i386='$(CC) -m32'
            ;;
@@ -4434,6 +4437,26 @@ if compile_prog "" "" ; then
     int128=yes
 fi
 
+#########################################
+# See if 128-bit atomic operations are supported.
+
+atomic128=no
+if test "$int128" = "yes"; then
+  cat > $TMPC << EOF
+int main(void)
+{
+  unsigned __int128 x = 0, y = 0;
+  y = __atomic_load_16(&x, 0);
+  __atomic_store_16(&x, y, 0);
+  __atomic_compare_exchange_16(&x, &y, x, 0, 0, 0);
+  return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    atomic128=yes
+  fi
+fi
+
 ########################################
 # check if getauxval is available.
 
@@ -5383,6 +5406,10 @@ if test "$int128" = "yes" ; then
   echo "CONFIG_INT128=y" >> $config_host_mak
 fi
 
+if test "$atomic128" = "yes" ; then
+  echo "CONFIG_ATOMIC128=y" >> $config_host_mak
+fi
+
 if test "$getauxval" = "yes" ; then
   echo "CONFIG_GETAUXVAL=y" >> $config_host_mak
 fi
diff --git a/cputlb.c b/cputlb.c
index 5272456..660f824 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -510,6 +510,12 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 
 #define SHIFT 3
 #include "softmmu_template.h"
+
+#ifdef CONFIG_ATOMIC128
+#define SHIFT 4
+#include "softmmu_template.h"
+#endif
+
 #undef MMUSUFFIX
 
 #define MMUSUFFIX _cmmu
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index ab67275..5819da4 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -2,6 +2,7 @@
 #define INT128_H
 
 #ifdef CONFIG_INT128
+#include "qemu/bswap.h"
 
 typedef __int128 Int128;
 
@@ -137,6 +138,11 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
     *a -= b;
 }
 
+static inline Int128 bswap128(Int128 a)
+{
+    return int128_make128(bswap64(int128_gethi(a)), bswap64(int128_getlo(a)));
+}
+
 #else /* !CONFIG_INT128 */
 
 /* Here we are catering to the ABI of the host.  If the host returns
diff --git a/softmmu_template.h b/softmmu_template.h
index 76712b9..0a9f49b 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -27,25 +27,30 @@
 
 #define DATA_SIZE (1 << SHIFT)
 
-#if DATA_SIZE == 8
-#define SUFFIX q
-#define LSUFFIX q
-#define SDATA_TYPE  int64_t
+#if DATA_SIZE == 16
+#define SUFFIX     o
+#define LSUFFIX    o
+#define SDATA_TYPE Int128
+#define DATA_TYPE  Int128
+#elif DATA_SIZE == 8
+#define SUFFIX     q
+#define LSUFFIX    q
+#define SDATA_TYPE int64_t
 #define DATA_TYPE  uint64_t
 #elif DATA_SIZE == 4
-#define SUFFIX l
-#define LSUFFIX l
-#define SDATA_TYPE  int32_t
+#define SUFFIX     l
+#define LSUFFIX    l
+#define SDATA_TYPE int32_t
 #define DATA_TYPE  uint32_t
 #elif DATA_SIZE == 2
-#define SUFFIX w
-#define LSUFFIX uw
-#define SDATA_TYPE  int16_t
+#define SUFFIX     w
+#define LSUFFIX    uw
+#define SDATA_TYPE int16_t
 #define DATA_TYPE  uint16_t
 #elif DATA_SIZE == 1
-#define SUFFIX b
-#define LSUFFIX ub
-#define SDATA_TYPE  int8_t
+#define SUFFIX     b
+#define LSUFFIX    ub
+#define SDATA_TYPE int8_t
 #define DATA_TYPE  uint8_t
 #else
 #error unsupported data size
@@ -56,7 +61,7 @@
    to the register size of the host.  This is tcg_target_long, except in the
    case of a 32-bit host and 64-bit data, and for that we always have
    uint64_t.  Don't bother with this widened value for SOFTMMU_CODE_ACCESS.  */
-#if defined(SOFTMMU_CODE_ACCESS) || DATA_SIZE == 8
+#if defined(SOFTMMU_CODE_ACCESS) || DATA_SIZE >= 8
 # define WORD_TYPE  DATA_TYPE
 # define USUFFIX    SUFFIX
 #else
@@ -73,7 +78,9 @@
 #define ADDR_READ addr_read
 #endif
 
-#if DATA_SIZE == 8
+#if DATA_SIZE == 16
+# define BSWAP(X)  bswap128(X)
+#elif DATA_SIZE == 8
 # define BSWAP(X)  bswap64(X)
 #elif DATA_SIZE == 4
 # define BSWAP(X)  bswap32(X)
@@ -140,6 +147,7 @@
     vidx >= 0;                                                                \
 })
 
+#if DATA_SIZE < 16
 #ifndef SOFTMMU_CODE_ACCESS
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               CPUIOTLBEntry *iotlbentry,
@@ -307,9 +315,10 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     return res;
 }
 #endif /* DATA_SIZE > 1 */
+#endif /* DATA_SIZE < 16 */
 
 #ifndef SOFTMMU_CODE_ACCESS
-
+#if DATA_SIZE < 16
 /* Provide signed versions of the load routines as well.  We can of course
    avoid this for 64-bit data, or for 32-bit data on 32-bit host.  */
 #if DATA_SIZE * 8 < TCG_TARGET_REG_BITS
@@ -507,6 +516,7 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
     }
 }
 #endif
+#endif /* DATA_SIZE < 16 */
 
 #if DATA_SIZE == 1
 # define HE_SUFFIX  _mmu
@@ -573,9 +583,30 @@ DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
      TCGMemOpIdx oi, uintptr_t retaddr)
 {
     ATOMIC_MMU_BODY;
+#if DATA_SIZE < 16
     return atomic_cmpxchg(haddr, cmpv, newv);
+#else
+    __atomic_compare_exchange(haddr, &cmpv, &newv, false,
+                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
+    return cmpv;
+#endif
 }
 
+#if DATA_SIZE > 1
+DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE retv;
+    cmpv = BSWAP(cmpv);
+    newv = BSWAP(newv);
+    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
+            (env, addr, cmpv, newv, oi, retaddr));
+    return BSWAP(retv);
+}
+#endif
+
+#if DATA_SIZE < 16
 #define GEN_ATOMIC_HELPER(NAME)                                         \
 DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)     \
     (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
@@ -600,18 +631,6 @@ GEN_ATOMIC_HELPER(xchg)
 #undef GEN_ATOMIC_HELPER
 
 #if DATA_SIZE > 1
-DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
-    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
-     TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    DATA_TYPE retv;
-    cmpv = BSWAP(cmpv);
-    newv = BSWAP(newv);
-    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
-            (env, addr, cmpv, newv, oi, retaddr));
-    return BSWAP(retv);
-}
-
 #define GEN_ATOMIC_HELPER(NAME)                                         \
 DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), RE_SUFFIX)     \
     (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
@@ -676,6 +695,41 @@ DATA_TYPE glue(glue(helper_atomic_add_fetch, SUFFIX), RE_SUFFIX)
     }
 }
 #endif /* DATA_SIZE > 1 */
+#else /* DATA_SIZE >= 16 */
+DATA_TYPE glue(glue(helper_atomic_ld, SUFFIX), HE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE res;
+    ATOMIC_MMU_BODY;
+    __atomic_load(haddr, &res, __ATOMIC_RELAXED);
+    return res;
+}
+
+DATA_TYPE glue(glue(helper_atomic_ld, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE res;
+    res = (glue(glue(helper_atomic_ld, SUFFIX), HE_SUFFIX)
+           (env, addr, oi, retaddr));
+    return BSWAP(res);
+}
+
+void glue(glue(helper_atomic_st, SUFFIX), HE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    ATOMIC_MMU_BODY;
+    __atomic_store(haddr, &val, __ATOMIC_RELAXED);
+}
+
+void glue(glue(helper_atomic_st, SUFFIX), RE_SUFFIX)
+    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
+     TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    (glue(glue(helper_atomic_st, SUFFIX), HE_SUFFIX)
+     (env, addr, BSWAP(val), oi, retaddr));
+}
+#endif /* DATA_SIZE < 16 */
 
 #undef ATOMIC_MMU_BODY
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 4e60498..1304a42 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1216,6 +1216,28 @@ GEN_ATOMIC_HELPER_ALL(xchg)
 #undef GEN_ATOMIC_HELPER_ALL
 #undef GEN_ATOMIC_HELPER
 
+#ifdef CONFIG_ATOMIC128
+#include "qemu/int128.h"
+
+/* These aren't really a "proper" helpers because TCG cannot manage Int128.
+   However, use the same format as the others, for use by the backends. */
+Int128 helper_atomic_cmpxchgo_le_mmu(CPUArchState *env, target_ulong addr,
+                                     Int128 cmpv, Int128 newv,
+                                     TCGMemOpIdx oi, uintptr_t retaddr);
+Int128 helper_atomic_cmpxchgo_be_mmu(CPUArchState *env, target_ulong addr,
+                                     Int128 cmpv, Int128 newv,
+                                     TCGMemOpIdx oi, uintptr_t retaddr);
+
+Int128 helper_atomic_ldo_le_mmu(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr);
+Int128 helper_atomic_ldo_be_mmu(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr);
+void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                              TCGMemOpIdx oi, uintptr_t retaddr);
+void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                              TCGMemOpIdx oi, uintptr_t retaddr);
+
+#endif /* CONFIG_ATOMIC128 */
 #endif /* CONFIG_SOFTMMU */
 
 #endif /* TCG_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (8 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-08  3:08   ` Emilio G. Cota
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 12/27] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/mem_helper.c | 96 ++++++++++++++++++++++++++++++++++++++----------
 target-i386/translate.c  | 87 +++++++++++++++++++++----------------------
 2 files changed, 120 insertions(+), 63 deletions(-)

diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index c2f4769..5c0558f 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -22,6 +22,8 @@
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
+#include "qemu/int128.h"
+#include "tcg.h"
 
 /* broken thread support */
 
@@ -58,20 +60,39 @@ void helper_lock_init(void)
 
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
-    uint64_t d;
+    uintptr_t ra = GETPC();
+    uint64_t oldv, cmpv, newv;
     int eflags;
 
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d = cpu_ldq_data_ra(env, a0, GETPC());
-    if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
-        cpu_stq_data_ra(env, a0, ((uint64_t)env->regs[R_ECX] << 32)
-                                  | (uint32_t)env->regs[R_EBX], GETPC());
-        eflags |= CC_Z;
+
+    cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
+    newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
+
+    if (parallel_cpus) {
+#ifdef CONFIG_USER_ONLY
+        uint64_t *haddr = g2h(a0);
+        cmpv = cpu_to_le64(cmpv);
+        newv = cpu_to_le64(newv);
+        oldv = atomic_cmpxchg(haddr, cmpv, newv);
+        oldv = le64_to_cpu(oldv);
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_TEQ, mem_idx);
+        oldv = helper_atomic_cmpxchgq_le_mmu(env, a0, cmpv, newv, oi, ra);
+#endif
     } else {
+        oldv = cpu_ldq_data_ra(env, a0, ra);
+        newv = (cmpv == oldv ? newv : oldv);
         /* always do the store */
-        cpu_stq_data_ra(env, a0, d, GETPC());
-        env->regs[R_EDX] = (uint32_t)(d >> 32);
-        env->regs[R_EAX] = (uint32_t)d;
+        cpu_stq_data_ra(env, a0, newv, ra);
+    }
+
+    if (oldv == cmpv) {
+        eflags |= CC_Z;
+    } else {
+        env->regs[R_EAX] = (uint32_t)oldv;
+        env->regs[R_EDX] = (uint32_t)(oldv >> 32);
         eflags &= ~CC_Z;
     }
     CC_SRC = eflags;
@@ -80,25 +101,60 @@ void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 #ifdef TARGET_X86_64
 void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
 {
-    uint64_t d0, d1;
+    uintptr_t ra = GETPC();
+    Int128 oldv, cmpv, newv;
     int eflags;
+    bool success;
 
     if ((a0 & 0xf) != 0) {
         raise_exception_ra(env, EXCP0D_GPF, GETPC());
     }
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d0 = cpu_ldq_data_ra(env, a0, GETPC());
-    d1 = cpu_ldq_data_ra(env, a0 + 8, GETPC());
-    if (d0 == env->regs[R_EAX] && d1 == env->regs[R_EDX]) {
-        cpu_stq_data_ra(env, a0, env->regs[R_EBX], GETPC());
-        cpu_stq_data_ra(env, a0 + 8, env->regs[R_ECX], GETPC());
+
+    cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
+    newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
+
+    if (parallel_cpus) {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#elif defined(CONFIG_USER_ONLY)
+        Int128 *haddr = g2h(a0);
+        oldv = cmpv;
+#ifdef HOST_WORDS_BIGENDIAN
+        oldv = bswap128(oldv);
+        newv = bswap128(newv);
+#endif
+        success = __atomic_compare_exchange_16(haddr, &oldv, newv, false,
+                                               __ATOMIC_SEQ_CST,
+                                               __ATOMIC_SEQ_CST);
+#ifdef HOST_WORDS_BIGENDIAN
+        oldv = bswap128(oldv);
+#endif
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_TEQ | MO_ALIGN_16, mem_idx);
+        oldv = helper_atomic_cmpxchgo_le_mmu(env, a0, cmpv, newv, oi, ra);
+        success = int128_eq(oldv, cmpv);
+#endif
+    } else {
+        uint64_t o0 = cpu_ldq_data_ra(env, a0 + 0, ra);
+        uint64_t o1 = cpu_ldq_data_ra(env, a0 + 8, ra);
+
+        oldv = int128_make128(o0, o1);
+        success = int128_eq(oldv, cmpv);
+        if (!success) {
+            newv = oldv;
+        }
+
+        cpu_stq_data_ra(env, a0 + 0, int128_getlo(newv), ra);
+        cpu_stq_data_ra(env, a0 + 8, int128_gethi(newv), ra);
+    }
+
+    if (success) {
         eflags |= CC_Z;
     } else {
-        /* always do the store */
-        cpu_stq_data_ra(env, a0, d0, GETPC());
-        cpu_stq_data_ra(env, a0 + 8, d1, GETPC());
-        env->regs[R_EDX] = d1;
-        env->regs[R_EAX] = d0;
+        env->regs[R_EAX] = int128_getlo(oldv);
+        env->regs[R_EDX] = int128_gethi(oldv);
         eflags &= ~CC_Z;
     }
     CC_SRC = eflags;
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7dea18b..2244f38 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5070,57 +5070,58 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     case 0x1b0:
     case 0x1b1: /* cmpxchg Ev, Gv */
         {
-            TCGLabel *label1, *label2;
-            TCGv t0, t1, t2, a0;
+            TCGv oldv, newv, cmpv;
 
             ot = mo_b_d(b, dflag);
             modrm = cpu_ldub_code(env, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
-            t0 = tcg_temp_local_new();
-            t1 = tcg_temp_local_new();
-            t2 = tcg_temp_local_new();
-            a0 = tcg_temp_local_new();
-            gen_op_mov_v_reg(ot, t1, reg);
-            if (mod == 3) {
-                rm = (modrm & 7) | REX_B(s);
-                gen_op_mov_v_reg(ot, t0, rm);
-            } else {
+            oldv = tcg_temp_new();
+            newv = tcg_temp_new();
+            cmpv = tcg_temp_new();
+            gen_op_mov_v_reg(ot, newv, reg);
+            tcg_gen_mov_tl(cmpv, cpu_regs[R_EAX]);
+
+            if (s->prefix & PREFIX_LOCK) {
+                if (mod == 3) {
+                    goto illegal_op;
+                }
                 gen_lea_modrm(env, s, modrm);
-                tcg_gen_mov_tl(a0, cpu_A0);
-                gen_op_ld_v(s, ot, t0, a0);
-                rm = 0; /* avoid warning */
-            }
-            label1 = gen_new_label();
-            tcg_gen_mov_tl(t2, cpu_regs[R_EAX]);
-            gen_extu(ot, t0);
-            gen_extu(ot, t2);
-            tcg_gen_brcond_tl(TCG_COND_EQ, t2, t0, label1);
-            label2 = gen_new_label();
-            if (mod == 3) {
-                gen_op_mov_reg_v(ot, R_EAX, t0);
-                tcg_gen_br(label2);
-                gen_set_label(label1);
-                gen_op_mov_reg_v(ot, rm, t1);
+                tcg_gen_atomic_cmpxchg_tl(oldv, cpu_A0, cmpv, newv,
+                                          s->mem_index, ot | MO_LE);
+                gen_op_mov_reg_v(ot, R_EAX, oldv);
             } else {
-                /* perform no-op store cycle like physical cpu; must be
-                   before changing accumulator to ensure idempotency if
-                   the store faults and the instruction is restarted */
-                gen_op_st_v(s, ot, t0, a0);
-                gen_op_mov_reg_v(ot, R_EAX, t0);
-                tcg_gen_br(label2);
-                gen_set_label(label1);
-                gen_op_st_v(s, ot, t1, a0);
-            }
-            gen_set_label(label2);
-            tcg_gen_mov_tl(cpu_cc_src, t0);
-            tcg_gen_mov_tl(cpu_cc_srcT, t2);
-            tcg_gen_sub_tl(cpu_cc_dst, t2, t0);
+                if (mod == 3) {
+                    rm = (modrm & 7) | REX_B(s);
+                    gen_op_mov_v_reg(ot, oldv, rm);
+                } else {
+                    gen_lea_modrm(env, s, modrm);
+                    gen_op_ld_v(s, ot, oldv, cpu_A0);
+                    rm = 0; /* avoid warning */
+                }
+                gen_extu(ot, oldv);
+                gen_extu(ot, cmpv);
+                /* store value = (old == cmp ? new : old);  */
+                tcg_gen_movcond_tl(TCG_COND_EQ, newv, oldv, cmpv, newv, oldv);
+                if (mod == 3) {
+                    gen_op_mov_reg_v(ot, R_EAX, oldv);
+                    gen_op_mov_reg_v(ot, rm, newv);
+                } else {
+                    /* Perform an unconditional store cycle like physical cpu;
+                       must be before changing accumulator to ensure
+                       idempotency if the store faults and the instruction
+                       is restarted */
+                    gen_op_st_v(s, ot, newv, cpu_A0);
+                    gen_op_mov_reg_v(ot, R_EAX, oldv);
+                }
+            }
+            tcg_gen_mov_tl(cpu_cc_src, oldv);
+            tcg_gen_mov_tl(cpu_cc_srcT, cmpv);
+            tcg_gen_sub_tl(cpu_cc_dst, cmpv, oldv);
             set_cc_op(s, CC_OP_SUBB + ot);
-            tcg_temp_free(t0);
-            tcg_temp_free(t1);
-            tcg_temp_free(t2);
-            tcg_temp_free(a0);
+            tcg_temp_free(oldv);
+            tcg_temp_free(newv);
+            tcg_temp_free(cmpv);
         }
         break;
     case 0x1c7: /* cmpxchg8b */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 12/27] target-i386: emulate LOCK'ed OP instructions using atomic helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (9 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 13/27] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Eliminate some unnecessary temporaries.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 76 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 2244f38..58bc954 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1258,55 +1258,95 @@ static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d)
 {
     if (d != OR_TMP0) {
         gen_op_mov_v_reg(ot, cpu_T0, d);
-    } else {
+    } else if (!(s1->prefix & PREFIX_LOCK)) {
         gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
     }
     switch(op) {
     case OP_ADCL:
         gen_compute_eflags_c(s1, cpu_tmp4);
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_tmp4);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_add_tl(cpu_T0, cpu_tmp4, cpu_T1);
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_tmp4);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update3_cc(cpu_tmp4);
         set_cc_op(s1, CC_OP_ADCB + ot);
         break;
     case OP_SBBL:
         gen_compute_eflags_c(s1, cpu_tmp4);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_tmp4);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_add_tl(cpu_T0, cpu_T1, cpu_tmp4);
+            tcg_gen_neg_tl(cpu_T0, cpu_T0);
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_tmp4);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update3_cc(cpu_tmp4);
         set_cc_op(s1, CC_OP_SBBB + ot);
         break;
     case OP_ADDL:
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update2_cc();
         set_cc_op(s1, CC_OP_ADDB + ot);
         break;
     case OP_SUBL:
-        tcg_gen_mov_tl(cpu_cc_srcT, cpu_T0);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_neg_tl(cpu_T0, cpu_T1);
+            tcg_gen_atomic_fetch_add_tl(cpu_cc_srcT, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+            tcg_gen_sub_tl(cpu_T0, cpu_cc_srcT, cpu_T1);
+        } else {
+            tcg_gen_mov_tl(cpu_cc_srcT, cpu_T0);
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update2_cc();
         set_cc_op(s1, CC_OP_SUBB + ot);
         break;
     default:
     case OP_ANDL:
-        tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_and_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
     case OP_ORL:
-        tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_or_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                       s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
     case OP_XORL:
-        tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_xor_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 13/27] target-i386: emulate LOCK'ed INC using atomic helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (10 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 12/27] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 14/27] target-i386: emulate LOCK'ed NOT " Richard Henderson
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Merge gen_inc_locked back into gen_inc to share cc update.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 58bc954..3f10ff0 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1362,21 +1362,23 @@ static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d)
 /* if d == OR_TMP0, it means memory operand (address in A0) */
 static void gen_inc(DisasContext *s1, TCGMemOp ot, int d, int c)
 {
-    if (d != OR_TMP0) {
-        gen_op_mov_v_reg(ot, cpu_T0, d);
+    if (s1->prefix & PREFIX_LOCK) {
+        tcg_gen_movi_tl(cpu_T0, c > 0 ? 1 : -1);
+        tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                    s1->mem_index, ot | MO_LE);
     } else {
-        gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
+        if (d != OR_TMP0) {
+            gen_op_mov_v_reg(ot, cpu_T0, d);
+        } else {
+            gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
+        }
+        tcg_gen_addi_tl(cpu_T0, cpu_T0, (c > 0 ? 1 : -1));
+        gen_op_st_rm_T0_A0(s1, ot, d);
     }
+
     gen_compute_eflags_c(s1, cpu_cc_src);
-    if (c > 0) {
-        tcg_gen_addi_tl(cpu_T0, cpu_T0, 1);
-        set_cc_op(s1, CC_OP_INCB + ot);
-    } else {
-        tcg_gen_addi_tl(cpu_T0, cpu_T0, -1);
-        set_cc_op(s1, CC_OP_DECB + ot);
-    }
-    gen_op_st_rm_T0_A0(s1, ot, d);
     tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
+    set_cc_op(s1, (c > 0 ? CC_OP_INCB : CC_OP_DECB) + ot);
 }
 
 static void gen_shift_flags(DisasContext *s, TCGMemOp ot, TCGv result,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 14/27] target-i386: emulate LOCK'ed NOT using atomic helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (11 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 13/27] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 15/27] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Avoid qemu_load that's redundant with the atomic op.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 3f10ff0..78eadee 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4675,10 +4675,15 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
         if (mod != 3) {
-            if (op == 0)
+            if (op == 0) {
                 s->rip_offset = insn_const_size(ot);
+            }
             gen_lea_modrm(env, s, modrm);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            /* For those below that handle locked memory, don't load here.  */
+            if (!(s->prefix & PREFIX_LOCK)
+                || op != 2) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
@@ -4691,11 +4696,20 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             set_cc_op(s, CC_OP_LOGICB + ot);
             break;
         case 2: /* not */
-            tcg_gen_not_tl(cpu_T0, cpu_T0);
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                if (mod == 3) {
+                    goto illegal_op;
+                }
+                tcg_gen_movi_tl(cpu_T0, ~0);
+                tcg_gen_atomic_xor_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                            s->mem_index, ot | MO_LE);
             } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+                tcg_gen_not_tl(cpu_T0, cpu_T0);
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
             break;
         case 3: /* neg */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 15/27] target-i386: emulate LOCK'ed NEG using cmpxchg helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (12 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 14/27] target-i386: emulate LOCK'ed NOT " Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 16/27] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Move redundant qemu_load out of cmpxchg loop.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 78eadee..19d6f1e 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4713,11 +4713,41 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             break;
         case 3: /* neg */
-            tcg_gen_neg_tl(cpu_T0, cpu_T0);
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                TCGLabel *label1;
+                TCGv a0, t0, t1, t2;
+
+                if (mod == 3) {
+                    goto illegal_op;
+                }
+                a0 = tcg_temp_local_new();
+                t0 = tcg_temp_local_new();
+                label1 = gen_new_label();
+
+                tcg_gen_mov_tl(a0, cpu_A0);
+                tcg_gen_mov_tl(t0, cpu_T0);
+
+                gen_set_label(label1);
+                t1 = tcg_temp_new();
+                t2 = tcg_temp_new();
+                tcg_gen_mov_tl(t2, t0);
+                tcg_gen_neg_tl(t1, t0);
+                tcg_gen_atomic_cmpxchg_tl(t0, a0, t0, t1,
+                                          s->mem_index, ot | MO_LE);
+                tcg_temp_free(t1);
+                tcg_gen_brcond_tl(TCG_COND_NE, t0, t2, label1);
+
+                tcg_temp_free(t2);
+                tcg_temp_free(a0);
+                tcg_gen_mov_tl(cpu_T0, t0);
+                tcg_temp_free(t0);
             } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+                tcg_gen_neg_tl(cpu_T0, cpu_T0);
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
             gen_op_update_neg_cc();
             set_cc_op(s, CC_OP_SUBB + ot);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 16/27] target-i386: emulate LOCK'ed XADD using atomic helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (13 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 15/27] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 17/27] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Move load of reg value to common location.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 19d6f1e..dc132a1 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5135,19 +5135,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         modrm = cpu_ldub_code(env, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
+        gen_op_mov_v_reg(ot, cpu_T0, reg);
         if (mod == 3) {
             rm = (modrm & 7) | REX_B(s);
-            gen_op_mov_v_reg(ot, cpu_T0, reg);
             gen_op_mov_v_reg(ot, cpu_T1, rm);
             tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
             gen_op_mov_reg_v(ot, reg, cpu_T1);
             gen_op_mov_reg_v(ot, rm, cpu_T0);
         } else {
             gen_lea_modrm(env, s, modrm);
-            gen_op_mov_v_reg(ot, cpu_T0, reg);
-            gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
-            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-            gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                tcg_gen_atomic_fetch_add_tl(cpu_T1, cpu_A0, cpu_T0,
+                                            s->mem_index, ot | MO_LE);
+                tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            } else {
+                gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
+                tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            }
             gen_op_mov_reg_v(ot, reg, cpu_T1);
         }
         gen_op_update2_cc();
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 17/27] target-i386: emulate LOCK'ed BTX ops using atomic helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (14 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 16/27] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 18/27] target-i386: emulate XCHG using atomic helper Richard Henderson
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

[rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 87 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 30 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index dc132a1..be097a6 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6647,7 +6647,9 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         if (mod != 3) {
             s->rip_offset = 1;
             gen_lea_modrm(env, s, modrm);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            if (!(s->prefix & PREFIX_LOCK)) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
@@ -6677,44 +6679,69 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         rm = (modrm & 7) | REX_B(s);
         gen_op_mov_v_reg(MO_32, cpu_T1, reg);
         if (mod != 3) {
-            gen_lea_modrm(env, s, modrm);
+            AddressParts a = gen_lea_modrm_0(env, s, modrm);
             /* specific case: we need to add a displacement */
             gen_exts(ot, cpu_T1);
             tcg_gen_sari_tl(cpu_tmp0, cpu_T1, 3 + ot);
             tcg_gen_shli_tl(cpu_tmp0, cpu_tmp0, ot);
-            tcg_gen_add_tl(cpu_A0, cpu_A0, cpu_tmp0);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            tcg_gen_add_tl(cpu_A0, gen_lea_modrm_1(a), cpu_tmp0);
+            gen_lea_v_seg(s, s->aflag, cpu_A0, a.def_seg, s->override);
+            if (!(s->prefix & PREFIX_LOCK)) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
     bt_op:
         tcg_gen_andi_tl(cpu_T1, cpu_T1, (1 << (3 + ot)) - 1);
-        tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
-        switch(op) {
-        case 0:
-            break;
-        case 1:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        case 2:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_andc_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        default:
-        case 3:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        }
-        if (op != 0) {
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
-            } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+        tcg_gen_movi_tl(cpu_tmp0, 1);
+        tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
+        if (s->prefix & PREFIX_LOCK) {
+            switch (op) {
+            case 0: /* bt */
+                /* Needs no atomic ops; we surpressed the normal
+                   memory load for LOCK above so do it now.  */
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+                break;
+            case 1: /* bts */
+                tcg_gen_atomic_fetch_or_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                           s->mem_index, ot | MO_LE);
+                break;
+            case 2: /* btr */
+                tcg_gen_not_tl(cpu_tmp0, cpu_tmp0);
+                tcg_gen_atomic_fetch_and_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                            s->mem_index, ot | MO_LE);
+                break;
+            default:
+            case 3: /* btc */
+                tcg_gen_atomic_fetch_xor_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                            s->mem_index, ot | MO_LE);
+                break;
+            }
+            tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
+        } else {
+            tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
+            switch (op) {
+            case 0: /* bt */
+                /* Data already loaded; nothing to do.  */
+                break;
+            case 1: /* bts */
+                tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            case 2: /* btr */
+                tcg_gen_andc_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            default:
+            case 3: /* btc */
+                tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            }
+            if (op != 0) {
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
         }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 18/27] target-i386: emulate XCHG using atomic helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (15 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 17/27] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 19/27] target-i386: remove helper_lock() Richard Henderson
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index be097a6..525c445 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5556,12 +5556,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_lea_modrm(env, s, modrm);
             gen_op_mov_v_reg(ot, cpu_T0, reg);
             /* for xchg, lock is implicit */
-            if (!(prefixes & PREFIX_LOCK))
-                gen_helper_lock();
-            gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
-            gen_op_st_v(s, ot, cpu_T0, cpu_A0);
-            if (!(prefixes & PREFIX_LOCK))
-                gen_helper_unlock();
+            tcg_gen_atomic_xchg_tl(cpu_T1, cpu_A0, cpu_T0,
+                                   s->mem_index, ot | MO_LE);
             gen_op_mov_reg_v(ot, reg, cpu_T1);
         }
         break;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 19/27] target-i386: remove helper_lock()
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (16 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 18/27] target-i386: emulate XCHG using atomic helper Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 20/27] tests: add atomic_add-bench Richard Henderson
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Note that using atomic helpers instead of cmpxchg is not only more performant
(when the host implements natively the same atomics, as is in this case); it also
simplifies the necessary TCG/helper code of the implementation. Compare
the difference between the atomic and cmpxchg implementations:

     case OP_ADDL:
         if (s1->prefix & PREFIX_LOCK) {
-            gen_atomic_add_fetch(cpu_T0, cpu_A0, cpu_T1, ot);
+            TCGv t0, t1, t2, a0;
+            TCGLabel *retry;
+
+            t0 = tcg_temp_local_new();
+            t1 = tcg_temp_local_new();
+            t2 = tcg_temp_local_new();
+            a0 = tcg_temp_local_new();
+            retry = gen_new_label();
+
+            tcg_gen_mov_tl(a0, cpu_A0);
+            tcg_gen_mov_tl(t1, cpu_T1);
+            gen_set_label(retry);
+            gen_op_ld_v(s1, ot, cpu_T0, a0);
+            tcg_gen_add_tl(t2, cpu_T0, t1);
+            gen_cmpxchg(t0, a0, cpu_T0, t2, ot);
+            tcg_gen_brcond_tl(TCG_COND_NE, t0, cpu_T0, retry);
+
+            tcg_gen_mov_tl(cpu_T0, t2);
+            tcg_gen_mov_tl(cpu_T1, t1);
+
+            tcg_temp_free(t0);
+            tcg_temp_free(t1);
+            tcg_temp_free(t2);
+            tcg_temp_free(a0);
         } else {

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
---
 target-i386/helper.h     |  2 --
 target-i386/mem_helper.c | 33 ---------------------------------
 target-i386/translate.c  | 15 ---------------
 3 files changed, 50 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 1320edc..d02357c 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,6 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 5c0558f..8649115 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -25,39 +25,6 @@
 #include "qemu/int128.h"
 #include "tcg.h"
 
-/* broken thread support */
-
-#if defined(CONFIG_USER_ONLY)
-QemuMutex global_cpu_lock;
-
-void helper_lock(void)
-{
-    qemu_mutex_lock(&global_cpu_lock);
-}
-
-void helper_unlock(void)
-{
-    qemu_mutex_unlock(&global_cpu_lock);
-}
-
-void helper_lock_init(void)
-{
-    qemu_mutex_init(&global_cpu_lock);
-}
-#else
-void helper_lock(void)
-{
-}
-
-void helper_unlock(void)
-{
-}
-
-void helper_lock_init(void)
-{
-}
-#endif
-
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
     uintptr_t ra = GETPC();
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 525c445..9468cd5 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4537,10 +4537,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     s->aflag = aflag;
     s->dflag = dflag;
 
-    /* lock generation */
-    if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
-
     /* now check op code */
  reswitch:
     switch(b) {
@@ -8195,20 +8191,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     default:
         goto unknown_op;
     }
-    /* lock generation */
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
     return s->pc;
  illegal_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_illegal_opcode(s);
     return s->pc;
  unknown_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_unknown_opcode(env, s);
     return s->pc;
 }
@@ -8300,8 +8287,6 @@ void tcg_x86_init(void)
                                      offsetof(CPUX86State, bnd_regs[i].ub),
                                      bnd_regu_names[i]);
     }
-
-    helper_lock_init();
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 20/27] tests: add atomic_add-bench
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (17 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 19/27] target-i386: remove helper_lock() Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 21/27] target-arm: Rearrange aa32 load and store functions Richard Henderson
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
---
 tests/.gitignore         |   1 +
 tests/Makefile.include   |   4 +-
 tests/atomic_add-bench.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 tests/atomic_add-bench.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 840ea39..52488a0 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -1,3 +1,4 @@
+atomic_add-bench
 check-qdict
 check-qfloat
 check-qint
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 6c09962..7421778 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -408,7 +408,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
 	tests/test-opts-visitor.o tests/test-qmp-event.o \
 	tests/rcutorture.o tests/test-rcu-list.o \
 	tests/test-qdist.o \
-	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o
+	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
+	tests/atomic_add-bench.o
 
 $(test-obj-y): QEMU_INCLUDES += -Itests
 QEMU_CFLAGS += -I$(SRC_PATH)/tests
@@ -451,6 +452,7 @@ tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
 tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
 tests/test-qht-par$(EXESUF): tests/test-qht-par.o tests/qht-bench$(EXESUF) $(test-util-obj-y)
 tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
+tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o $(test-util-obj-y)
 
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
new file mode 100644
index 0000000..5bbecf6
--- /dev/null
+++ b/tests/atomic_add-bench.c
@@ -0,0 +1,180 @@
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+#include "qemu/host-utils.h"
+#include "qemu/processor.h"
+
+struct thread_info {
+    uint64_t r;
+} QEMU_ALIGNED(64);
+
+struct count {
+    unsigned long val;
+} QEMU_ALIGNED(64);
+
+static QemuThread *threads;
+static struct thread_info *th_info;
+static unsigned int n_threads = 1;
+static unsigned int n_ready_threads;
+static struct count *counts;
+static unsigned long n_ops = 10000;
+static double duration;
+static unsigned int range = 1;
+static bool test_start;
+
+static const char commands_string[] =
+    " -n = number of threads\n"
+    " -o = number of ops per thread\n"
+    " -r = range (will be rounded up to pow2)";
+
+static void usage_complete(char *argv[])
+{
+    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+    fprintf(stderr, "options:\n%s\n", commands_string);
+}
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+    x ^= x >> 12; /* a */
+    x ^= x << 25; /* b */
+    x ^= x >> 27; /* c */
+    return x * UINT64_C(2685821657736338717);
+}
+
+static void *thread_func(void *arg)
+{
+    struct thread_info *info = arg;
+    unsigned long i;
+
+    atomic_inc(&n_ready_threads);
+    while (!atomic_mb_read(&test_start)) {
+        cpu_relax();
+    }
+
+    for (i = 0; i < n_ops; i++) {
+        unsigned int index;
+
+        info->r = xorshift64star(info->r);
+        index = info->r & (range - 1);
+        atomic_inc(&counts[index].val);
+    }
+    return NULL;
+}
+
+static inline
+uint64_t ts_subtract(const struct timespec *a, const struct timespec *b)
+{
+    uint64_t ns;
+
+    ns = (b->tv_sec - a->tv_sec) * 1000000000ULL;
+    ns += (b->tv_nsec - a->tv_nsec);
+    return ns;
+}
+
+static void run_test(void)
+{
+    unsigned int i;
+    struct timespec ts_start, ts_end;
+
+    while (atomic_read(&n_ready_threads) != n_threads) {
+        cpu_relax();
+    }
+    atomic_mb_set(&test_start, true);
+
+    clock_gettime(CLOCK_MONOTONIC, &ts_start);
+    for (i = 0; i < n_threads; i++) {
+        qemu_thread_join(&threads[i]);
+    }
+    clock_gettime(CLOCK_MONOTONIC, &ts_end);
+    duration = ts_subtract(&ts_start, &ts_end) / 1e9;
+}
+
+static void create_threads(void)
+{
+    unsigned int i;
+
+    threads = g_new(QemuThread, n_threads);
+    th_info = g_new(struct thread_info, n_threads);
+    counts = qemu_memalign(64, sizeof(*counts) * range);
+
+    for (i = 0; i < n_threads; i++) {
+        struct thread_info *info = &th_info[i];
+
+        info->r = (i + 1) ^ time(NULL);
+        qemu_thread_create(&threads[i], NULL, thread_func, info,
+                           QEMU_THREAD_JOINABLE);
+    }
+}
+
+static void pr_params(void)
+{
+    printf("Parameters:\n");
+    printf(" # of threads:      %u\n", n_threads);
+    printf(" n_ops:             %lu\n", n_ops);
+    printf(" ops' range:        %u\n", range);
+}
+
+static void pr_stats(void)
+{
+    unsigned long long val = 0;
+    unsigned int i;
+    double tx;
+
+    for (i = 0; i < range; i++) {
+        val += counts[i].val;
+    }
+    assert(val == n_threads * n_ops);
+    tx = val / duration / 1e6;
+
+    printf("Results:\n");
+    printf("Duration:            %.2f s\n", duration);
+    printf(" Throughput:         %.2f Mops/s\n", tx);
+    printf(" Throughput/thread:  %.2f Mops/s/thread\n", tx / n_threads);
+}
+
+static void parse_args(int argc, char *argv[])
+{
+    unsigned long long n_ops_ull;
+    int c;
+
+    for (;;) {
+        c = getopt(argc, argv, "hn:o:r:");
+        if (c < 0) {
+            break;
+        }
+        switch (c) {
+        case 'h':
+            usage_complete(argv);
+            exit(0);
+        case 'n':
+            n_threads = atoi(optarg);
+            break;
+        case 'o':
+            n_ops_ull = atoll(optarg);
+            if (n_ops_ull > ULONG_MAX) {
+                fprintf(stderr,
+                        "fatal: -o cannot be greater than %lu\n", ULONG_MAX);
+                exit(1);
+            }
+            n_ops = n_ops_ull;
+            break;
+        case 'r':
+            range = pow2ceil(atoi(optarg));
+            break;
+        }
+    }
+}
+
+int main(int argc, char *argv[])
+{
+    parse_args(argc, argv);
+    pr_params();
+    create_threads();
+    run_test();
+    pr_stats();
+    return 0;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 21/27] target-arm: Rearrange aa32 load and store functions
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (18 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 20/27] tests: add atomic_add-bench Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 22/27] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 171 +++++++++++++++++++------------------------------
 1 file changed, 66 insertions(+), 105 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bd5d5cb..1b5bf87 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -926,145 +926,106 @@ static inline void store_reg_from_load(DisasContext *s, int reg, TCGv_i32 var)
  * These functions work like tcg_gen_qemu_{ld,st}* except
  * that the address argument is TCGv_i32 rather than TCGv.
  */
-#if TARGET_LONG_BITS == 32
 
-#define DO_GEN_LD(SUFF, OPC, BE32_XOR)                                   \
-static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
-{                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        TCGv addr_be = tcg_temp_new();                                   \
-        tcg_gen_xori_i32(addr_be, addr, BE32_XOR);                       \
-        tcg_gen_qemu_ld_i32(val, addr_be, index, opc);                   \
-        tcg_temp_free(addr_be);                                          \
-        return;                                                          \
-    }                                                                    \
-    tcg_gen_qemu_ld_i32(val, addr, index, opc);                          \
-}
-
-#define DO_GEN_ST(SUFF, OPC, BE32_XOR)                                   \
-static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
-{                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        TCGv addr_be = tcg_temp_new();                                   \
-        tcg_gen_xori_i32(addr_be, addr, BE32_XOR);                       \
-        tcg_gen_qemu_st_i32(val, addr_be, index, opc);                   \
-        tcg_temp_free(addr_be);                                          \
-        return;                                                          \
-    }                                                                    \
-    tcg_gen_qemu_st_i32(val, addr, index, opc);                          \
-}
-
-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static inline TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, TCGMemOp op)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    tcg_gen_qemu_ld_i64(val, addr, index, opc);
+    TCGv addr = tcg_temp_new();
+    tcg_gen_extu_i32_tl(addr, a32);
+
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
-    if (!IS_USER_ONLY && s->sctlr_b) {
-        tcg_gen_rotri_i64(val, val, 32);
+    if (!IS_USER_ONLY && s->sctlr_b && (op & MO_SIZE) < MO_32) {
+        tcg_gen_xori_tl(addr, addr, 4 - (1 << (op & MO_SIZE)));
     }
+    return addr;
 }
 
-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */
-    if (!IS_USER_ONLY && s->sctlr_b) {
-        TCGv_i64 tmp = tcg_temp_new_i64();
-        tcg_gen_rotri_i64(tmp, val, 32);
-        tcg_gen_qemu_st_i64(tmp, addr, index, opc);
-        tcg_temp_free_i64(tmp);
-        return;
-    }
-    tcg_gen_qemu_st_i64(val, addr, index, opc);
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_ld_i32(val, addr, index, opc);
+    tcg_temp_free(addr);
 }
 
-#else
+static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
+{
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_st_i32(val, addr, index, opc);
+    tcg_temp_free(addr);
+}
 
-#define DO_GEN_LD(SUFF, OPC, BE32_XOR)                                   \
+#define DO_GEN_LD(SUFF, OPC)                                             \
 static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
+                                     TCGv_i32 a32, int index)            \
 {                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    TCGv addr64 = tcg_temp_new();                                        \
-    tcg_gen_extu_i32_i64(addr64, addr);                                  \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        tcg_gen_xori_i64(addr64, addr64, BE32_XOR);                      \
-    }                                                                    \
-    tcg_gen_qemu_ld_i32(val, addr64, index, opc);                        \
-    tcg_temp_free(addr64);                                               \
-}
-
-#define DO_GEN_ST(SUFF, OPC, BE32_XOR)                                   \
+    gen_aa32_ld_i32(s, val, a32, index, OPC | s->be_data);               \
+}
+
+#define DO_GEN_ST(SUFF, OPC)                                             \
 static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
+                                     TCGv_i32 a32, int index)            \
 {                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    TCGv addr64 = tcg_temp_new();                                        \
-    tcg_gen_extu_i32_i64(addr64, addr);                                  \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        tcg_gen_xori_i64(addr64, addr64, BE32_XOR);                      \
-    }                                                                    \
-    tcg_gen_qemu_st_i32(val, addr64, index, opc);                        \
-    tcg_temp_free(addr64);                                               \
+    gen_aa32_st_i32(s, val, a32, index, OPC | s->be_data);               \
 }
 
-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static inline void gen_aa32_frob64(DisasContext *s, TCGv_i64 val)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    TCGv addr64 = tcg_temp_new();
-    tcg_gen_extu_i32_i64(addr64, addr);
-    tcg_gen_qemu_ld_i64(val, addr64, index, opc);
-
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
     if (!IS_USER_ONLY && s->sctlr_b) {
         tcg_gen_rotri_i64(val, val, 32);
     }
-    tcg_temp_free(addr64);
 }
 
-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    TCGv addr64 = tcg_temp_new();
-    tcg_gen_extu_i32_i64(addr64, addr);
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_ld_i64(val, addr, index, opc);
+    gen_aa32_frob64(s, val);
+    tcg_temp_free(addr);
+}
+
+static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_ld_i64(s, val, a32, index, MO_Q | s->be_data);
+}
+
+static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
+{
+    TCGv addr = gen_aa32_addr(s, a32, opc);
 
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
     if (!IS_USER_ONLY && s->sctlr_b) {
-        TCGv tmp = tcg_temp_new();
+        TCGv_i64 tmp = tcg_temp_new_i64();
         tcg_gen_rotri_i64(tmp, val, 32);
-        tcg_gen_qemu_st_i64(tmp, addr64, index, opc);
-        tcg_temp_free(tmp);
+        tcg_gen_qemu_st_i64(tmp, addr, index, opc);
+        tcg_temp_free_i64(tmp);
     } else {
-        tcg_gen_qemu_st_i64(val, addr64, index, opc);
+        tcg_gen_qemu_st_i64(val, addr, index, opc);
     }
-    tcg_temp_free(addr64);
+    tcg_temp_free(addr);
 }
 
-#endif
+static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_st_i64(s, val, a32, index, MO_Q | s->be_data);
+}
 
-DO_GEN_LD(8s, MO_SB, 3)
-DO_GEN_LD(8u, MO_UB, 3)
-DO_GEN_LD(16s, MO_SW, 2)
-DO_GEN_LD(16u, MO_UW, 2)
-DO_GEN_LD(32u, MO_UL, 0)
+DO_GEN_LD(8s, MO_SB)
+DO_GEN_LD(8u, MO_UB)
+DO_GEN_LD(16s, MO_SW)
+DO_GEN_LD(16u, MO_UW)
+DO_GEN_LD(32u, MO_UL)
 /* 'a' variants include an alignment check */
-DO_GEN_LD(16ua, MO_UW | MO_ALIGN, 2)
-DO_GEN_LD(32ua, MO_UL | MO_ALIGN, 0)
-DO_GEN_ST(8, MO_UB, 3)
-DO_GEN_ST(16, MO_UW, 2)
-DO_GEN_ST(32, MO_UL, 0)
+DO_GEN_LD(16ua, MO_UW | MO_ALIGN)
+DO_GEN_LD(32ua, MO_UL | MO_ALIGN)
+DO_GEN_ST(8, MO_UB)
+DO_GEN_ST(16, MO_UW)
+DO_GEN_ST(32, MO_UL)
 
 static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
 {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 22/27] target-arm: emulate LL/SC using cmpxchg helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (19 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 21/27] target-arm: Rearrange aa32 load and store functions Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 23/27] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

               atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                               Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Enforce alignment for ldrexd.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 136 +++++++++++++++----------------------------------
 1 file changed, 42 insertions(+), 94 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 1b5bf87..680635c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7676,47 +7676,27 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
     tcg_gen_or_i32(cpu_ZF, lo, hi);
 }
 
-/* Load/Store exclusive instructions are implemented by remembering
-   the value/address loaded, and seeing if these are the same
-   when the store is performed. This should be sufficient to implement
-   the architecturally mandated semantics, and avoids having to monitor
-   regular stores.
-
-   In system emulation mode only one CPU will be running at once, so
-   this sequence is effectively atomic.  In user emulation mode we
-   throw an exception and handle the atomic operation elsewhere.  */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i32 addr, int size)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGMemOp opc = size | MO_ALIGN | s->be_data;
 
     s->is_ldex = true;
 
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16ua(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32ua(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
     if (size == 3) {
         TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
+        TCGv_i64 t64 = tcg_temp_new_i64();
+
+        gen_aa32_ld_i64(s, t64, addr, get_mem_index(s), opc);
+        tcg_gen_mov_i64(cpu_exclusive_val, t64);
+        tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+        tcg_temp_free_i64(t64);
 
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
+        store_reg(s, rt2, tmp2);
         tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(cpu_exclusive_val, tmp, tmp3);
-        store_reg(s, rt2, tmp3);
     } else {
+        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), opc);
         tcg_gen_extu_i32_i64(cpu_exclusive_val, tmp);
     }
 
@@ -7729,23 +7709,15 @@ static void gen_clrex(DisasContext *s)
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
 
-#ifdef CONFIG_USER_ONLY
-static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
-                                TCGv_i32 addr, int size)
-{
-    tcg_gen_extu_i32_i64(cpu_exclusive_test, addr);
-    tcg_gen_movi_i32(cpu_exclusive_info,
-                     size | (rd << 4) | (rt << 8) | (rt2 << 12));
-    gen_exception_internal_insn(s, 4, EXCP_STREX);
-}
-#else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i32 addr, int size)
 {
-    TCGv_i32 tmp;
-    TCGv_i64 val64, extaddr;
+    TCGv_i32 t0, t1, t2;
+    TCGv_i64 extaddr;
+    TCGv taddr;
     TCGLabel *done_label;
     TCGLabel *fail_label;
+    TCGMemOp opc = size | MO_ALIGN | s->be_data;
 
     /* if (env->exclusive_addr == addr && env->exclusive_val == [addr]) {
          [addr] = {Rt};
@@ -7760,69 +7732,45 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
     tcg_temp_free_i64(extaddr);
 
-    tmp = tcg_temp_new_i32();
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
-    val64 = tcg_temp_new_i64();
+    taddr = gen_aa32_addr(s, addr, opc);
+    t0 = tcg_temp_new_i32();
+    t1 = load_reg(s, rt);
     if (size == 3) {
-        TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
-        tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(val64, tmp, tmp3);
-        tcg_temp_free_i32(tmp3);
-    } else {
-        tcg_gen_extu_i32_i64(val64, tmp);
-    }
-    tcg_temp_free_i32(tmp);
+        TCGv_i64 o64 = tcg_temp_new_i64();
+        TCGv_i64 n64 = tcg_temp_new_i64();
 
-    tcg_gen_brcond_i64(TCG_COND_NE, val64, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(val64);
+        t2 = load_reg(s, rt2);
+        tcg_gen_concat_i32_i64(n64, t1, t2);
+        tcg_temp_free_i32(t2);
+        gen_aa32_frob64(s, n64);
 
-    tmp = load_reg(s, rt);
-    switch (size) {
-    case 0:
-        gen_aa32_st8(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_st16(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-    tcg_temp_free_i32(tmp);
-    if (size == 3) {
-        tcg_gen_addi_i32(addr, addr, 4);
-        tmp = load_reg(s, rt2);
-        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-        tcg_temp_free_i32(tmp);
+        tcg_gen_atomic_cmpxchg_i64(o64, taddr, cpu_exclusive_val, n64,
+                                   get_mem_index(s), opc);
+        tcg_temp_free_i64(n64);
+
+        gen_aa32_frob64(s, o64);
+        tcg_gen_setcond_i64(TCG_COND_NE, o64, o64, cpu_exclusive_val);
+        tcg_gen_extrl_i64_i32(t0, o64);
+
+        tcg_temp_free_i64(o64);
+    } else {
+        t2 = tcg_temp_new_i32();
+        tcg_gen_extrl_i64_i32(t2, cpu_exclusive_val);
+        tcg_gen_atomic_cmpxchg_i32(t0, taddr, t2, t1, get_mem_index(s), opc);
+        tcg_gen_setcond_i32(TCG_COND_NE, t0, t0, t2);
+        tcg_temp_free_i32(t2);
     }
-    tcg_gen_movi_i32(cpu_R[rd], 0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free(taddr);
+    tcg_gen_mov_i32(cpu_R[rd], t0);
+    tcg_temp_free_i32(t0);
     tcg_gen_br(done_label);
+
     gen_set_label(fail_label);
     tcg_gen_movi_i32(cpu_R[rd], 1);
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
-#endif
 
 /* gen_srs:
  * @env: CPUARMState
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 23/27] target-arm: emulate SWP with atomic_xchg helper
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (20 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 22/27] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 680635c..2b3c34f 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8741,25 +8741,26 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         }
                         tcg_temp_free_i32(addr);
                     } else {
+                        TCGv taddr;
+                        TCGMemOp opc = s->be_data;
+
                         /* SWP instruction */
                         rm = (insn) & 0xf;
 
-                        /* ??? This is not really atomic.  However we know
-                           we never have multiple CPUs running in parallel,
-                           so it is good enough.  */
-                        addr = load_reg(s, rn);
-                        tmp = load_reg(s, rm);
-                        tmp2 = tcg_temp_new_i32();
                         if (insn & (1 << 22)) {
-                            gen_aa32_ld8u(s, tmp2, addr, get_mem_index(s));
-                            gen_aa32_st8(s, tmp, addr, get_mem_index(s));
+                            opc |= MO_UB;
                         } else {
-                            gen_aa32_ld32u(s, tmp2, addr, get_mem_index(s));
-                            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
+                            opc |= MO_UL | MO_ALIGN;
                         }
-                        tcg_temp_free_i32(tmp);
+
+                        addr = load_reg(s, rn);
+                        taddr = gen_aa32_addr(s, addr, opc);
                         tcg_temp_free_i32(addr);
-                        store_reg(s, rd, tmp2);
+
+                        tmp = load_reg(s, rm);
+                        tcg_gen_atomic_xchg_i32(tmp, taddr, tmp,
+                                                get_mem_index(s), opc);
+                        store_reg(s, rd, tmp);
                     }
                 }
             } else {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (21 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 23/27] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-08  3:34   ` Emilio G. Cota
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 25/27] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 135 +++++++++++++++++++++++++++++++++++++++++++++
 target-arm/helper-a64.h    |   2 +
 target-arm/translate-a64.c | 106 ++++++++++++++++-------------------
 3 files changed, 185 insertions(+), 58 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 41e48a4..d98d781 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -27,6 +27,10 @@
 #include "qemu/bitops.h"
 #include "internals.h"
 #include "qemu/crc32c.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "qemu/int128.h"
+#include "tcg.h"
 #include <zlib.h> /* For crc32 */
 
 /* C2.4.7 Multiply and divide */
@@ -444,3 +448,134 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
     /* Linux crc32c converts the output to one's complement.  */
     return crc32c(acc, buf, bytes) ^ 0xffffffff;
 }
+
+/* Returns 0 on success; 1 otherwise.  */
+uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+#ifndef CONFIG_USER_ONLY
+    uintptr_t ra = GETPC();
+#endif
+    Int128 oldv, cmpv, newv;
+    bool success;
+
+    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
+    newv = int128_make128(new_lo, new_hi);
+
+    if (parallel_cpus) {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#elif defined(CONFIG_USER_ONLY)
+        Int128 *haddr = g2h(addr);
+#ifdef HOST_WORDS_BIGENDIAN
+        cmpv = bswap128(cmpv);
+        newv = bswap128(newv);
+#endif
+        success = __atomic_compare_exchange_16(haddr, &cmpv, newv, false,
+                                               __ATOMIC_SEQ_CST,
+                                               __ATOMIC_SEQ_CST);
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
+        oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
+        success = int128_eq(oldv, cmpv);
+#endif
+    } else {
+        uint64_t o0, o1;
+
+#ifdef CONFIG_USER_ONLY
+        /* ??? Enforce alignment.  */
+        uint64_t *haddr = g2h(addr);
+        o0 = ldq_le_p(haddr + 0);
+        o1 = ldq_le_p(haddr + 1);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            stq_le_p(haddr + 0, int128_getlo(newv));
+            stq_le_p(haddr + 8, int128_gethi(newv));
+        }
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi0 = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
+        TCGMemOpIdx oi1 = make_memop_idx(MO_LEQ, mem_idx);
+
+        o0 = helper_le_ldq_mmu(env, addr + 0, oi0, ra);
+        o1 = helper_le_ldq_mmu(env, addr + 8, oi1, ra);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            helper_le_stq_mmu(env, addr + 0, int128_getlo(newv), oi1, ra);
+            helper_le_stq_mmu(env, addr + 8, int128_gethi(newv), oi1, ra);
+        }
+#endif
+    }
+
+    return !success;
+}
+
+uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+#ifndef CONFIG_USER_ONLY
+    uintptr_t ra = GETPC();
+#endif
+    Int128 oldv, cmpv, newv;
+    bool success;
+
+    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
+    newv = int128_make128(new_lo, new_hi);
+
+    if (parallel_cpus) {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#elif defined(CONFIG_USER_ONLY)
+        Int128 *haddr = g2h(addr);
+#ifndef HOST_WORDS_BIGENDIAN
+        cmpv = bswap128(cmpv);
+        newv = bswap128(newv);
+#endif
+        success = __atomic_compare_exchange_16(haddr, &cmpv, newv, false,
+                                               __ATOMIC_SEQ_CST,
+                                               __ATOMIC_SEQ_CST);
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_BEQ | MO_ALIGN_16, mem_idx);
+        oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
+        success = int128_eq(oldv, cmpv);
+#endif
+    } else {
+        uint64_t o0, o1;
+
+#ifdef CONFIG_USER_ONLY
+        /* ??? Enforce alignment.  */
+        uint64_t *haddr = g2h(addr);
+        o1 = ldq_be_p(haddr + 0);
+        o0 = ldq_be_p(haddr + 1);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            stq_be_p(haddr + 0, int128_gethi(newv));
+            stq_be_p(haddr + 8, int128_getlo(newv));
+        }
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi0 = make_memop_idx(MO_BEQ | MO_ALIGN_16, mem_idx);
+        TCGMemOpIdx oi1 = make_memop_idx(MO_BEQ, mem_idx);
+
+        o1 = helper_be_ldq_mmu(env, addr + 0, oi0, ra);
+        o0 = helper_be_ldq_mmu(env, addr + 8, oi1, ra);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            helper_be_stq_mmu(env, addr + 0, int128_gethi(newv), oi1, ra);
+            helper_be_stq_mmu(env, addr + 8, int128_getlo(newv), oi1, ra);
+        }
+#endif
+    }
+
+    return !success;
+}
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 1d3d10f..dd32000 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -46,3 +46,5 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f5e29d2..450c359 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1754,37 +1754,41 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
     }
 }
 
-/*
- * Load/Store exclusive instructions are implemented by remembering
- * the value/address loaded, and seeing if these are the same
- * when the store is performed. This is not actually the architecturally
- * mandated semantics, but it works for typical guest code sequences
- * and avoids having to monitor regular stores.
- *
- * In system emulation mode only one CPU will be running at once, so
- * this sequence is effectively atomic.  In user emulation mode we
- * throw an exception and handle the atomic operation elsewhere.
- */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i64 addr, int size, bool is_pair)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    TCGMemOp memop = s->be_data + size;
+    TCGMemOp be = s->be_data;
 
     g_assert(size <= 3);
-    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
-
     if (is_pair) {
-        TCGv_i64 addr2 = tcg_temp_new_i64();
         TCGv_i64 hitmp = tcg_temp_new_i64();
 
-        g_assert(size >= 2);
-        tcg_gen_addi_i64(addr2, addr, 1 << size);
-        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
-        tcg_temp_free_i64(addr2);
+        if (size == 3) {
+            TCGv_i64 addr2 = tcg_temp_new_i64();
+
+            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
+                                MO_64 | MO_ALIGN_16 | be);
+            tcg_gen_addi_i64(addr2, addr, 8);
+            tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
+                                MO_64 | MO_ALIGN | be);
+            tcg_temp_free_i64(addr2);
+        } else {
+            g_assert(size == 2);
+            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
+                                MO_64 | MO_ALIGN | be);
+            if (be == MO_LE) {
+                tcg_gen_extr32_i64(tmp, hitmp, tmp);
+            } else {
+                tcg_gen_extr32_i64(hitmp, tmp, tmp);
+            }
+        }
+
         tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
         tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
         tcg_temp_free_i64(hitmp);
+    } else {
+        tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
     }
 
     tcg_gen_mov_i64(cpu_exclusive_val, tmp);
@@ -1794,16 +1798,6 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     tcg_gen_mov_i64(cpu_exclusive_addr, addr);
 }
 
-#ifdef CONFIG_USER_ONLY
-static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
-                                TCGv_i64 addr, int size, int is_pair)
-{
-    tcg_gen_mov_i64(cpu_exclusive_test, addr);
-    tcg_gen_movi_i32(cpu_exclusive_info,
-                     size | is_pair << 2 | (rd << 4) | (rt << 9) | (rt2 << 14));
-    gen_exception_internal_insn(s, 4, EXCP_STREX);
-}
-#else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i64 inaddr, int size, int is_pair)
 {
@@ -1831,46 +1825,42 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
     tmp = tcg_temp_new_i64();
-    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), s->be_data + size);
-    tcg_gen_brcond_i64(TCG_COND_NE, tmp, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(tmp);
-
-    if (is_pair) {
-        TCGv_i64 addrhi = tcg_temp_new_i64();
-        TCGv_i64 tmphi = tcg_temp_new_i64();
-
-        tcg_gen_addi_i64(addrhi, addr, 1 << size);
-        tcg_gen_qemu_ld_i64(tmphi, addrhi, get_mem_index(s),
-                            s->be_data + size);
-        tcg_gen_brcond_i64(TCG_COND_NE, tmphi, cpu_exclusive_high, fail_label);
-
-        tcg_temp_free_i64(tmphi);
-        tcg_temp_free_i64(addrhi);
-    }
-
-    /* We seem to still have the exclusive monitor, so do the store */
-    tcg_gen_qemu_st_i64(cpu_reg(s, rt), addr, get_mem_index(s),
-                        s->be_data + size);
     if (is_pair) {
-        TCGv_i64 addrhi = tcg_temp_new_i64();
-
-        tcg_gen_addi_i64(addrhi, addr, 1 << size);
-        tcg_gen_qemu_st_i64(cpu_reg(s, rt2), addrhi,
-                            get_mem_index(s), s->be_data + size);
-        tcg_temp_free_i64(addrhi);
+        if (size == 2) {
+            TCGv_i64 val = tcg_temp_new_i64();
+            tcg_gen_concat32_i64(tmp, cpu_reg(s, rt), cpu_reg(s, rt2));
+            tcg_gen_concat32_i64(val, cpu_exclusive_val, cpu_exclusive_high);
+            tcg_gen_atomic_cmpxchg_i64(tmp, addr, val, tmp,
+                                       get_mem_index(s),
+                                       size | MO_ALIGN | s->be_data);
+            tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, val);
+            tcg_temp_free_i64(val);
+        } else if (s->be_data == MO_LE) {
+            gen_helper_paired_cmpxchg64_le(tmp, cpu_env, addr, cpu_reg(s, rt),
+                                           cpu_reg(s, rt2));
+        } else {
+            gen_helper_paired_cmpxchg64_be(tmp, cpu_env, addr, cpu_reg(s, rt),
+                                           cpu_reg(s, rt2));
+        }
+    } else {
+        TCGv_i64 val = cpu_reg(s, rt);
+        tcg_gen_atomic_cmpxchg_i64(tmp, addr, cpu_exclusive_val, val,
+                                   get_mem_index(s),
+                                   size | MO_ALIGN | s->be_data);
+        tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
     }
 
     tcg_temp_free_i64(addr);
 
-    tcg_gen_movi_i64(cpu_reg(s, rd), 0);
+    tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
+    tcg_temp_free_i64(tmp);
     tcg_gen_br(done_label);
+
     gen_set_label(fail_label);
     tcg_gen_movi_i64(cpu_reg(s, rd), 1);
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
-
 }
-#endif
 
 /* Update the Sixty-Four bit (SF) registersize. This logic is derived
  * from the ARMv8 specs for LDR (Shared decode for all encodings).
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 25/27] linux-user: remove handling of ARM's EXCP_STREX
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (22 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 26/27] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
---
 linux-user/main.c | 93 -------------------------------------------------------
 1 file changed, 93 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 54df300..f4fc460 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -648,94 +648,6 @@ do_kernel_trap(CPUARMState *env)
     return 0;
 }
 
-/* Store exclusive handling for AArch32 */
-static int do_strex(CPUARMState *env)
-{
-    uint64_t val;
-    int size;
-    int rc = 1;
-    int segv = 0;
-    uint32_t addr;
-    start_exclusive();
-    if (env->exclusive_addr != env->exclusive_test) {
-        goto fail;
-    }
-    /* We know we're always AArch32 so the address is in uint32_t range
-     * unless it was the -1 exclusive-monitor-lost value (which won't
-     * match exclusive_test above).
-     */
-    assert(extract64(env->exclusive_addr, 32, 32) == 0);
-    addr = env->exclusive_addr;
-    size = env->exclusive_info & 0xf;
-    switch (size) {
-    case 0:
-        segv = get_user_u8(val, addr);
-        break;
-    case 1:
-        segv = get_user_data_u16(val, addr, env);
-        break;
-    case 2:
-    case 3:
-        segv = get_user_data_u32(val, addr, env);
-        break;
-    default:
-        abort();
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        uint32_t valhi;
-        segv = get_user_data_u32(valhi, addr + 4, env);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-        if (arm_cpu_bswap_data(env)) {
-            val = deposit64((uint64_t)valhi, 32, 32, val);
-        } else {
-            val = deposit64(val, 32, 32, valhi);
-        }
-    }
-    if (val != env->exclusive_val) {
-        goto fail;
-    }
-
-    val = env->regs[(env->exclusive_info >> 8) & 0xf];
-    switch (size) {
-    case 0:
-        segv = put_user_u8(val, addr);
-        break;
-    case 1:
-        segv = put_user_data_u16(val, addr, env);
-        break;
-    case 2:
-    case 3:
-        segv = put_user_data_u32(val, addr, env);
-        break;
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        val = env->regs[(env->exclusive_info >> 12) & 0xf];
-        segv = put_user_data_u32(val, addr + 4, env);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-    }
-    rc = 0;
-fail:
-    env->regs[15] += 4;
-    env->regs[(env->exclusive_info >> 4) & 0xf] = rc;
-done:
-    end_exclusive();
-    return segv;
-}
-
 void cpu_loop(CPUARMState *env)
 {
     CPUState *cs = CPU(arm_env_get_cpu(env));
@@ -905,11 +817,6 @@ void cpu_loop(CPUARMState *env)
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
-        case EXCP_STREX:
-            if (!do_strex(env)) {
-                break;
-            }
-            /* fall through for segv */
         case EXCP_PREFETCH_ABORT:
         case EXCP_DATA_ABORT:
             addr = env->exception.vaddress;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 26/27] linux-user: remove handling of aarch64's EXCP_STREX
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (23 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 25/27] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 27/27] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
---
 linux-user/main.c | 125 ------------------------------------------------------
 1 file changed, 125 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index f4fc460..f2f7422 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -865,124 +865,6 @@ void cpu_loop(CPUARMState *env)
 
 #else
 
-/*
- * Handle AArch64 store-release exclusive
- *
- * rs = gets the status result of store exclusive
- * rt = is the register that is stored
- * rt2 = is the second register store (in STP)
- *
- */
-static int do_strex_a64(CPUARMState *env)
-{
-    uint64_t val;
-    int size;
-    bool is_pair;
-    int rc = 1;
-    int segv = 0;
-    uint64_t addr;
-    int rs, rt, rt2;
-
-    start_exclusive();
-    /* size | is_pair << 2 | (rs << 4) | (rt << 9) | (rt2 << 14)); */
-    size = extract32(env->exclusive_info, 0, 2);
-    is_pair = extract32(env->exclusive_info, 2, 1);
-    rs = extract32(env->exclusive_info, 4, 5);
-    rt = extract32(env->exclusive_info, 9, 5);
-    rt2 = extract32(env->exclusive_info, 14, 5);
-
-    addr = env->exclusive_addr;
-
-    if (addr != env->exclusive_test) {
-        goto finish;
-    }
-
-    switch (size) {
-    case 0:
-        segv = get_user_u8(val, addr);
-        break;
-    case 1:
-        segv = get_user_u16(val, addr);
-        break;
-    case 2:
-        segv = get_user_u32(val, addr);
-        break;
-    case 3:
-        segv = get_user_u64(val, addr);
-        break;
-    default:
-        abort();
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto error;
-    }
-    if (val != env->exclusive_val) {
-        goto finish;
-    }
-    if (is_pair) {
-        if (size == 2) {
-            segv = get_user_u32(val, addr + 4);
-        } else {
-            segv = get_user_u64(val, addr + 8);
-        }
-        if (segv) {
-            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
-            goto error;
-        }
-        if (val != env->exclusive_high) {
-            goto finish;
-        }
-    }
-    /* handle the zero register */
-    val = rt == 31 ? 0 : env->xregs[rt];
-    switch (size) {
-    case 0:
-        segv = put_user_u8(val, addr);
-        break;
-    case 1:
-        segv = put_user_u16(val, addr);
-        break;
-    case 2:
-        segv = put_user_u32(val, addr);
-        break;
-    case 3:
-        segv = put_user_u64(val, addr);
-        break;
-    }
-    if (segv) {
-        goto error;
-    }
-    if (is_pair) {
-        /* handle the zero register */
-        val = rt2 == 31 ? 0 : env->xregs[rt2];
-        if (size == 2) {
-            segv = put_user_u32(val, addr + 4);
-        } else {
-            segv = put_user_u64(val, addr + 8);
-        }
-        if (segv) {
-            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
-            goto error;
-        }
-    }
-    rc = 0;
-finish:
-    env->pc += 4;
-    /* rs == 31 encodes a write to the ZR, thus throwing away
-     * the status return. This is rather silly but valid.
-     */
-    if (rs < 31) {
-        env->xregs[rs] = rc;
-    }
-error:
-    /* instruction faulted, PC does not advance */
-    /* either way a strex releases any exclusive lock we have */
-    env->exclusive_addr = -1;
-    end_exclusive();
-    return segv;
-}
-
 /* AArch64 main loop */
 void cpu_loop(CPUARMState *env)
 {
@@ -1023,11 +905,6 @@ void cpu_loop(CPUARMState *env)
             info._sifields._sigfault._addr = env->pc;
             queue_signal(env, info.si_signo, &info);
             break;
-        case EXCP_STREX:
-            if (!do_strex_a64(env)) {
-                break;
-            }
-            /* fall through for segv */
         case EXCP_PREFETCH_ABORT:
         case EXCP_DATA_ABORT:
             info.si_signo = TARGET_SIGSEGV;
@@ -1063,8 +940,6 @@ void cpu_loop(CPUARMState *env)
         process_pending_signals(env);
         /* Exception return on AArch64 always clears the exclusive monitor,
          * so any return to running guest code implies this.
-         * A strex (successful or otherwise) also clears the monitor, so
-         * we don't need to specialcase EXCP_STREX.
          */
         env->exclusive_addr = -1;
     }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH v2 27/27] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (24 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 26/27] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
@ 2016-07-01 17:04 ` Richard Henderson
  2016-07-01 17:23 ` [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
  2016-07-08  2:53 ` Emilio G. Cota
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, alex.bennee, pbonzini, peter.maydell, serge.fdrv

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore; remove it and the associated
TCG variables.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
---
 target-arm/cpu.h       | 17 ++++++-----------
 target-arm/internals.h |  4 +---
 target-arm/translate.c | 10 ----------
 target-arm/translate.h |  4 ----
 4 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 7938ddc..0b2ed28 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -46,13 +46,12 @@
 #define EXCP_BKPT            7
 #define EXCP_EXCEPTION_EXIT  8   /* Return from v7M exception.  */
 #define EXCP_KERNEL_TRAP     9   /* Jumped to kernel code page.  */
-#define EXCP_STREX          10
-#define EXCP_HVC            11   /* HyperVisor Call */
-#define EXCP_HYP_TRAP       12
-#define EXCP_SMC            13   /* Secure Monitor Call */
-#define EXCP_VIRQ           14
-#define EXCP_VFIQ           15
-#define EXCP_SEMIHOST       16   /* semihosting call (A64 only) */
+#define EXCP_HVC            10   /* HyperVisor Call */
+#define EXCP_HYP_TRAP       11
+#define EXCP_SMC            12   /* Secure Monitor Call */
+#define EXCP_VIRQ           13
+#define EXCP_VFIQ           14
+#define EXCP_SEMIHOST       15   /* semihosting call (A64 only) */
 
 #define ARMV7M_EXCP_RESET   1
 #define ARMV7M_EXCP_NMI     2
@@ -475,10 +474,6 @@ typedef struct CPUARMState {
     uint64_t exclusive_addr;
     uint64_t exclusive_val;
     uint64_t exclusive_high;
-#if defined(CONFIG_USER_ONLY)
-    uint64_t exclusive_test;
-    uint32_t exclusive_info;
-#endif
 
     /* iwMMXt coprocessor state.  */
     struct {
diff --git a/target-arm/internals.h b/target-arm/internals.h
index 466be0b..5ab3b28 100644
--- a/target-arm/internals.h
+++ b/target-arm/internals.h
@@ -46,8 +46,7 @@ static inline bool excp_is_internal(int excp)
         || excp == EXCP_HALTED
         || excp == EXCP_EXCEPTION_EXIT
         || excp == EXCP_KERNEL_TRAP
-        || excp == EXCP_SEMIHOST
-        || excp == EXCP_STREX;
+        || excp == EXCP_SEMIHOST;
 }
 
 /* Exception names for debug logging; note that not all of these
@@ -63,7 +62,6 @@ static const char * const excnames[] = {
     [EXCP_BKPT] = "Breakpoint",
     [EXCP_EXCEPTION_EXIT] = "QEMU v7M exception exit",
     [EXCP_KERNEL_TRAP] = "QEMU intercept of kernel commpage",
-    [EXCP_STREX] = "QEMU intercept of STREX",
     [EXCP_HVC] = "Hypervisor Call",
     [EXCP_HYP_TRAP] = "Hypervisor Trap",
     [EXCP_SMC] = "Secure Monitor Call",
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 2b3c34f..e8e8502 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -64,10 +64,6 @@ static TCGv_i32 cpu_R[16];
 TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
-#ifdef CONFIG_USER_ONLY
-TCGv_i64 cpu_exclusive_test;
-TCGv_i32 cpu_exclusive_info;
-#endif
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -101,12 +97,6 @@ void arm_translate_init(void)
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
     cpu_exclusive_val = tcg_global_mem_new_i64(cpu_env,
         offsetof(CPUARMState, exclusive_val), "exclusive_val");
-#ifdef CONFIG_USER_ONLY
-    cpu_exclusive_test = tcg_global_mem_new_i64(cpu_env,
-        offsetof(CPUARMState, exclusive_test), "exclusive_test");
-    cpu_exclusive_info = tcg_global_mem_new_i32(cpu_env,
-        offsetof(CPUARMState, exclusive_info), "exclusive_info");
-#endif
 
     a64_translate_init();
 }
diff --git a/target-arm/translate.h b/target-arm/translate.h
index dbd7ac8..d4e205e 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -77,10 +77,6 @@ extern TCGv_env cpu_env;
 extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
 extern TCGv_i64 cpu_exclusive_addr;
 extern TCGv_i64 cpu_exclusive_val;
-#ifdef CONFIG_USER_ONLY
-extern TCGv_i64 cpu_exclusive_test;
-extern TCGv_i32 cpu_exclusive_info;
-#endif
 
 static inline int arm_dc_feature(DisasContext *dc, int feature)
 {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (25 preceding siblings ...)
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 27/27] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
@ 2016-07-01 17:23 ` Richard Henderson
  2016-07-08  2:53 ` Emilio G. Cota
  27 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-01 17:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, serge.fdrv, cota, alex.bennee, peter.maydell

On 07/01/2016 10:04 AM, Richard Henderson wrote:
> I spent a couple evenings this week tweaking Emilio's patch set.
> 
> The first major change is to "qemu/int128.h", so that we can use
> that type in the context of a 16-byte cmpxchg.  I have yet to teach
> TCG code generation about this type, so it's really only usable
> from other helper functions.  But that's still an improvement over
> having to return two uint64_t by reference.
> 
> The second major change is to funnel atomic operation generation
> through functions in tcg-op.c.  There we can test whether or not
> we're generating code in a parallel context and require atomic
> operations.  This also centralizes the helper functions so that we
> don't have to have the same sets in every target.
> 
> The third major change is providing a mechanism by which we can
> trap on atomic operations that we do not support, exit the cpu loop,
> stop the world, and then re-execute the instruction in a serial context.
> This is obviously something that will need to be filled in further
> as MTTCG progresses.
> 
> This minimally tested, but it is good enough to boot Fedora 24 x86-64,
> even with the softmmu single-step stubbed out.  Perhaps unsurprisingly,
> Fedora does not attempt an unaligned atomic operation.

I should have mentioned -- this was based on my tcg-next branch, for which I
just sent a pull request (in particular, Sergey's alignment improvement patch).

I pushed my patchset to

  git://github.com/rth7680/qemu.git atomic-2

for ease of browsing.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable Richard Henderson
@ 2016-07-04 11:51   ` Paolo Bonzini
  2016-07-04 12:07   ` Peter Maydell
  1 sibling, 0 replies; 48+ messages in thread
From: Paolo Bonzini @ 2016-07-04 11:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: cota, alex.bennee, peter.maydell, serge.fdrv



On 01/07/2016 19:04, Richard Henderson wrote:
> If __int128 is not supported, prefer a base type that is
> returned in registers rather than memory.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/qemu/int128.h | 110 +++++++++++++++++++++++++++++++-------------------
>  1 file changed, 69 insertions(+), 41 deletions(-)
> 
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 67440fa..ab67275 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -139,27 +139,37 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
>  
>  #else /* !CONFIG_INT128 */
>  
> -typedef struct Int128 Int128;
> +/* Here we are catering to the ABI of the host.  If the host returns
> +   64-bit complex in registers, but the 128-bit structure in memory,
> +   then choose the complex representation.  */
> +#if defined(__GNUC__) \
> +    && (defined(__powerpc__) || defined(__sparc__)) \
> +    && !defined(CONFIG_TCG_INTERPRETER)
> +typedef _Complex unsigned long long Int128;

Is there any reason not to do that unconditionally?

Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable Richard Henderson
  2016-07-04 11:51   ` Paolo Bonzini
@ 2016-07-04 12:07   ` Peter Maydell
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2016-07-04 12:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: QEMU Developers, Emilio G. Cota, Alex Bennée, Paolo Bonzini,
	Sergey Fedorov

On 1 July 2016 at 18:04, Richard Henderson <rth@twiddle.net> wrote:
> If __int128 is not supported, prefer a base type that is
> returned in registers rather than memory.

So which host architectures does this improve?
128 bit integers are nothing to do with complex numbers,
so we ought to have a strong justification for abusing
the _Complex type.

The ifdef suggests this only helps ppc and sparc, which
to my mind is not a sufficient justification.

If there's much benefit from doing this then it would be
better for the compiler on those architectures to support
int128 as a proper native type returned in registers.

> +#if defined(__GNUC__) \
> +    && (defined(__powerpc__) || defined(__sparc__)) \
> +    && !defined(CONFIG_TCG_INTERPRETER)

Why the CONFIG_TCG_INTERPRETER clause ?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics
  2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
                   ` (26 preceding siblings ...)
  2016-07-01 17:23 ` [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
@ 2016-07-08  2:53 ` Emilio G. Cota
  27 siblings, 0 replies; 48+ messages in thread
From: Emilio G. Cota @ 2016-07-08  2:53 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On Fri, Jul 01, 2016 at 10:04:26 -0700, Richard Henderson wrote:
> I spent a couple evenings this week tweaking Emilio's patch set.
> 
> The first major change is to "qemu/int128.h", so that we can use
> that type in the context of a 16-byte cmpxchg.  I have yet to teach
> TCG code generation about this type, so it's really only usable
> from other helper functions.  But that's still an improvement over
> having to return two uint64_t by reference.
> 
> The second major change is to funnel atomic operation generation
> through functions in tcg-op.c.  There we can test whether or not
> we're generating code in a parallel context and require atomic
> operations.  This also centralizes the helper functions so that we
> don't have to have the same sets in every target.
> 
> The third major change is providing a mechanism by which we can
> trap on atomic operations that we do not support, exit the cpu loop,
> stop the world, and then re-execute the instruction in a serial context.
> This is obviously something that will need to be filled in further
> as MTTCG progresses.
> 
> This minimally tested, but it is good enough to boot Fedora 24 x86-64,
> even with the softmmu single-step stubbed out.  Perhaps unsurprisingly,
> Fedora does not attempt an unaligned atomic operation.
> 
> Comments?

I *really* like this patchset. BTW many patches are significantly
improved over my original ones, so feel free to attribute authorship
to yourself for those.

I'll now address in separate emails the very few issues I found.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers Richard Henderson
@ 2016-07-08  3:00   ` Emilio G. Cota
  2016-07-08  5:26     ` Richard Henderson
  2016-08-11 10:02   ` Alex Bennée
  1 sibling, 1 reply; 48+ messages in thread
From: Emilio G. Cota @ 2016-07-08  3:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On Fri, Jul 01, 2016 at 10:04:36 -0700, Richard Henderson wrote:
> Force the use of cmpxchg16b on x86_64.
> 
> Wikipedia suggests that only very old AMD64 (circa 2004) did not have
> this instruction.  Further, it's required by Windows 8 so no new cpus
> will ever omit it.
> 
> If we truely care about these, then we could check this at startup time
> and then avoid executing paths that use it.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  configure             |  29 ++++++++++++-
>  cputlb.c              |   6 +++
>  include/qemu/int128.h |   6 +++
>  softmmu_template.h    | 110 +++++++++++++++++++++++++++++++++++++-------------
>  tcg/tcg.h             |  22 ++++++++++
>  5 files changed, 144 insertions(+), 29 deletions(-)
> 
> diff --git a/configure b/configure
> index 59ea124..586abd6 100755
> --- a/configure
> +++ b/configure
> @@ -1201,7 +1201,10 @@ case "$cpu" in
>             cc_i386='$(CC) -m32'
>             ;;
>      x86_64)
> -           CPU_CFLAGS="-m64"
> +           # ??? Only extremely old AMD cpus do not have cmpxchg16b.
> +           # If we truly care, we should simply detect this case at
> +           # runtime and generate the fallback to serial emulation.
> +           CPU_CFLAGS="-m64 -mcx16"
>             LDFLAGS="-m64 $LDFLAGS"
>             cc_i386='$(CC) -m32'
>             ;;
> @@ -4434,6 +4437,26 @@ if compile_prog "" "" ; then
>      int128=yes
>  fi
>  
> +#########################################
> +# See if 128-bit atomic operations are supported.
> +
> +atomic128=no
> +if test "$int128" = "yes"; then
> +  cat > $TMPC << EOF
> +int main(void)
> +{
> +  unsigned __int128 x = 0, y = 0;
> +  y = __atomic_load_16(&x, 0);
> +  __atomic_store_16(&x, y, 0);
> +  __atomic_compare_exchange_16(&x, &y, x, 0, 0, 0);
> +  return 0;
> +}
> +EOF
> +  if compile_prog "" "" ; then
> +    atomic128=yes
> +  fi
> +fi

Would it be correct to just trust that gcc is doing the right thing?
As in this delta over the patch:

--- a/configure
+++ b/configure
@@ -1201,10 +1201,7 @@ case "$cpu" in
            cc_i386='$(CC) -m32'
            ;;
     x86_64)
-           # ??? Only extremely old AMD cpus do not have cmpxchg16b.
-           # If we truly care, we should simply detect this case at
-           # runtime and generate the fallback to serial emulation.
-           CPU_CFLAGS="-m64 -mcx16"
+           CPU_CFLAGS="-m64"
            LDFLAGS="-m64 $LDFLAGS"
            cc_i386='$(CC) -m32'
            ;;
@@ -4454,6 +4451,10 @@ int main(void)
 EOF
   if compile_prog "" "" ; then
     atomic128=yes
+  elif compile_prog "-mcx16" "" ; then
+    QEMU_CFLAGS="$QEMU_CFLAGS -mcx16"
+    EXTRA_CFLAGS="$EXTRA_CFLAGS -mcx16"
+    atomic128=yes
   fi
 fi

I might be missing other CFLAGS to be set, but the idea is that
if a program with __atomic[..]_16 links, then we should be OK.

This way we would handle correctly even those old AMD cpus,
and would also handle non-x86 architectures that implement
cmpxchg16.

		Emilio

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
@ 2016-07-08  3:08   ` Emilio G. Cota
  2016-07-08  3:19     ` Emilio G. Cota
  0 siblings, 1 reply; 48+ messages in thread
From: Emilio G. Cota @ 2016-07-08  3:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On Fri, Jul 01, 2016 at 10:04:37 -0700, Richard Henderson wrote:
> From: "Emilio G. Cota" <cota@braap.org>
> 
> The diff here is uglier than necessary. All this does is to turn
> 
> FOO
> 
> into:
> 
> if (s->prefix & PREFIX_LOCK) {
>   BAR
> } else {
>   FOO
> }
> 
> where FOO is the original implementation of an unlocked cmpxchg.
> 
> [rth: Adjust unlocked cmpxchg to use movcond instead of branches.
> Adjust helpers to use atomic helpers.]
> 
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-i386/mem_helper.c | 96 ++++++++++++++++++++++++++++++++++++++----------
>  target-i386/translate.c  | 87 +++++++++++++++++++++----------------------
>  2 files changed, 120 insertions(+), 63 deletions(-)
> 
> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> index c2f4769..5c0558f 100644
> --- a/target-i386/mem_helper.c
> +++ b/target-i386/mem_helper.c
> @@ -22,6 +22,8 @@
>  #include "exec/helper-proto.h"
>  #include "exec/exec-all.h"
>  #include "exec/cpu_ldst.h"
> +#include "qemu/int128.h"
> +#include "tcg.h"
>  
>  /* broken thread support */
>  
> @@ -58,20 +60,39 @@ void helper_lock_init(void)
>  
>  void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
>  {
> -    uint64_t d;
> +    uintptr_t ra = GETPC();
> +    uint64_t oldv, cmpv, newv;
>      int eflags;
>  
>      eflags = cpu_cc_compute_all(env, CC_OP);
> -    d = cpu_ldq_data_ra(env, a0, GETPC());
> -    if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
> -        cpu_stq_data_ra(env, a0, ((uint64_t)env->regs[R_ECX] << 32)
> -                                  | (uint32_t)env->regs[R_EBX], GETPC());
> -        eflags |= CC_Z;
> +
> +    cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
> +    newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
> +
> +    if (parallel_cpus) {

I think here we mean 'if (prefix_locked)', although prefix_locked isn't
here. This is why in my original patch I defined two helpers ('locked'
and 'unlocked'), otherwise we'll emulate cmpxchg atomically even when
the LOCK prefix wasn't there. Not a big deal, but could hide bugs.

> +#ifdef CONFIG_USER_ONLY
> +        uint64_t *haddr = g2h(a0);
> +        cmpv = cpu_to_le64(cmpv);
> +        newv = cpu_to_le64(newv);
> +        oldv = atomic_cmpxchg(haddr, cmpv, newv);
> +        oldv = le64_to_cpu(oldv);
> +#else
> +        int mem_idx = cpu_mmu_index(env, false);
> +        TCGMemOpIdx oi = make_memop_idx(MO_TEQ, mem_idx);
> +        oldv = helper_atomic_cmpxchgq_le_mmu(env, a0, cmpv, newv, oi, ra);
> +#endif
>      } else {
> +        oldv = cpu_ldq_data_ra(env, a0, ra);
> +        newv = (cmpv == oldv ? newv : oldv);
>          /* always do the store */
> -        cpu_stq_data_ra(env, a0, d, GETPC());
> -        env->regs[R_EDX] = (uint32_t)(d >> 32);
> -        env->regs[R_EAX] = (uint32_t)d;
> +        cpu_stq_data_ra(env, a0, newv, ra);
> +    }
> +
> +    if (oldv == cmpv) {
> +        eflags |= CC_Z;
> +    } else {
> +        env->regs[R_EAX] = (uint32_t)oldv;
> +        env->regs[R_EDX] = (uint32_t)(oldv >> 32);
>          eflags &= ~CC_Z;
>      }
>      CC_SRC = eflags;
> @@ -80,25 +101,60 @@ void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
>  #ifdef TARGET_X86_64
>  void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
>  {
> -    uint64_t d0, d1;
> +    uintptr_t ra = GETPC();
> +    Int128 oldv, cmpv, newv;
>      int eflags;
> +    bool success;
>  
>      if ((a0 & 0xf) != 0) {
>          raise_exception_ra(env, EXCP0D_GPF, GETPC());
>      }
>      eflags = cpu_cc_compute_all(env, CC_OP);
> -    d0 = cpu_ldq_data_ra(env, a0, GETPC());
> -    d1 = cpu_ldq_data_ra(env, a0 + 8, GETPC());
> -    if (d0 == env->regs[R_EAX] && d1 == env->regs[R_EDX]) {
> -        cpu_stq_data_ra(env, a0, env->regs[R_EBX], GETPC());
> -        cpu_stq_data_ra(env, a0 + 8, env->regs[R_ECX], GETPC());
> +
> +    cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
> +    newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
> +
> +    if (parallel_cpus) {

Ditto.

FWIW I tested the x86_64 bits with the all the ck_pr regression tests
in concurrencykit. Would be nice to get access to a big-endian
machine to test the byte-ordering bits -- I sent a request to the
GCC compile farm earlier today for this purpose.

		Emilio

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  2016-07-08  3:08   ` Emilio G. Cota
@ 2016-07-08  3:19     ` Emilio G. Cota
  0 siblings, 0 replies; 48+ messages in thread
From: Emilio G. Cota @ 2016-07-08  3:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On Thu, Jul 07, 2016 at 23:08:17 -0400, Emilio G. Cota wrote:
> On Fri, Jul 01, 2016 at 10:04:37 -0700, Richard Henderson wrote:
> > From: "Emilio G. Cota" <cota@braap.org>
> > 
> > The diff here is uglier than necessary. All this does is to turn
> > 
> > FOO
> > 
> > into:
> > 
> > if (s->prefix & PREFIX_LOCK) {
> >   BAR
> > } else {
> >   FOO
> > }
> > 
> > where FOO is the original implementation of an unlocked cmpxchg.
> > 
> > [rth: Adjust unlocked cmpxchg to use movcond instead of branches.
> > Adjust helpers to use atomic helpers.]
> > 
> > Signed-off-by: Emilio G. Cota <cota@braap.org>
> > Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > ---
> >  target-i386/mem_helper.c | 96 ++++++++++++++++++++++++++++++++++++++----------
> >  target-i386/translate.c  | 87 +++++++++++++++++++++----------------------
> >  2 files changed, 120 insertions(+), 63 deletions(-)
> > 
> > diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> > index c2f4769..5c0558f 100644
> > --- a/target-i386/mem_helper.c
> > +++ b/target-i386/mem_helper.c
> > @@ -22,6 +22,8 @@
> >  #include "exec/helper-proto.h"
> >  #include "exec/exec-all.h"
> >  #include "exec/cpu_ldst.h"
> > +#include "qemu/int128.h"
> > +#include "tcg.h"
> >  
> >  /* broken thread support */
> >  
> > @@ -58,20 +60,39 @@ void helper_lock_init(void)
> >  
> >  void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
> >  {
> > -    uint64_t d;
> > +    uintptr_t ra = GETPC();
> > +    uint64_t oldv, cmpv, newv;
> >      int eflags;
> >  
> >      eflags = cpu_cc_compute_all(env, CC_OP);
> > -    d = cpu_ldq_data_ra(env, a0, GETPC());
> > -    if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
> > -        cpu_stq_data_ra(env, a0, ((uint64_t)env->regs[R_ECX] << 32)
> > -                                  | (uint32_t)env->regs[R_EBX], GETPC());
> > -        eflags |= CC_Z;
> > +
> > +    cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
> > +    newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
> > +
> > +    if (parallel_cpus) {
> 
> I think here we mean 'if (prefix_locked)', although prefix_locked isn't
> here. This is why in my original patch I defined two helpers ('locked'
> and 'unlocked'), otherwise we'll emulate cmpxchg atomically even when
> the LOCK prefix wasn't there. Not a big deal, but could hide bugs.

Correction to myself: we'd still need if (prefix_locked) for the whole
stop-the-world-and-do-non-atomically thing to work.

The concern that we might hide bugs by emulating non-locked cmpxchg
atomically persists. [ I wonder why one would ever use a non-locked
cmpxchg, though ]

		E.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
@ 2016-07-08  3:34   ` Emilio G. Cota
  0 siblings, 0 replies; 48+ messages in thread
From: Emilio G. Cota @ 2016-07-08  3:34 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On Fri, Jul 01, 2016 at 10:04:50 -0700, Richard Henderson wrote:
(snip)
> [rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]
> 
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/helper-a64.c    | 135 +++++++++++++++++++++++++++++++++++++++++++++
>  target-arm/helper-a64.h    |   2 +
>  target-arm/translate-a64.c | 106 ++++++++++++++++-------------------
>  3 files changed, 185 insertions(+), 58 deletions(-)
> 
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 41e48a4..d98d781 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -27,6 +27,10 @@
>  #include "qemu/bitops.h"
>  #include "internals.h"
>  #include "qemu/crc32c.h"
> +#include "exec/exec-all.h"
> +#include "exec/cpu_ldst.h"
> +#include "qemu/int128.h"
> +#include "tcg.h"
>  #include <zlib.h> /* For crc32 */
>  
>  /* C2.4.7 Multiply and divide */
> @@ -444,3 +448,134 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
>      /* Linux crc32c converts the output to one's complement.  */
>      return crc32c(acc, buf, bytes) ^ 0xffffffff;
>  }
> +
> +/* Returns 0 on success; 1 otherwise.  */
> +uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
> +                                     uint64_t new_lo, uint64_t new_hi)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    uintptr_t ra = GETPC();
> +#endif

This ifdef breaks the compilation for user-mode, where we need the
retaddr when calling cpu_loop_exit_atomic. A possible fix would be:

-#ifndef CONFIG_USER_ONLY
+#if !defined(CONFIG_USER_ONLY) || !defined(CONFIG_ATOMIC128)

> +    Int128 oldv, cmpv, newv;
> +    bool success;
> +
> +    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
> +    newv = int128_make128(new_lo, new_hi);
> +
> +    if (parallel_cpus) {
> +#ifndef CONFIG_ATOMIC128
> +        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
> +#elif defined(CONFIG_USER_ONLY)
> +        Int128 *haddr = g2h(addr);
> +#ifdef HOST_WORDS_BIGENDIAN
> +        cmpv = bswap128(cmpv);
> +        newv = bswap128(newv);
> +#endif
> +        success = __atomic_compare_exchange_16(haddr, &cmpv, newv, false,
> +                                               __ATOMIC_SEQ_CST,
> +                                               __ATOMIC_SEQ_CST);
> +#else
> +        int mem_idx = cpu_mmu_index(env, false);
> +        TCGMemOpIdx oi = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
> +        oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
> +        success = int128_eq(oldv, cmpv);
> +#endif
> +    } else {
> +        uint64_t o0, o1;
> +
> +#ifdef CONFIG_USER_ONLY
> +        /* ??? Enforce alignment.  */
> +        uint64_t *haddr = g2h(addr);
> +        o0 = ldq_le_p(haddr + 0);
> +        o1 = ldq_le_p(haddr + 1);
> +        oldv = int128_make128(o0, o1);
> +
> +        success = int128_eq(oldv, cmpv);
> +        if (success) {
> +            stq_le_p(haddr + 0, int128_getlo(newv));
> +            stq_le_p(haddr + 8, int128_gethi(newv));

haddr+1, as in ldq_le_p?

> +        }
> +#else
> +        int mem_idx = cpu_mmu_index(env, false);
> +        TCGMemOpIdx oi0 = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
> +        TCGMemOpIdx oi1 = make_memop_idx(MO_LEQ, mem_idx);
> +
> +        o0 = helper_le_ldq_mmu(env, addr + 0, oi0, ra);
> +        o1 = helper_le_ldq_mmu(env, addr + 8, oi1, ra);
> +        oldv = int128_make128(o0, o1);
> +
> +        success = int128_eq(oldv, cmpv);
> +        if (success) {
> +            helper_le_stq_mmu(env, addr + 0, int128_getlo(newv), oi1, ra);
> +            helper_le_stq_mmu(env, addr + 8, int128_gethi(newv), oi1, ra);
> +        }
> +#endif
> +    }
> +
> +    return !success;
> +}
> +
> +uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
> +                                     uint64_t new_lo, uint64_t new_hi)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    uintptr_t ra = GETPC();
> +#endif

Ditto.


> +    Int128 oldv, cmpv, newv;
> +    bool success;
> +
> +    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
> +    newv = int128_make128(new_lo, new_hi);
> +
> +    if (parallel_cpus) {
> +#ifndef CONFIG_ATOMIC128
> +        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
> +#elif defined(CONFIG_USER_ONLY)
> +        Int128 *haddr = g2h(addr);
> +#ifndef HOST_WORDS_BIGENDIAN
> +        cmpv = bswap128(cmpv);
> +        newv = bswap128(newv);
> +#endif
> +        success = __atomic_compare_exchange_16(haddr, &cmpv, newv, false,
> +                                               __ATOMIC_SEQ_CST,
> +                                               __ATOMIC_SEQ_CST);
> +#else
> +        int mem_idx = cpu_mmu_index(env, false);
> +        TCGMemOpIdx oi = make_memop_idx(MO_BEQ | MO_ALIGN_16, mem_idx);
> +        oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
> +        success = int128_eq(oldv, cmpv);
> +#endif
> +    } else {
> +        uint64_t o0, o1;
> +
> +#ifdef CONFIG_USER_ONLY
> +        /* ??? Enforce alignment.  */
> +        uint64_t *haddr = g2h(addr);
> +        o1 = ldq_be_p(haddr + 0);
> +        o0 = ldq_be_p(haddr + 1);
> +        oldv = int128_make128(o0, o1);
> +
> +        success = int128_eq(oldv, cmpv);
> +        if (success) {
> +            stq_be_p(haddr + 0, int128_gethi(newv));
> +            stq_be_p(haddr + 8, int128_getlo(newv));

Ditto.

BTW I tested ck's ck_pr tests for x86_64-linux-user and aarch64-linux-user
on an aarch64 host, with parallel_cpus={0,1}. It works, but note
that ck doesn't generate paired ops, so those remain untested.

Thanks,

		E.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers
  2016-07-08  3:00   ` Emilio G. Cota
@ 2016-07-08  5:26     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-07-08  5:26 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, alex.bennee, pbonzini, peter.maydell, serge.fdrv

On 07/07/2016 08:00 PM, Emilio G. Cota wrote:
> On Fri, Jul 01, 2016 at 10:04:36 -0700, Richard Henderson wrote:
>> Force the use of cmpxchg16b on x86_64.
>>
>> Wikipedia suggests that only very old AMD64 (circa 2004) did not have
>> this instruction.  Further, it's required by Windows 8 so no new cpus
>> will ever omit it.
>>
>> If we truely care about these, then we could check this at startup time
>> and then avoid executing paths that use it.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  configure             |  29 ++++++++++++-
>>  cputlb.c              |   6 +++
>>  include/qemu/int128.h |   6 +++
>>  softmmu_template.h    | 110 +++++++++++++++++++++++++++++++++++++-------------
>>  tcg/tcg.h             |  22 ++++++++++
>>  5 files changed, 144 insertions(+), 29 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 59ea124..586abd6 100755
>> --- a/configure
>> +++ b/configure
>> @@ -1201,7 +1201,10 @@ case "$cpu" in
>>             cc_i386='$(CC) -m32'
>>             ;;
>>      x86_64)
>> -           CPU_CFLAGS="-m64"
>> +           # ??? Only extremely old AMD cpus do not have cmpxchg16b.
>> +           # If we truly care, we should simply detect this case at
>> +           # runtime and generate the fallback to serial emulation.
>> +           CPU_CFLAGS="-m64 -mcx16"
...
>    if compile_prog "" "" ; then
>      atomic128=yes
> +  elif compile_prog "-mcx16" "" ; then
> +    QEMU_CFLAGS="$QEMU_CFLAGS -mcx16"
> +    EXTRA_CFLAGS="$EXTRA_CFLAGS -mcx16"
> +    atomic128=yes

Your change doesn't change anything.  If the user gave -march=haswell, then 
cx16 is enabled and the first test succeeds.  If the user gave -march=k8, the 
initial test would fail, but the second test would explicitly add cx16, and the 
second test would succeed.

In all cases, cx16 gets enabled.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers Richard Henderson
  2016-07-08  3:00   ` Emilio G. Cota
@ 2016-08-11 10:02   ` Alex Bennée
  1 sibling, 0 replies; 48+ messages in thread
From: Alex Bennée @ 2016-08-11 10:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> Force the use of cmpxchg16b on x86_64.
>
> Wikipedia suggests that only very old AMD64 (circa 2004) did not have
> this instruction.  Further, it's required by Windows 8 so no new cpus
> will ever omit it.
>
> If we truely care about these, then we could check this at startup time
> and then avoid executing paths that use it.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  configure             |  29 ++++++++++++-
>  cputlb.c              |   6 +++
>  include/qemu/int128.h |   6 +++
>  softmmu_template.h    | 110 +++++++++++++++++++++++++++++++++++++-------------
>  tcg/tcg.h             |  22 ++++++++++
>  5 files changed, 144 insertions(+), 29 deletions(-)
>
<snip>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 76712b9..0a9f49b 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -27,25 +27,30 @@
>
>  #define DATA_SIZE (1 << SHIFT)
>
> -#if DATA_SIZE == 8
> -#define SUFFIX q
> -#define LSUFFIX q
> -#define SDATA_TYPE  int64_t
> +#if DATA_SIZE == 16
> +#define SUFFIX     o
> +#define LSUFFIX    o
> +#define SDATA_TYPE Int128
> +#define DATA_TYPE  Int128
> +#elif DATA_SIZE == 8
> +#define SUFFIX     q
> +#define LSUFFIX    q
> +#define SDATA_TYPE int64_t
>  #define DATA_TYPE  uint64_t
>  #elif DATA_SIZE == 4
> -#define SUFFIX l
> -#define LSUFFIX l
> -#define SDATA_TYPE  int32_t
> +#define SUFFIX     l
> +#define LSUFFIX    l
> +#define SDATA_TYPE int32_t
>  #define DATA_TYPE  uint32_t
>  #elif DATA_SIZE == 2
> -#define SUFFIX w
> -#define LSUFFIX uw
> -#define SDATA_TYPE  int16_t
> +#define SUFFIX     w
> +#define LSUFFIX    uw
> +#define SDATA_TYPE int16_t
>  #define DATA_TYPE  uint16_t
>  #elif DATA_SIZE == 1
> -#define SUFFIX b
> -#define LSUFFIX ub
> -#define SDATA_TYPE  int8_t
> +#define SUFFIX     b
> +#define LSUFFIX    ub
> +#define SDATA_TYPE int8_t
>  #define DATA_TYPE  uint8_t
>  #else
>  #error unsupported data size
> @@ -56,7 +61,7 @@
>     to the register size of the host.  This is tcg_target_long, except in the
>     case of a 32-bit host and 64-bit data, and for that we always have
>     uint64_t.  Don't bother with this widened value for SOFTMMU_CODE_ACCESS.  */
> -#if defined(SOFTMMU_CODE_ACCESS) || DATA_SIZE == 8
> +#if defined(SOFTMMU_CODE_ACCESS) || DATA_SIZE >= 8
>  # define WORD_TYPE  DATA_TYPE
>  # define USUFFIX    SUFFIX
>  #else
> @@ -73,7 +78,9 @@
>  #define ADDR_READ addr_read
>  #endif
>
> -#if DATA_SIZE == 8
> +#if DATA_SIZE == 16
> +# define BSWAP(X)  bswap128(X)
> +#elif DATA_SIZE == 8
>  # define BSWAP(X)  bswap64(X)
>  #elif DATA_SIZE == 4
>  # define BSWAP(X)  bswap32(X)
> @@ -140,6 +147,7 @@
>      vidx >= 0;                                                                \
>  })

This currently merge conflicts with the current master due to the move
of the VICTIM_TLB code.

>
> +#if DATA_SIZE < 16
>  #ifndef SOFTMMU_CODE_ACCESS
>  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>                                                CPUIOTLBEntry *iotlbentry,
> @@ -307,9 +315,10 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
>      return res;
>  }
>  #endif /* DATA_SIZE > 1 */
> +#endif /* DATA_SIZE < 16 */
>
>  #ifndef SOFTMMU_CODE_ACCESS
> -
> +#if DATA_SIZE < 16
>  /* Provide signed versions of the load routines as well.  We can of course
>     avoid this for 64-bit data, or for 32-bit data on 32-bit host.  */
>  #if DATA_SIZE * 8 < TCG_TARGET_REG_BITS
> @@ -507,6 +516,7 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>      }
>  }
>  #endif
> +#endif /* DATA_SIZE < 16 */
>
>  #if DATA_SIZE == 1
>  # define HE_SUFFIX  _mmu
> @@ -573,9 +583,30 @@ DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
>       TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      ATOMIC_MMU_BODY;
> +#if DATA_SIZE < 16
>      return atomic_cmpxchg(haddr, cmpv, newv);
> +#else
> +    __atomic_compare_exchange(haddr, &cmpv, &newv, false,
> +                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> +    return cmpv;
> +#endif
>  }
>
> +#if DATA_SIZE > 1
> +DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE retv;
> +    cmpv = BSWAP(cmpv);
> +    newv = BSWAP(newv);
> +    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
> +            (env, addr, cmpv, newv, oi, retaddr));
> +    return BSWAP(retv);
> +}
> +#endif
> +
> +#if DATA_SIZE < 16
>  #define GEN_ATOMIC_HELPER(NAME)                                         \
>  DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)     \
>      (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
> @@ -600,18 +631,6 @@ GEN_ATOMIC_HELPER(xchg)
>  #undef GEN_ATOMIC_HELPER
>
>  #if DATA_SIZE > 1
> -DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
> -    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
> -     TCGMemOpIdx oi, uintptr_t retaddr)
> -{
> -    DATA_TYPE retv;
> -    cmpv = BSWAP(cmpv);
> -    newv = BSWAP(newv);
> -    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
> -            (env, addr, cmpv, newv, oi, retaddr));
> -    return BSWAP(retv);
> -}
> -
>  #define GEN_ATOMIC_HELPER(NAME)                                         \
>  DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), RE_SUFFIX)     \
>      (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
> @@ -676,6 +695,41 @@ DATA_TYPE glue(glue(helper_atomic_add_fetch, SUFFIX), RE_SUFFIX)
>      }
>  }
>  #endif /* DATA_SIZE > 1 */
> +#else /* DATA_SIZE >= 16 */
> +DATA_TYPE glue(glue(helper_atomic_ld, SUFFIX), HE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE res;
> +    ATOMIC_MMU_BODY;
> +    __atomic_load(haddr, &res, __ATOMIC_RELAXED);
> +    return res;
> +}
> +
> +DATA_TYPE glue(glue(helper_atomic_ld, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE res;
> +    res = (glue(glue(helper_atomic_ld, SUFFIX), HE_SUFFIX)
> +           (env, addr, oi, retaddr));
> +    return BSWAP(res);
> +}
> +
> +void glue(glue(helper_atomic_st, SUFFIX), HE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    ATOMIC_MMU_BODY;
> +    __atomic_store(haddr, &val, __ATOMIC_RELAXED);
> +}
> +
> +void glue(glue(helper_atomic_st, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    (glue(glue(helper_atomic_st, SUFFIX), HE_SUFFIX)
> +     (env, addr, BSWAP(val), oi, retaddr));
> +}
> +#endif /* DATA_SIZE < 16 */
>
>  #undef ATOMIC_MMU_BODY
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 4e60498..1304a42 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1216,6 +1216,28 @@ GEN_ATOMIC_HELPER_ALL(xchg)
>  #undef GEN_ATOMIC_HELPER_ALL
>  #undef GEN_ATOMIC_HELPER
>
> +#ifdef CONFIG_ATOMIC128
> +#include "qemu/int128.h"
> +
> +/* These aren't really a "proper" helpers because TCG cannot manage Int128.
> +   However, use the same format as the others, for use by the backends. */
> +Int128 helper_atomic_cmpxchgo_le_mmu(CPUArchState *env, target_ulong addr,
> +                                     Int128 cmpv, Int128 newv,
> +                                     TCGMemOpIdx oi, uintptr_t retaddr);
> +Int128 helper_atomic_cmpxchgo_be_mmu(CPUArchState *env, target_ulong addr,
> +                                     Int128 cmpv, Int128 newv,
> +                                     TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +Int128 helper_atomic_ldo_le_mmu(CPUArchState *env, target_ulong addr,
> +                                TCGMemOpIdx oi, uintptr_t retaddr);
> +Int128 helper_atomic_ldo_be_mmu(CPUArchState *env, target_ulong addr,
> +                                TCGMemOpIdx oi, uintptr_t retaddr);
> +void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
> +                              TCGMemOpIdx oi, uintptr_t retaddr);
> +void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
> +                              TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#endif /* CONFIG_ATOMIC128 */
>  #endif /* CONFIG_SOFTMMU */
>
>  #endif /* TCG_H */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available Richard Henderson
@ 2016-08-11 10:45   ` Alex Bennée
  2016-08-25 19:09   ` [Qemu-devel] [PATCH] fixup! " Alex Bennée
  1 sibling, 0 replies; 48+ messages in thread
From: Alex Bennée @ 2016-08-11 10:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/qemu/int128.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 134 insertions(+), 1 deletion(-)
>
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 52aaf99..08f1db1 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -1,6 +1,138 @@
>  #ifndef INT128_H
>  #define INT128_H
>
> +#ifdef CONFIG_INT128
> +
> +typedef __int128 Int128;
> +
> +static inline Int128 int128_make64(uint64_t a)
> +{
> +    return a;
> +}
> +
> +static inline uint64_t int128_get64(Int128 a)
> +{
> +    uint64_t r = a;
> +    assert(r == a);
> +    return r;
> +}
> +
> +static inline uint64_t int128_getlo(Int128 a)
> +{
> +    return a;
> +}
> +
> +static inline int64_t int128_gethi(Int128 a)
> +{
> +    return a >> 64;
> +}
> +
> +static inline Int128 int128_zero(void)
> +{
> +    return 0;
> +}
> +
> +static inline Int128 int128_one(void)
> +{
> +    return 1;
> +}
> +
> +static inline Int128 int128_2_64(void)
> +{
> +    return (Int128)1 << 64;
> +}
> +
> +static inline Int128 int128_exts64(int64_t a)
> +{
> +    return a;
> +}
> +
> +static inline Int128 int128_and(Int128 a, Int128 b)
> +{
> +    return a & b;
> +}
> +
> +static inline Int128 int128_rshift(Int128 a, int n)
> +{
> +    return a >> n;
> +}
> +
> +static inline Int128 int128_add(Int128 a, Int128 b)
> +{
> +    return a + b;
> +}
> +
> +static inline Int128 int128_neg(Int128 a)
> +{
> +    return -a;
> +}
> +
> +static inline Int128 int128_sub(Int128 a, Int128 b)
> +{
> +    return a - b;
> +}
> +
> +static inline bool int128_nonneg(Int128 a)
> +{
> +    return a >= 0;
> +}
> +
> +static inline bool int128_eq(Int128 a, Int128 b)
> +{
> +    return a == b;
> +}
> +
> +static inline bool int128_ne(Int128 a, Int128 b)
> +{
> +    return a != b;
> +}
> +
> +static inline bool int128_ge(Int128 a, Int128 b)
> +{
> +    return a >= b;
> +}
> +
> +static inline bool int128_lt(Int128 a, Int128 b)
> +{
> +    return a < b;
> +}
> +
> +static inline bool int128_le(Int128 a, Int128 b)
> +{
> +    return a <= b;
> +}
> +
> +static inline bool int128_gt(Int128 a, Int128 b)
> +{
> +    return a > b;
> +}
> +
> +static inline bool int128_nz(Int128 a)
> +{
> +    return a != 0;
> +}
> +
> +static inline Int128 int128_min(Int128 a, Int128 b)
> +{
> +    return a < b ? a : b;
> +}
> +
> +static inline Int128 int128_max(Int128 a, Int128 b)
> +{
> +    return a > b ? a : b;
> +}
> +
> +static inline void int128_addto(Int128 *a, Int128 b)
> +{
> +    *a += b;
> +}
> +
> +static inline void int128_subfrom(Int128 *a, Int128 b)
> +{
> +    *a -= b;
> +}
> +
> +#else /* !CONFIG_INT128 */
>
>  typedef struct Int128 Int128;
>
> @@ -153,4 +285,5 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
>      *a = int128_sub(*a, b);
>  }
>
> -#endif
> +#endif /* CONFIG_INT128 */
> +#endif /* INT128_H */

This breaks the tests/test-int128.c compile. This is mainly because the
test itself assumes details of the internals of the Int128 structure
however the expand() function will also need fixing if using GCC
__int128's.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor Richard Henderson
@ 2016-08-11 17:19   ` Alex Bennée
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bennée @ 2016-08-11 17:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> This paves the way for upcoming work.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-8-git-send-email-cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/qemu/atomic.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 7a59096..a5531da 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -161,6 +161,7 @@
>  #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
>  #define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
>  #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
>
>  /* And even shorter names that return void.  */
>  #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
> @@ -169,6 +170,7 @@
>  #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
>  #define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
>  #define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
> +#define atomic_xor(ptr, n) ((void) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST))
>
>  #else /* __ATOMIC_RELAXED */
>
> @@ -355,6 +357,7 @@
>  #define atomic_fetch_sub       __sync_fetch_and_sub
>  #define atomic_fetch_and       __sync_fetch_and_and
>  #define atomic_fetch_or        __sync_fetch_and_or
> +#define atomic_fetch_xor       __sync_fetch_and_xor
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
>
>  /* And even shorter names that return void.  */
> @@ -364,6 +367,7 @@
>  #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>  #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
> +#define atomic_xor(ptr, n)     ((void) __sync_fetch_and_xor(ptr, n))
>
>  #endif /* __ATOMIC_RELAXED */
>  #endif /* __QEMU_ATOMIC_H */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants Richard Henderson
@ 2016-08-11 17:20   ` Alex Bennée
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bennée @ 2016-08-11 17:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> This paves the way for upcoming work.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-9-git-send-email-cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/qemu/atomic.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index a5531da..d75bfde 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -163,6 +163,14 @@
>  #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
>  #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
>
> +#define atomic_inc_fetch(ptr)    __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
> +#define atomic_dec_fetch(ptr)    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
> +#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +
>  /* And even shorter names that return void.  */
>  #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
>  #define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
> @@ -358,6 +366,15 @@
>  #define atomic_fetch_and       __sync_fetch_and_and
>  #define atomic_fetch_or        __sync_fetch_and_or
>  #define atomic_fetch_xor       __sync_fetch_and_xor
> +
> +#define atomic_inc_fetch(ptr)  __sync_add_and_fetch(ptr, 1)
> +#define atomic_dec_fetch(ptr)  __sync_add_and_fetch(ptr, -1)
> +#define atomic_add_fetch       __sync_add_and_fetch
> +#define atomic_sub_fetch       __sync_sub_and_fetch
> +#define atomic_and_fetch       __sync_and_and_fetch
> +#define atomic_or_fetch        __sync_or_and_fetch
> +#define atomic_xor_fetch       __sync_xor_and_fetch
> +
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
>
>  /* And even shorter names that return void.  */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH] fixup! int128: Use __int128 if available
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available Richard Henderson
  2016-08-11 10:45   ` Alex Bennée
@ 2016-08-25 19:09   ` Alex Bennée
  2016-08-26 12:48     ` no-reply
  1 sibling, 1 reply; 48+ messages in thread
From: Alex Bennée @ 2016-08-25 19:09 UTC (permalink / raw)
  To: rth, cota; +Cc: Alex Bennée, open list:All patches CC here

---
ajb:
  - fixup the test-128 test case
---
 tests/test-int128.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/tests/test-int128.c b/tests/test-int128.c
index 4390123..b86a3c7 100644
--- a/tests/test-int128.c
+++ b/tests/test-int128.c
@@ -41,7 +41,7 @@ static Int128 expand(uint32_t x)
     uint64_t l, h;
     l = expand16(x & 65535);
     h = expand16(x >> 16);
-    return (Int128) {l, h};
+    return (Int128) int128_make128(l, h);
 };
 
 static void test_and(void)
@@ -54,8 +54,8 @@ static void test_and(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] & tests[j]);
             Int128 s = int128_and(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -70,8 +70,8 @@ static void test_add(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] + tests[j]);
             Int128 s = int128_add(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -86,8 +86,8 @@ static void test_sub(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] - tests[j]);
             Int128 s = int128_sub(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -100,8 +100,8 @@ static void test_neg(void)
         Int128 a = expand(tests[i]);
         Int128 r = expand(-tests[i]);
         Int128 s = int128_neg(a);
-        g_assert_cmpuint(r.lo, ==, s.lo);
-        g_assert_cmpuint(r.hi, ==, s.hi);
+        g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+        g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
     }
 }
 
@@ -180,8 +180,8 @@ test_rshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
 {
     Int128 a = expand(x);
     Int128 r = int128_rshift(a, n);
-    g_assert_cmpuint(r.lo, ==, l);
-    g_assert_cmpuint(r.hi, ==, h);
+    g_assert_cmpuint(int128_getlo(r), ==, l);
+    g_assert_cmpuint(int128_gethi(r), ==, h);
 }
 
 static void test_rshift(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH] fixup! int128: Use __int128 if available
  2016-08-25 19:09   ` [Qemu-devel] [PATCH] fixup! " Alex Bennée
@ 2016-08-26 12:48     ` no-reply
  0 siblings, 0 replies; 48+ messages in thread
From: no-reply @ 2016-08-26 12:48 UTC (permalink / raw)
  To: alex.bennee; +Cc: famz, rth, cota, qemu-devel

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [PATCH] fixup! int128: Use __int128 if available
Type: series
Message-id: 1472152170-18562-1-git-send-email-alex.bennee@linaro.org

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
make J=8 docker-test-quick@centos6
make J=8 docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
afef908 fixup! int128: Use __int128 if available

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD centos6
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY RUNNER
  RUN test-quick in centos6
No C++ compiler available; disabling C++ specific optional code
Install prefix    /tmp/qemu-test/src/tests/docker/install
BIOS directory    /tmp/qemu-test/src/tests/docker/install/share/qemu
binary directory  /tmp/qemu-test/src/tests/docker/install/bin
library directory /tmp/qemu-test/src/tests/docker/install/lib
module directory  /tmp/qemu-test/src/tests/docker/install/lib/qemu
libexec directory /tmp/qemu-test/src/tests/docker/install/libexec
include directory /tmp/qemu-test/src/tests/docker/install/include
config directory  /tmp/qemu-test/src/tests/docker/install/etc
local state directory   /tmp/qemu-test/src/tests/docker/install/var
Manual directory  /tmp/qemu-test/src/tests/docker/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g 
QEMU_CFLAGS       -I/usr/include/pixman-1    -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
RDMA support      no
TCG interpreter   no
fdt support       yes
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
uuid support      no
libcap-ng support no
vhost-net support yes
vhost-scsi support yes
Trace backends    log
spice support     no 
rbd support       no
xfsctl support    no
smartcard support no
libusb            no
usb net redir     no
OpenGL support    no
OpenGL dmabufs    no
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine pool    yes
GlusterFS support no
Archipelago support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   no
TPM passthrough   yes
QOM debugging     yes
vhdx              no
lzo support       no
snappy support    no
bzip2 support     no
NUMA host support no
tcmalloc support  no
jemalloc support  no
avx2 optimization no
  GEN   x86_64-softmmu/config-devices.mak.tmp
  GEN   aarch64-softmmu/config-devices.mak.tmp
  GEN   config-host.h
  GEN   qemu-options.def
  GEN   qmp-commands.h
  GEN   qapi-types.h
  GEN   qapi-visit.h
  GEN   qapi-event.h
  GEN   x86_64-softmmu/config-devices.mak
  GEN   aarch64-softmmu/config-devices.mak
  GEN   qmp-introspect.h
  GEN   tests/test-qapi-types.h
  GEN   tests/test-qapi-visit.h
  GEN   tests/test-qmp-commands.h
  GEN   tests/test-qapi-event.h
  GEN   tests/test-qmp-introspect.h
  GEN   config-all-devices.mak
  GEN   trace/generated-events.h
  GEN   trace/generated-tracers.h
  GEN   trace/generated-tcg-tracers.h
  GEN   trace/generated-helpers-wrappers.h
  GEN   trace/generated-helpers.h
  CC    tests/qemu-iotests/socket_scm_helper.o
  GEN   qga/qapi-generated/qga-qapi-types.h
  GEN   qga/qapi-generated/qga-qapi-visit.h
  GEN   qga/qapi-generated/qga-qmp-commands.h
  GEN   qga/qapi-generated/qga-qapi-types.c
  GEN   qga/qapi-generated/qga-qapi-visit.c
  GEN   qmp-introspect.c
  GEN   qga/qapi-generated/qga-qmp-marshal.c
  GEN   qapi-types.c
  GEN   qapi-visit.c
  GEN   qapi-event.c
  CC    qapi/qapi-visit-core.o
  CC    qapi/qapi-dealloc-visitor.o
  CC    qapi/qmp-input-visitor.o
  CC    qapi/qmp-output-visitor.o
  CC    qapi/qmp-registry.o
  CC    qapi/qmp-dispatch.o
  CC    qapi/string-input-visitor.o
  CC    qapi/string-output-visitor.o
  CC    qapi/opts-visitor.o
  CC    qapi/qapi-clone-visitor.o
  CC    qapi/qmp-event.o
  CC    qapi/qapi-util.o
  CC    qobject/qnull.o
  CC    qobject/qint.o
  CC    qobject/qstring.o
  CC    qobject/qdict.o
  CC    qobject/qlist.o
  CC    qobject/qfloat.o
  CC    qobject/qbool.o
  CC    qobject/qjson.o
  CC    qobject/qobject.o
  CC    qobject/json-lexer.o
  CC    qobject/json-streamer.o
  CC    qobject/json-parser.o
  GEN   trace/generated-events.c
  CC    trace/control.o
  CC    trace/qmp.o
  CC    util/osdep.o
  CC    util/cutils.o
  CC    util/unicode.o
  CC    util/qemu-timer-common.o
  CC    util/compatfd.o
  CC    util/event_notifier-posix.o
  CC    util/mmap-alloc.o
  CC    util/oslib-posix.o
  CC    util/qemu-openpty.o
  CC    util/qemu-thread-posix.o
  CC    util/memfd.o
  CC    util/envlist.o
  CC    util/path.o
  CC    util/bitmap.o
  CC    util/module.o
  CC    util/bitops.o
  CC    util/hbitmap.o
  CC    util/fifo8.o
  CC    util/acl.o
  CC    util/error.o
  CC    util/qemu-error.o
  CC    util/id.o
  CC    util/iov.o
  CC    util/qemu-config.o
  CC    util/qemu-sockets.o
  CC    util/uri.o
  CC    util/notify.o
  CC    util/qemu-option.o
  CC    util/qemu-progress.o
  CC    util/crc32c.o
  CC    util/hexdump.o
  CC    util/throttle.o
  CC    util/getauxval.o
  CC    util/readline.o
  CC    util/rfifolock.o
  CC    util/rcu.o
  CC    util/qemu-coroutine.o
  CC    util/qemu-coroutine-lock.o
  CC    util/qemu-coroutine-io.o
  CC    util/qemu-coroutine-sleep.o
  CC    util/coroutine-ucontext.o
  CC    util/buffer.o
  CC    util/timed-average.o
  CC    util/base64.o
  CC    util/log.o
  CC    util/qdist.o
  CC    util/qht.o
  CC    util/range.o
  CC    crypto/pbkdf-stub.o
  CC    stubs/arch-query-cpu-def.o
  CC    stubs/bdrv-next-monitor-owned.o
/tmp/qemu-test/src/util/qht.c: In function ‘qht_reset_size’:
/tmp/qemu-test/src/util/qht.c:413: warning: ‘new’ may be used uninitialized in this function
  CC    stubs/blk-commit-all.o
  CC    stubs/blockdev-close-all-bdrv-states.o
  CC    stubs/clock-warp.o
  CC    stubs/cpu-get-clock.o
  CC    stubs/cpu-get-icount.o
  CC    stubs/dump.o
  CC    stubs/fdset-add-fd.o
  CC    stubs/fdset-find-fd.o
  CC    stubs/fdset-get-fd.o
  CC    stubs/fdset-remove-fd.o
  CC    stubs/gdbstub.o
  CC    stubs/get-fd.o
  CC    stubs/get-next-serial.o
  CC    stubs/get-vm-name.o
  CC    stubs/iothread-lock.o
  CC    stubs/is-daemonized.o
  CC    stubs/machine-init-done.o
  CC    stubs/migr-blocker.o
  CC    stubs/mon-is-qmp.o
  CC    stubs/mon-printf.o
  CC    stubs/monitor-init.o
  CC    stubs/notify-event.o
  CC    stubs/qtest.o
  CC    stubs/replay.o
  CC    stubs/replay-user.o
  CC    stubs/reset.o
  CC    stubs/runstate-check.o
  CC    stubs/set-fd-handler.o
  CC    stubs/slirp.o
  CC    stubs/sysbus.o
  CC    stubs/trace-control.o
  CC    stubs/uuid.o
  CC    stubs/vm-stop.o
  CC    stubs/vmstate.o
  CC    stubs/cpus.o
  CC    stubs/kvm.o
  CC    stubs/qmp_pc_dimm_device_list.o
  CC    stubs/target-monitor-defs.o
  CC    stubs/target-get-monitor-def.o
  CC    stubs/vhost.o
  CC    stubs/iohandler.o
  CC    stubs/smbios_type_38.o
  CC    stubs/ipmi.o
  CC    stubs/pc_madt_cpu_entry.o
  CC    contrib/ivshmem-client/ivshmem-client.o
  CC    contrib/ivshmem-client/main.o
  CC    contrib/ivshmem-server/ivshmem-server.o
  CC    contrib/ivshmem-server/main.o
  CC    qemu-nbd.o
  CC    async.o
  CC    thread-pool.o
  CC    block.o
  CC    blockjob.o
  CC    main-loop.o
  CC    iohandler.o
  CC    qemu-timer.o
  CC    aio-posix.o
  CC    qemu-io-cmds.o
  CC    block/raw_bsd.o
  CC    block/qcow.o
  CC    block/vdi.o
  CC    block/vmdk.o
  CC    block/cloop.o
  CC    block/bochs.o
  CC    block/vpc.o
  CC    block/vvfat.o
  CC    block/qcow2.o
  CC    block/qcow2-refcount.o
  CC    block/qcow2-cluster.o
  CC    block/qcow2-snapshot.o
  CC    block/qcow2-cache.o
  CC    block/qed.o
  CC    block/qed-gencb.o
  CC    block/qed-l2-cache.o
  CC    block/qed-table.o
  CC    block/qed-cluster.o
  CC    block/qed-check.o
  CC    block/quorum.o
  CC    block/parallels.o
  CC    block/blkdebug.o
  CC    block/blkverify.o
  CC    block/blkreplay.o
  CC    block/block-backend.o
  CC    block/snapshot.o
  CC    block/qapi.o
  CC    block/raw-posix.o
  CC    block/null.o
  CC    block/mirror.o
  CC    block/commit.o
  CC    block/io.o
  CC    block/throttle-groups.o
  CC    block/nbd.o
  CC    block/nbd-client.o
  CC    block/sheepdog.o
  CC    block/accounting.o
  CC    block/dirty-bitmap.o
  CC    block/write-threshold.o
  CC    block/crypto.o
  CC    nbd/server.o
  CC    nbd/client.o
  CC    nbd/common.o
  CC    block/dmg.o
  CC    crypto/init.o
  CC    crypto/hash.o
  CC    crypto/hash-glib.o
  CC    crypto/aes.o
  CC    crypto/desrfb.o
  CC    crypto/cipher.o
  CC    crypto/tlscreds.o
  CC    crypto/tlscredsanon.o
  CC    crypto/tlscredsx509.o
  CC    crypto/tlssession.o
  CC    crypto/secret.o
  CC    crypto/random-platform.o
  CC    crypto/pbkdf.o
  CC    crypto/ivgen.o
  CC    crypto/ivgen-essiv.o
  CC    crypto/ivgen-plain.o
  CC    crypto/ivgen-plain64.o
  CC    crypto/afsplit.o
  CC    crypto/xts.o
  CC    crypto/block.o
  CC    crypto/block-qcow.o
  CC    crypto/block-luks.o
  CC    io/channel.o
  CC    io/channel-buffer.o
  CC    io/channel-command.o
  CC    io/channel-file.o
  CC    io/channel-socket.o
  CC    io/channel-tls.o
  CC    io/channel-watch.o
  CC    io/channel-websock.o
  CC    io/channel-util.o
  CC    io/task.o
  CC    qom/object.o
  CC    qom/container.o
  CC    qom/qom-qobject.o
  CC    qom/object_interfaces.o
  GEN   qemu-img-cmds.h
  CC    qemu-io.o
  CC    qemu-bridge-helper.o
  CC    blockdev.o
  CC    blockdev-nbd.o
  CC    iothread.o
  CC    qdev-monitor.o
  CC    device-hotplug.o
  CC    os-posix.o
  CC    qemu-char.o
  CC    page_cache.o
  CC    accel.o
  CC    bt-host.o
  CC    bt-vhci.o
  CC    dma-helpers.o
  CC    vl.o
  CC    tpm.o
  CC    device_tree.o
  GEN   qmp-marshal.c
  CC    qmp.o
  CC    hmp.o
  CC    tcg-runtime.o
  CC    audio/audio.o
  CC    audio/noaudio.o
  CC    audio/wavaudio.o
  CC    audio/mixeng.o
  CC    audio/sdlaudio.o
  CC    audio/ossaudio.o
  CC    audio/wavcapture.o
  CC    backends/rng.o
  CC    backends/rng-egd.o
  CC    backends/rng-random.o
  CC    backends/msmouse.o
  CC    backends/testdev.o
  CC    backends/tpm.o
  CC    backends/hostmem.o
  CC    backends/hostmem-ram.o
  CC    backends/hostmem-file.o
  CC    block/stream.o
  CC    block/backup.o
  CC    disas/arm.o
  CC    disas/i386.o
  CC    fsdev/qemu-fsdev-dummy.o
  CC    fsdev/qemu-fsdev-opts.o
  CC    hw/acpi/core.o
  CC    hw/acpi/piix4.o
  CC    hw/acpi/pcihp.o
  CC    hw/acpi/ich9.o
  CC    hw/acpi/tco.o
  CC    hw/acpi/cpu_hotplug.o
  CC    hw/acpi/memory_hotplug.o
  CC    hw/acpi/memory_hotplug_acpi_table.o
  CC    hw/acpi/cpu.o
  CC    hw/acpi/acpi_interface.o
  CC    hw/acpi/bios-linker-loader.o
  CC    hw/acpi/aml-build.o
  CC    hw/acpi/ipmi.o
  CC    hw/audio/sb16.o
  CC    hw/audio/es1370.o
  CC    hw/audio/ac97.o
  CC    hw/audio/fmopl.o
  CC    hw/audio/adlib.o
  CC    hw/audio/gus.o
  CC    hw/audio/gusemu_hal.o
  CC    hw/audio/gusemu_mixer.o
  CC    hw/audio/cs4231a.o
  CC    hw/audio/intel-hda.o
  CC    hw/audio/hda-codec.o
  CC    hw/audio/pcspk.o
  CC    hw/audio/wm8750.o
  CC    hw/audio/pl041.o
  CC    hw/audio/lm4549.o
  CC    hw/audio/marvell_88w8618.o
  CC    hw/block/block.o
  CC    hw/block/cdrom.o
  CC    hw/block/hd-geometry.o
  CC    hw/block/fdc.o
  CC    hw/block/m25p80.o
  CC    hw/block/nand.o
  CC    hw/block/pflash_cfi01.o
  CC    hw/block/pflash_cfi02.o
  CC    hw/block/ecc.o
  CC    hw/block/onenand.o
  CC    hw/block/nvme.o
  CC    hw/bt/core.o
  CC    hw/bt/l2cap.o
  CC    hw/bt/sdp.o
  CC    hw/bt/hci.o
  CC    hw/bt/hid.o
  CC    hw/bt/hci-csr.o
  CC    hw/char/ipoctal232.o
  CC    hw/char/parallel.o
  CC    hw/char/pl011.o
  CC    hw/char/serial.o
  CC    hw/char/serial-isa.o
  CC    hw/char/serial-pci.o
  CC    hw/char/virtio-console.o
  CC    hw/char/cadence_uart.o
  CC    hw/char/debugcon.o
  CC    hw/char/imx_serial.o
  CC    hw/core/qdev.o
  CC    hw/core/qdev-properties.o
  CC    hw/core/bus.o
  CC    hw/core/fw-path-provider.o
  CC    hw/core/irq.o
  CC    hw/core/hotplug.o
  CC    hw/core/ptimer.o
  CC    hw/core/sysbus.o
  CC    hw/core/machine.o
  CC    hw/core/null-machine.o
  CC    hw/core/loader.o
  CC    hw/core/qdev-properties-system.o
  CC    hw/core/register.o
  CC    hw/core/platform-bus.o
  CC    hw/display/ads7846.o
  CC    hw/display/cirrus_vga.o
  CC    hw/display/pl110.o
  CC    hw/display/ssd0303.o
  CC    hw/display/ssd0323.o
  CC    hw/display/vga-pci.o
  CC    hw/display/vga-isa.o
  CC    hw/display/vmware_vga.o
  CC    hw/display/blizzard.o
  CC    hw/display/exynos4210_fimd.o
  CC    hw/display/framebuffer.o
  CC    hw/display/tc6393xb.o
  CC    hw/dma/pl080.o
  CC    hw/dma/pl330.o
  CC    hw/dma/i8257.o
  CC    hw/dma/xlnx-zynq-devcfg.o
  CC    hw/gpio/max7310.o
  CC    hw/gpio/pl061.o
  CC    hw/gpio/zaurus.o
  CC    hw/gpio/gpio_key.o
  CC    hw/i2c/core.o
  CC    hw/i2c/smbus.o
  CC    hw/i2c/smbus_eeprom.o
  CC    hw/i2c/i2c-ddc.o
  CC    hw/i2c/versatile_i2c.o
  CC    hw/i2c/smbus_ich9.o
  CC    hw/i2c/pm_smbus.o
  CC    hw/i2c/bitbang_i2c.o
  CC    hw/i2c/exynos4210_i2c.o
  CC    hw/i2c/imx_i2c.o
  CC    hw/i2c/aspeed_i2c.o
  CC    hw/ide/core.o
  CC    hw/ide/atapi.o
  CC    hw/ide/qdev.o
  CC    hw/ide/pci.o
  CC    hw/ide/isa.o
  CC    hw/ide/piix.o
  CC    hw/ide/microdrive.o
  CC    hw/ide/ahci.o
  CC    hw/ide/ich.o
  CC    hw/input/hid.o
  CC    hw/input/lm832x.o
  CC    hw/input/pckbd.o
  CC    hw/input/pl050.o
  CC    hw/input/ps2.o
  CC    hw/input/stellaris_input.o
  CC    hw/input/tsc2005.o
  CC    hw/input/vmmouse.o
  CC    hw/input/virtio-input.o
  CC    hw/input/virtio-input-hid.o
  CC    hw/input/virtio-input-host.o
  CC    hw/intc/i8259_common.o
  CC    hw/intc/i8259.o
  CC    hw/intc/pl190.o
  CC    hw/intc/imx_avic.o
  CC    hw/intc/realview_gic.o
  CC    hw/intc/ioapic_common.o
  CC    hw/intc/arm_gic_common.o
  CC    hw/intc/arm_gic.o
  CC    hw/intc/arm_gicv2m.o
  CC    hw/intc/arm_gicv3_common.o
  CC    hw/intc/arm_gicv3.o
  CC    hw/intc/arm_gicv3_dist.o
  CC    hw/intc/arm_gicv3_redist.o
  CC    hw/ipack/ipack.o
  CC    hw/ipack/tpci200.o
  CC    hw/ipmi/ipmi.o
  CC    hw/ipmi/ipmi_bmc_sim.o
  CC    hw/ipmi/ipmi_bmc_extern.o
  CC    hw/ipmi/isa_ipmi_kcs.o
  CC    hw/ipmi/isa_ipmi_bt.o
  CC    hw/isa/isa-bus.o
  CC    hw/isa/apm.o
  CC    hw/mem/pc-dimm.o
  CC    hw/mem/nvdimm.o
  CC    hw/misc/applesmc.o
  CC    hw/misc/max111x.o
  CC    hw/misc/tmp105.o
  CC    hw/misc/debugexit.o
  CC    hw/misc/sga.o
  CC    hw/misc/pc-testdev.o
  CC    hw/misc/pci-testdev.o
  CC    hw/misc/arm_l2x0.o
  CC    hw/misc/arm_integrator_debug.o
  CC    hw/misc/a9scu.o
  CC    hw/misc/arm11scu.o
  CC    hw/net/ne2000.o
  CC    hw/net/eepro100.o
  CC    hw/net/pcnet-pci.o
  CC    hw/net/pcnet.o
  CC    hw/net/e1000.o
  CC    hw/net/e1000x_common.o
  CC    hw/net/net_tx_pkt.o
  CC    hw/net/net_rx_pkt.o
  CC    hw/net/e1000e.o
  CC    hw/net/e1000e_core.o
  CC    hw/net/rtl8139.o
  CC    hw/net/vmxnet3.o
  CC    hw/net/smc91c111.o
  CC    hw/net/lan9118.o
  CC    hw/net/ne2000-isa.o
  CC    hw/net/xgmac.o
  CC    hw/net/allwinner_emac.o
  CC    hw/net/imx_fec.o
  CC    hw/net/cadence_gem.o
  CC    hw/net/stellaris_enet.o
  CC    hw/net/rocker/rocker.o
  CC    hw/net/rocker/rocker_fp.o
  CC    hw/net/rocker/rocker_desc.o
  CC    hw/net/rocker/rocker_world.o
  CC    hw/net/rocker/rocker_of_dpa.o
  CC    hw/nvram/eeprom93xx.o
  CC    hw/nvram/fw_cfg.o
  CC    hw/pci-bridge/pci_bridge_dev.o
  CC    hw/pci-bridge/pci_expander_bridge.o
  CC    hw/pci-bridge/xio3130_upstream.o
  CC    hw/pci-bridge/xio3130_downstream.o
  CC    hw/pci-bridge/ioh3420.o
  CC    hw/pci-bridge/i82801b11.o
  CC    hw/pci-host/pam.o
  CC    hw/pci-host/versatile.o
  CC    hw/pci-host/piix.o
  CC    hw/pci-host/q35.o
/tmp/qemu-test/src/hw/nvram/fw_cfg.c: In function ‘fw_cfg_dma_transfer’:
/tmp/qemu-test/src/hw/nvram/fw_cfg.c:330: warning: ‘read’ may be used uninitialized in this function
  CC    hw/pci-host/gpex.o
  CC    hw/pci/pci.o
  CC    hw/pci/pci_bridge.o
  CC    hw/pci/msix.o
  CC    hw/pci/msi.o
  CC    hw/pci/shpc.o
  CC    hw/pci/slotid_cap.o
  CC    hw/pci/pci_host.o
  CC    hw/pci/pcie_host.o
  CC    hw/pci/pcie.o
  CC    hw/pci/pcie_aer.o
  CC    hw/pci/pcie_port.o
  CC    hw/pci/pci-stub.o
  CC    hw/pcmcia/pcmcia.o
  CC    hw/scsi/scsi-disk.o
  CC    hw/scsi/scsi-generic.o
  CC    hw/scsi/scsi-bus.o
  CC    hw/scsi/lsi53c895a.o
  CC    hw/scsi/mptsas.o
  CC    hw/scsi/mptconfig.o
  CC    hw/scsi/mptendian.o
  CC    hw/scsi/megasas.o
  CC    hw/scsi/vmw_pvscsi.o
  CC    hw/scsi/esp.o
  CC    hw/scsi/esp-pci.o
  CC    hw/sd/pl181.o
  CC    hw/sd/ssi-sd.o
  CC    hw/sd/sd.o
  CC    hw/sd/core.o
  CC    hw/sd/sdhci.o
  CC    hw/smbios/smbios.o
  CC    hw/smbios/smbios_type_38.o
  CC    hw/ssi/pl022.o
  CC    hw/ssi/ssi.o
  CC    hw/ssi/xilinx_spips.o
  CC    hw/ssi/aspeed_smc.o
  CC    hw/timer/arm_timer.o
  CC    hw/timer/arm_mptimer.o
  CC    hw/timer/a9gtimer.o
  CC    hw/timer/cadence_ttc.o
  CC    hw/timer/ds1338.o
  CC    hw/timer/hpet.o
  CC    hw/timer/i8254_common.o
  CC    hw/timer/i8254.o
  CC    hw/timer/pl031.o
  CC    hw/timer/twl92230.o
  CC    hw/timer/imx_epit.o
  CC    hw/timer/imx_gpt.o
  CC    hw/timer/stm32f2xx_timer.o
  CC    hw/timer/aspeed_timer.o
  CC    hw/tpm/tpm_tis.o
  CC    hw/tpm/tpm_passthrough.o
  CC    hw/tpm/tpm_util.o
  CC    hw/usb/core.o
  CC    hw/usb/combined-packet.o
  CC    hw/usb/bus.o
  CC    hw/usb/libhw.o
  CC    hw/usb/desc.o
  CC    hw/usb/desc-msos.o
  CC    hw/usb/hcd-uhci.o
  CC    hw/usb/hcd-ohci.o
  CC    hw/usb/hcd-ehci.o
  CC    hw/usb/hcd-ehci-pci.o
  CC    hw/usb/hcd-ehci-sysbus.o
  CC    hw/usb/hcd-xhci.o
  CC    hw/usb/hcd-musb.o
  CC    hw/usb/dev-hub.o
  CC    hw/usb/dev-hid.o
  CC    hw/usb/dev-wacom.o
  CC    hw/usb/dev-storage.o
  CC    hw/usb/dev-uas.o
  CC    hw/usb/dev-audio.o
  CC    hw/usb/dev-serial.o
  CC    hw/usb/dev-network.o
  CC    hw/usb/dev-bluetooth.o
  CC    hw/usb/dev-smartcard-reader.o
  CC    hw/usb/dev-mtp.o
  CC    hw/usb/host-stub.o
  CC    hw/virtio/virtio-rng.o
  CC    hw/virtio/virtio-pci.o
  CC    hw/virtio/virtio-bus.o
  CC    hw/virtio/virtio-mmio.o
  CC    hw/watchdog/watchdog.o
  CC    hw/watchdog/wdt_i6300esb.o
  CC    hw/watchdog/wdt_ib700.o
  CC    migration/migration.o
  CC    migration/socket.o
  CC    migration/fd.o
  CC    migration/exec.o
  CC    migration/tls.o
  CC    migration/vmstate.o
  CC    migration/qemu-file.o
  CC    migration/qemu-file-channel.o
  CC    migration/xbzrle.o
  CC    migration/postcopy-ram.o
  CC    migration/qjson.o
  CC    migration/block.o
  CC    net/net.o
  CC    net/queue.o
  CC    net/checksum.o
  CC    net/util.o
  CC    net/hub.o
  CC    net/socket.o
  CC    net/dump.o
  CC    net/eth.o
  CC    net/l2tpv3.o
  CC    net/tap.o
  CC    net/vhost-user.o
  CC    net/tap-linux.o
  CC    net/slirp.o
  CC    net/filter.o
  CC    net/filter-buffer.o
  CC    net/filter-mirror.o
  CC    qom/cpu.o
  CC    replay/replay.o
  CC    replay/replay-internal.o
  CC    replay/replay-events.o
  CC    replay/replay-time.o
/tmp/qemu-test/src/replay/replay-internal.c: In function ‘replay_put_array’:
/tmp/qemu-test/src/replay/replay-internal.c:68: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
  CC    replay/replay-input.o
  CC    replay/replay-char.o
  CC    slirp/cksum.o
  CC    slirp/if.o
  CC    slirp/ip_icmp.o
  CC    slirp/ip6_icmp.o
  CC    slirp/ip6_input.o
  CC    slirp/ip6_output.o
  CC    slirp/ip_input.o
  CC    slirp/ip_output.o
  CC    slirp/dnssearch.o
  CC    slirp/dhcpv6.o
  CC    slirp/slirp.o
  CC    slirp/mbuf.o
  CC    slirp/misc.o
  CC    slirp/sbuf.o
  CC    slirp/socket.o
  CC    slirp/tcp_input.o
/tmp/qemu-test/src/slirp/tcp_input.c: In function ‘tcp_input’:
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_p’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_len’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_tos’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_id’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_off’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_ttl’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_sum’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_src.s_addr’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_dst.s_addr’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:220: warning: ‘save_ip6.ip_nh’ may be used uninitialized in this function
  CC    slirp/tcp_output.o
  CC    slirp/tcp_subr.o
  CC    slirp/tcp_timer.o
  CC    slirp/udp.o
  CC    slirp/udp6.o
  CC    slirp/bootp.o
  CC    slirp/tftp.o
  CC    slirp/arp_table.o
  CC    slirp/ndp_table.o
  CC    ui/keymaps.o
  CC    ui/console.o
  CC    ui/cursor.o
  CC    ui/qemu-pixman.o
  CC    ui/input.o
  CC    ui/input-keymap.o
  CC    ui/input-legacy.o
  CC    ui/input-linux.o
  CC    ui/sdl.o
  CC    ui/sdl_zoom.o
  CC    ui/x_keymap.o
  CC    ui/vnc.o
  CC    ui/vnc-enc-zlib.o
  CC    ui/vnc-enc-hextile.o
  CC    ui/vnc-enc-tight.o
  CC    ui/vnc-palette.o
  CC    ui/vnc-enc-zrle.o
  CC    ui/vnc-auth-vencrypt.o
  CC    ui/vnc-ws.o
  CC    ui/vnc-jobs.o
  LINK  tests/qemu-iotests/socket_scm_helper
  CC    qga/commands.o
  CC    qga/guest-agent-command-state.o
  CC    qga/main.o
  CC    qga/commands-posix.o
  CC    qga/channel-posix.o
  CC    qga/qapi-generated/qga-qapi-types.o
  CC    qga/qapi-generated/qga-qapi-visit.o
  CC    qga/qapi-generated/qga-qmp-marshal.o
  CC    qmp-introspect.o
  CC    qapi-types.o
  CC    qapi-visit.o
  CC    qapi-event.o
  AR    libqemustub.a
  CC    qemu-img.o
  CC    qmp-marshal.o
  CC    trace/generated-events.o
  AS    optionrom/multiboot.o
  AS    optionrom/linuxboot.o
  CC    optionrom/linuxboot_dma.o
  AS    optionrom/kvmvapic.o
cc: unrecognized option '-no-integrated-as'
cc: unrecognized option '-no-integrated-as'
  Building optionrom/linuxboot_dma.img
  Building optionrom/linuxboot_dma.raw
  Building optionrom/multiboot.img
  Building optionrom/linuxboot.img
  Building optionrom/kvmvapic.img
  Building optionrom/multiboot.raw
  Building optionrom/linuxboot.raw
  Building optionrom/kvmvapic.raw
  Signing optionrom/linuxboot_dma.bin
  Signing optionrom/multiboot.bin
  Signing optionrom/linuxboot.bin
  Signing optionrom/kvmvapic.bin
  AR    libqemuutil.a
  LINK  qemu-ga
  LINK  ivshmem-client
  LINK  ivshmem-server
  LINK  qemu-nbd
  LINK  qemu-img
  LINK  qemu-io
  LINK  qemu-bridge-helper
  GEN   aarch64-softmmu/hmp-commands.h
  GEN   aarch64-softmmu/hmp-commands-info.h
  GEN   aarch64-softmmu/qmp-commands-old.h
  GEN   aarch64-softmmu/config-target.h
  CC    aarch64-softmmu/exec.o
  CC    aarch64-softmmu/translate-all.o
  CC    aarch64-softmmu/cpu-exec.o
  CC    aarch64-softmmu/translate-common.o
  GEN   x86_64-softmmu/hmp-commands.h
  CC    aarch64-softmmu/cpu-exec-common.o
  GEN   x86_64-softmmu/hmp-commands-info.h
  CC    aarch64-softmmu/tcg/tcg.o
  CC    aarch64-softmmu/tcg/tcg-op.o
  CC    aarch64-softmmu/tcg/optimize.o
  CC    aarch64-softmmu/tcg/tcg-common.o
  GEN   x86_64-softmmu/qmp-commands-old.h
  CC    aarch64-softmmu/fpu/softfloat.o
  GEN   x86_64-softmmu/config-target.h
  CC    aarch64-softmmu/disas.o
  GEN   aarch64-softmmu/gdbstub-xml.c
  CC    aarch64-softmmu/kvm-stub.o
  CC    aarch64-softmmu/arch_init.o
  CC    aarch64-softmmu/cpus.o
  CC    aarch64-softmmu/monitor.o
  CC    aarch64-softmmu/gdbstub.o
  CC    aarch64-softmmu/balloon.o
  CC    aarch64-softmmu/ioport.o
  CC    aarch64-softmmu/numa.o
  CC    aarch64-softmmu/qtest.o
  CC    aarch64-softmmu/bootdevice.o
  CC    aarch64-softmmu/memory.o
  CC    aarch64-softmmu/cputlb.o
  CC    aarch64-softmmu/memory_mapping.o
  CC    aarch64-softmmu/dump.o
  CC    aarch64-softmmu/migration/ram.o
  CC    aarch64-softmmu/migration/savevm.o
  CC    aarch64-softmmu/xen-common-stub.o
  CC    aarch64-softmmu/xen-hvm-stub.o
  CC    aarch64-softmmu/hw/block/virtio-blk.o
  CC    aarch64-softmmu/hw/block/dataplane/virtio-blk.o
  CC    aarch64-softmmu/hw/char/exynos4210_uart.o
  CC    aarch64-softmmu/hw/char/omap_uart.o
  CC    aarch64-softmmu/hw/char/digic-uart.o
  CC    aarch64-softmmu/hw/char/stm32f2xx_usart.o
  CC    aarch64-softmmu/hw/char/bcm2835_aux.o
  CC    aarch64-softmmu/hw/char/virtio-serial-bus.o
  CC    aarch64-softmmu/hw/core/nmi.o
  CC    aarch64-softmmu/hw/cpu/arm11mpcore.o
  CC    aarch64-softmmu/hw/cpu/a9mpcore.o
  CC    aarch64-softmmu/hw/cpu/realview_mpcore.o
  CC    aarch64-softmmu/hw/cpu/a15mpcore.o
  CC    aarch64-softmmu/hw/cpu/core.o
  CC    aarch64-softmmu/hw/display/omap_dss.o
  CC    aarch64-softmmu/hw/display/omap_lcdc.o
  CC    aarch64-softmmu/hw/display/pxa2xx_lcd.o
  CC    aarch64-softmmu/hw/display/bcm2835_fb.o
  CC    aarch64-softmmu/hw/display/vga.o
  CC    aarch64-softmmu/hw/display/virtio-gpu.o
  CC    aarch64-softmmu/hw/display/virtio-gpu-3d.o
  CC    aarch64-softmmu/hw/display/virtio-gpu-pci.o
  CC    aarch64-softmmu/hw/display/dpcd.o
  CC    aarch64-softmmu/hw/display/xlnx_dp.o
  CC    aarch64-softmmu/hw/dma/xlnx_dpdma.o
  CC    aarch64-softmmu/hw/dma/omap_dma.o
  CC    aarch64-softmmu/hw/dma/soc_dma.o
  CC    aarch64-softmmu/hw/dma/pxa2xx_dma.o
  CC    aarch64-softmmu/hw/dma/bcm2835_dma.o
  CC    x86_64-softmmu/exec.o
  CC    aarch64-softmmu/hw/gpio/omap_gpio.o
  CC    x86_64-softmmu/translate-all.o
  CC    aarch64-softmmu/hw/gpio/imx_gpio.o
  CC    aarch64-softmmu/hw/i2c/omap_i2c.o
  CC    x86_64-softmmu/cpu-exec.o
  CC    x86_64-softmmu/translate-common.o
  CC    aarch64-softmmu/hw/input/pxa2xx_keypad.o
  CC    aarch64-softmmu/hw/input/tsc210x.o
  CC    x86_64-softmmu/cpu-exec-common.o
  CC    aarch64-softmmu/hw/intc/armv7m_nvic.o
  CC    aarch64-softmmu/hw/intc/exynos4210_gic.o
  CC    x86_64-softmmu/tcg/tcg.o
  CC    aarch64-softmmu/hw/intc/exynos4210_combiner.o
  CC    aarch64-softmmu/hw/intc/omap_intc.o
  CC    x86_64-softmmu/tcg/tcg-op.o
  CC    aarch64-softmmu/hw/intc/bcm2835_ic.o
  CC    x86_64-softmmu/tcg/optimize.o
  CC    aarch64-softmmu/hw/intc/bcm2836_control.o
  CC    aarch64-softmmu/hw/intc/allwinner-a10-pic.o
  CC    aarch64-softmmu/hw/intc/aspeed_vic.o
  CC    aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
  CC    aarch64-softmmu/hw/misc/ivshmem.o
  CC    x86_64-softmmu/tcg/tcg-common.o
  CC    aarch64-softmmu/hw/misc/arm_sysctl.o
  CC    x86_64-softmmu/fpu/softfloat.o
  CC    aarch64-softmmu/hw/misc/cbus.o
  CC    x86_64-softmmu/disas.o
  CC    aarch64-softmmu/hw/misc/exynos4210_pmu.o
  CC    x86_64-softmmu/arch_init.o
  CC    aarch64-softmmu/hw/misc/imx_ccm.o
  CC    x86_64-softmmu/cpus.o
  CC    aarch64-softmmu/hw/misc/imx31_ccm.o
  CC    aarch64-softmmu/hw/misc/imx25_ccm.o
  CC    aarch64-softmmu/hw/misc/imx6_ccm.o
  CC    aarch64-softmmu/hw/misc/imx6_src.o
  CC    x86_64-softmmu/monitor.o
  CC    x86_64-softmmu/gdbstub.o
  CC    aarch64-softmmu/hw/misc/mst_fpga.o
  CC    x86_64-softmmu/balloon.o
  CC    x86_64-softmmu/ioport.o
  CC    aarch64-softmmu/hw/misc/omap_clk.o
  CC    aarch64-softmmu/hw/misc/omap_gpmc.o
  CC    aarch64-softmmu/hw/misc/omap_l4.o
  CC    aarch64-softmmu/hw/misc/omap_sdrc.o
  CC    x86_64-softmmu/numa.o
  CC    x86_64-softmmu/qtest.o
  CC    aarch64-softmmu/hw/misc/omap_tap.o
  CC    x86_64-softmmu/bootdevice.o
  CC    x86_64-softmmu/kvm-all.o
  CC    x86_64-softmmu/memory.o
  CC    aarch64-softmmu/hw/misc/bcm2835_mbox.o
  CC    aarch64-softmmu/hw/misc/bcm2835_property.o
  CC    x86_64-softmmu/cputlb.o
  CC    aarch64-softmmu/hw/misc/zynq_slcr.o
  CC    x86_64-softmmu/memory_mapping.o
  CC    x86_64-softmmu/dump.o
  CC    aarch64-softmmu/hw/misc/zynq-xadc.o
  CC    x86_64-softmmu/migration/ram.o
  CC    x86_64-softmmu/migration/savevm.o
  CC    aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
  CC    aarch64-softmmu/hw/misc/edu.o
  CC    x86_64-softmmu/xen-common-stub.o
  CC    x86_64-softmmu/xen-hvm-stub.o
  CC    x86_64-softmmu/hw/acpi/nvdimm.o
  CC    x86_64-softmmu/hw/block/virtio-blk.o
  CC    x86_64-softmmu/hw/block/dataplane/virtio-blk.o
  CC    aarch64-softmmu/hw/misc/auxbus.o
  CC    x86_64-softmmu/hw/char/virtio-serial-bus.o
  CC    x86_64-softmmu/hw/core/nmi.o
  CC    x86_64-softmmu/hw/cpu/core.o
  CC    x86_64-softmmu/hw/display/vga.o
  CC    x86_64-softmmu/hw/display/virtio-gpu.o
  CC    aarch64-softmmu/hw/misc/aspeed_scu.o
  CC    x86_64-softmmu/hw/display/virtio-gpu-3d.o
  CC    x86_64-softmmu/hw/display/virtio-gpu-pci.o
  CC    x86_64-softmmu/hw/display/virtio-vga.o
  CC    x86_64-softmmu/hw/intc/apic.o
  CC    aarch64-softmmu/hw/net/virtio-net.o
  CC    x86_64-softmmu/hw/intc/apic_common.o
  CC    aarch64-softmmu/hw/net/vhost_net.o
  CC    x86_64-softmmu/hw/intc/ioapic.o
  CC    aarch64-softmmu/hw/pcmcia/pxa2xx.o
  CC    x86_64-softmmu/hw/isa/lpc_ich9.o
  CC    aarch64-softmmu/hw/scsi/virtio-scsi.o
  CC    aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC    x86_64-softmmu/hw/misc/vmport.o
  CC    aarch64-softmmu/hw/scsi/vhost-scsi.o
  CC    aarch64-softmmu/hw/sd/omap_mmc.o
  CC    aarch64-softmmu/hw/sd/pxa2xx_mmci.o
  CC    aarch64-softmmu/hw/ssi/omap_spi.o
  CC    x86_64-softmmu/hw/misc/ivshmem.o
  CC    x86_64-softmmu/hw/misc/pvpanic.o
  CC    aarch64-softmmu/hw/ssi/imx_spi.o
  CC    x86_64-softmmu/hw/misc/edu.o
  CC    aarch64-softmmu/hw/timer/exynos4210_mct.o
  CC    aarch64-softmmu/hw/timer/exynos4210_pwm.o
  CC    aarch64-softmmu/hw/timer/exynos4210_rtc.o
  CC    aarch64-softmmu/hw/timer/omap_gptimer.o
  CC    x86_64-softmmu/hw/misc/hyperv_testdev.o
  CC    aarch64-softmmu/hw/timer/omap_synctimer.o
  CC    x86_64-softmmu/hw/net/virtio-net.o
  CC    x86_64-softmmu/hw/net/vhost_net.o
  CC    aarch64-softmmu/hw/timer/pxa2xx_timer.o
  CC    aarch64-softmmu/hw/timer/digic-timer.o
  CC    x86_64-softmmu/hw/scsi/virtio-scsi.o
  CC    aarch64-softmmu/hw/timer/allwinner-a10-pit.o
  CC    aarch64-softmmu/hw/usb/tusb6010.o
  CC    aarch64-softmmu/hw/vfio/common.o
  CC    x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC    x86_64-softmmu/hw/scsi/vhost-scsi.o
  CC    aarch64-softmmu/hw/vfio/pci.o
  CC    x86_64-softmmu/hw/timer/mc146818rtc.o
  CC    x86_64-softmmu/hw/vfio/common.o
  CC    x86_64-softmmu/hw/vfio/pci.o
  CC    aarch64-softmmu/hw/vfio/pci-quirks.o
  CC    aarch64-softmmu/hw/vfio/platform.o
  CC    aarch64-softmmu/hw/vfio/calxeda-xgmac.o
  CC    x86_64-softmmu/hw/vfio/pci-quirks.o
  CC    x86_64-softmmu/hw/vfio/platform.o
  CC    x86_64-softmmu/hw/vfio/calxeda-xgmac.o
  CC    aarch64-softmmu/hw/vfio/amd-xgbe.o
  CC    aarch64-softmmu/hw/vfio/spapr.o
  CC    aarch64-softmmu/hw/virtio/virtio.o
  CC    aarch64-softmmu/hw/virtio/virtio-balloon.o
  CC    aarch64-softmmu/hw/virtio/vhost.o
  CC    x86_64-softmmu/hw/vfio/amd-xgbe.o
  CC    aarch64-softmmu/hw/virtio/vhost-backend.o
  CC    x86_64-softmmu/hw/vfio/spapr.o
  CC    aarch64-softmmu/hw/virtio/vhost-user.o
  CC    aarch64-softmmu/hw/arm/boot.o
  CC    aarch64-softmmu/hw/arm/collie.o
  CC    aarch64-softmmu/hw/arm/exynos4_boards.o
  CC    aarch64-softmmu/hw/arm/gumstix.o
  CC    aarch64-softmmu/hw/arm/highbank.o
  CC    aarch64-softmmu/hw/arm/digic_boards.o
  CC    aarch64-softmmu/hw/arm/integratorcp.o
  CC    x86_64-softmmu/hw/virtio/virtio.o
  CC    aarch64-softmmu/hw/arm/mainstone.o
  CC    aarch64-softmmu/hw/arm/musicpal.o
  CC    aarch64-softmmu/hw/arm/nseries.o
  CC    x86_64-softmmu/hw/virtio/virtio-balloon.o
  CC    aarch64-softmmu/hw/arm/omap_sx1.o
  CC    aarch64-softmmu/hw/arm/palm.o
  CC    x86_64-softmmu/hw/virtio/vhost.o
  CC    aarch64-softmmu/hw/arm/realview.o
  CC    aarch64-softmmu/hw/arm/spitz.o
  CC    aarch64-softmmu/hw/arm/stellaris.o
  CC    x86_64-softmmu/hw/virtio/vhost-backend.o
  CC    x86_64-softmmu/hw/virtio/vhost-user.o
  CC    x86_64-softmmu/hw/i386/multiboot.o
  CC    x86_64-softmmu/hw/i386/pc.o
  CC    x86_64-softmmu/hw/i386/pc_piix.o
  CC    x86_64-softmmu/hw/i386/pc_q35.o
  CC    aarch64-softmmu/hw/arm/tosa.o
  CC    aarch64-softmmu/hw/arm/versatilepb.o
  CC    aarch64-softmmu/hw/arm/vexpress.o
  CC    x86_64-softmmu/hw/i386/pc_sysfw.o
  CC    aarch64-softmmu/hw/arm/virt.o
  CC    x86_64-softmmu/hw/i386/x86-iommu.o
/tmp/qemu-test/src/hw/i386/pc_piix.c: In function ‘igd_passthrough_isa_bridge_create’:
/tmp/qemu-test/src/hw/i386/pc_piix.c:1037: warning: ‘pch_rev_id’ may be used uninitialized in this function
  CC    x86_64-softmmu/hw/i386/intel_iommu.o
  CC    x86_64-softmmu/hw/i386/kvmvapic.o
  CC    x86_64-softmmu/hw/i386/acpi-build.o
  CC    aarch64-softmmu/hw/arm/xilinx_zynq.o
  CC    aarch64-softmmu/hw/arm/z2.o
  CC    aarch64-softmmu/hw/arm/virt-acpi-build.o
  CC    aarch64-softmmu/hw/arm/netduino2.o
  CC    aarch64-softmmu/hw/arm/sysbus-fdt.o
  CC    x86_64-softmmu/hw/i386/pci-assign-load-rom.o
  CC    x86_64-softmmu/hw/i386/kvm/clock.o
  CC    x86_64-softmmu/hw/i386/kvm/apic.o
/tmp/qemu-test/src/hw/i386/acpi-build.c: In function ‘build_append_pci_bus_devices’:
/tmp/qemu-test/src/hw/i386/acpi-build.c:471: warning: ‘notify_method’ may be used uninitialized in this function
  CC    aarch64-softmmu/hw/arm/armv7m.o
  CC    x86_64-softmmu/hw/i386/kvm/i8259.o
  CC    x86_64-softmmu/hw/i386/kvm/ioapic.o
  CC    aarch64-softmmu/hw/arm/exynos4210.o
  CC    aarch64-softmmu/hw/arm/pxa2xx.o
  CC    aarch64-softmmu/hw/arm/pxa2xx_gpio.o
  CC    aarch64-softmmu/hw/arm/pxa2xx_pic.o
  CC    x86_64-softmmu/hw/i386/kvm/i8254.o
  CC    x86_64-softmmu/hw/i386/kvm/pci-assign.o
  CC    aarch64-softmmu/hw/arm/digic.o
  CC    x86_64-softmmu/target-i386/translate.o
  CC    aarch64-softmmu/hw/arm/omap1.o
  CC    x86_64-softmmu/target-i386/helper.o
  CC    x86_64-softmmu/target-i386/cpu.o
  CC    aarch64-softmmu/hw/arm/omap2.o
  CC    aarch64-softmmu/hw/arm/strongarm.o
  CC    x86_64-softmmu/target-i386/bpt_helper.o
  CC    x86_64-softmmu/target-i386/excp_helper.o
  CC    x86_64-softmmu/target-i386/fpu_helper.o
  CC    x86_64-softmmu/target-i386/cc_helper.o
  CC    aarch64-softmmu/hw/arm/allwinner-a10.o
  CC    x86_64-softmmu/target-i386/int_helper.o
  CC    x86_64-softmmu/target-i386/svm_helper.o
  CC    aarch64-softmmu/hw/arm/cubieboard.o
  CC    aarch64-softmmu/hw/arm/bcm2835_peripherals.o
  CC    aarch64-softmmu/hw/arm/bcm2836.o
  CC    x86_64-softmmu/target-i386/smm_helper.o
  CC    x86_64-softmmu/target-i386/misc_helper.o
  CC    x86_64-softmmu/target-i386/mem_helper.o
  CC    x86_64-softmmu/target-i386/seg_helper.o
  CC    aarch64-softmmu/hw/arm/raspi.o
  CC    x86_64-softmmu/target-i386/mpx_helper.o
  CC    x86_64-softmmu/target-i386/gdbstub.o
  CC    x86_64-softmmu/target-i386/machine.o
  CC    aarch64-softmmu/hw/arm/stm32f205_soc.o
  CC    x86_64-softmmu/target-i386/arch_memory_mapping.o
  CC    x86_64-softmmu/target-i386/arch_dump.o
  CC    x86_64-softmmu/target-i386/monitor.o
  CC    aarch64-softmmu/hw/arm/xlnx-zynqmp.o
  CC    x86_64-softmmu/target-i386/kvm.o
  CC    aarch64-softmmu/hw/arm/xlnx-ep108.o
  CC    x86_64-softmmu/target-i386/hyperv.o
  GEN   trace/generated-helpers.c
  CC    x86_64-softmmu/trace/control-target.o
  CC    aarch64-softmmu/hw/arm/fsl-imx25.o
  CC    x86_64-softmmu/trace/generated-helpers.o
  CC    aarch64-softmmu/hw/arm/imx25_pdk.o
  LINK  x86_64-softmmu/qemu-system-x86_64
  CC    aarch64-softmmu/hw/arm/fsl-imx31.o
  CC    aarch64-softmmu/hw/arm/kzm.o
  CC    aarch64-softmmu/hw/arm/fsl-imx6.o
  CC    aarch64-softmmu/hw/arm/sabrelite.o
  CC    aarch64-softmmu/hw/arm/ast2400.o
  CC    aarch64-softmmu/hw/arm/palmetto-bmc.o
  CC    aarch64-softmmu/target-arm/arm-semi.o
  CC    aarch64-softmmu/target-arm/machine.o
  CC    aarch64-softmmu/target-arm/psci.o
  CC    aarch64-softmmu/target-arm/arch_dump.o
  CC    aarch64-softmmu/target-arm/monitor.o
  CC    aarch64-softmmu/target-arm/kvm-stub.o
  CC    aarch64-softmmu/target-arm/translate.o
  CC    aarch64-softmmu/target-arm/op_helper.o
  CC    aarch64-softmmu/target-arm/helper.o
  CC    aarch64-softmmu/target-arm/cpu.o
  CC    aarch64-softmmu/target-arm/neon_helper.o
  CC    aarch64-softmmu/target-arm/iwmmxt_helper.o
  CC    aarch64-softmmu/target-arm/gdbstub.o
  CC    aarch64-softmmu/target-arm/cpu64.o
  CC    aarch64-softmmu/target-arm/translate-a64.o
  CC    aarch64-softmmu/target-arm/helper-a64.o
  CC    aarch64-softmmu/target-arm/gdbstub64.o
  CC    aarch64-softmmu/target-arm/crypto_helper.o
  CC    aarch64-softmmu/target-arm/arm-powerctl.o
/tmp/qemu-test/src/target-arm/translate-a64.c: In function ‘handle_shri_with_rndacc’:
/tmp/qemu-test/src/target-arm/translate-a64.c:6308: warning: ‘tcg_src_hi’ may be used uninitialized in this function
/tmp/qemu-test/src/target-arm/translate-a64.c: In function ‘disas_simd_scalar_two_reg_misc’:
/tmp/qemu-test/src/target-arm/translate-a64.c:8035: warning: ‘rmode’ may be used uninitialized in this function
  GEN   trace/generated-helpers.c
  CC    aarch64-softmmu/trace/control-target.o
  CC    aarch64-softmmu/gdbstub-xml.o
  CC    aarch64-softmmu/trace/generated-helpers.o
  LINK  aarch64-softmmu/qemu-system-aarch64
  TEST  tests/qapi-schema/alternate-any.out
  TEST  tests/qapi-schema/alternate-array.out
  TEST  tests/qapi-schema/alternate-base.out
  TEST  tests/qapi-schema/alternate-clash.out
  TEST  tests/qapi-schema/alternate-conflict-dict.out
  TEST  tests/qapi-schema/alternate-conflict-string.out
  TEST  tests/qapi-schema/alternate-empty.out
  TEST  tests/qapi-schema/alternate-nested.out
  TEST  tests/qapi-schema/alternate-unknown.out
  TEST  tests/qapi-schema/args-alternate.out
  TEST  tests/qapi-schema/args-any.out
  TEST  tests/qapi-schema/args-array-empty.out
  TEST  tests/qapi-schema/args-array-unknown.out
  TEST  tests/qapi-schema/args-bad-boxed.out
  TEST  tests/qapi-schema/args-boxed-anon.out
  TEST  tests/qapi-schema/args-boxed-empty.out
  TEST  tests/qapi-schema/args-boxed-string.out
  TEST  tests/qapi-schema/args-int.out
  TEST  tests/qapi-schema/args-invalid.out
  TEST  tests/qapi-schema/args-member-array-bad.out
  TEST  tests/qapi-schema/args-member-case.out
  TEST  tests/qapi-schema/args-member-unknown.out
  TEST  tests/qapi-schema/args-name-clash.out
  TEST  tests/qapi-schema/args-union.out
  TEST  tests/qapi-schema/args-unknown.out
  TEST  tests/qapi-schema/bad-base.out
  TEST  tests/qapi-schema/bad-data.out
  TEST  tests/qapi-schema/bad-ident.out
  TEST  tests/qapi-schema/bad-type-bool.out
  TEST  tests/qapi-schema/bad-type-dict.out
  TEST  tests/qapi-schema/bad-type-int.out
  TEST  tests/qapi-schema/base-cycle-direct.out
  TEST  tests/qapi-schema/base-cycle-indirect.out
  TEST  tests/qapi-schema/command-int.out
  TEST  tests/qapi-schema/comments.out
  TEST  tests/qapi-schema/double-data.out
  TEST  tests/qapi-schema/double-type.out
  TEST  tests/qapi-schema/duplicate-key.out
  TEST  tests/qapi-schema/empty.out
  TEST  tests/qapi-schema/enum-bad-name.out
  TEST  tests/qapi-schema/enum-bad-prefix.out
  TEST  tests/qapi-schema/enum-clash-member.out
  TEST  tests/qapi-schema/enum-dict-member.out
  TEST  tests/qapi-schema/enum-int-member.out
  TEST  tests/qapi-schema/enum-member-case.out
  TEST  tests/qapi-schema/enum-missing-data.out
  TEST  tests/qapi-schema/enum-wrong-data.out
  TEST  tests/qapi-schema/escape-outside-string.out
  TEST  tests/qapi-schema/escape-too-big.out
  TEST  tests/qapi-schema/escape-too-short.out
  TEST  tests/qapi-schema/event-boxed-empty.out
  TEST  tests/qapi-schema/event-case.out
  TEST  tests/qapi-schema/event-nest-struct.out
  TEST  tests/qapi-schema/flat-union-array-branch.out
  TEST  tests/qapi-schema/flat-union-bad-base.out
  TEST  tests/qapi-schema/flat-union-bad-discriminator.out
  TEST  tests/qapi-schema/flat-union-base-any.out
  TEST  tests/qapi-schema/flat-union-base-union.out
  TEST  tests/qapi-schema/flat-union-clash-member.out
  TEST  tests/qapi-schema/flat-union-empty.out
  TEST  tests/qapi-schema/flat-union-inline.out
  TEST  tests/qapi-schema/flat-union-incomplete-branch.out
  TEST  tests/qapi-schema/flat-union-int-branch.out
  TEST  tests/qapi-schema/flat-union-invalid-branch-key.out
  TEST  tests/qapi-schema/flat-union-invalid-discriminator.out
  TEST  tests/qapi-schema/flat-union-no-base.out
  TEST  tests/qapi-schema/flat-union-optional-discriminator.out
  TEST  tests/qapi-schema/flat-union-string-discriminator.out
  TEST  tests/qapi-schema/funny-char.out
  TEST  tests/qapi-schema/ident-with-escape.out
  TEST  tests/qapi-schema/include-before-err.out
  TEST  tests/qapi-schema/include-cycle.out
  TEST  tests/qapi-schema/include-format-err.out
  TEST  tests/qapi-schema/include-nested-err.out
  TEST  tests/qapi-schema/include-no-file.out
  TEST  tests/qapi-schema/include-non-file.out
  TEST  tests/qapi-schema/include-relpath.out
  TEST  tests/qapi-schema/include-repetition.out
  TEST  tests/qapi-schema/include-self-cycle.out
  TEST  tests/qapi-schema/include-simple.out
  TEST  tests/qapi-schema/indented-expr.out
  TEST  tests/qapi-schema/leading-comma-list.out
  TEST  tests/qapi-schema/leading-comma-object.out
  TEST  tests/qapi-schema/missing-colon.out
  TEST  tests/qapi-schema/missing-comma-list.out
  TEST  tests/qapi-schema/missing-comma-object.out
  TEST  tests/qapi-schema/missing-type.out
  TEST  tests/qapi-schema/nested-struct-data.out
  TEST  tests/qapi-schema/non-objects.out
  TEST  tests/qapi-schema/qapi-schema-test.out
  TEST  tests/qapi-schema/quoted-structural-chars.out
  TEST  tests/qapi-schema/redefined-builtin.out
  TEST  tests/qapi-schema/redefined-command.out
  TEST  tests/qapi-schema/redefined-event.out
  TEST  tests/qapi-schema/redefined-type.out
  TEST  tests/qapi-schema/reserved-command-q.out
  TEST  tests/qapi-schema/reserved-enum-q.out
  TEST  tests/qapi-schema/reserved-member-has.out
  TEST  tests/qapi-schema/reserved-member-q.out
  TEST  tests/qapi-schema/reserved-member-u.out
  TEST  tests/qapi-schema/reserved-member-underscore.out
  TEST  tests/qapi-schema/reserved-type-kind.out
  TEST  tests/qapi-schema/reserved-type-list.out
  TEST  tests/qapi-schema/returns-alternate.out
  TEST  tests/qapi-schema/returns-array-bad.out
  TEST  tests/qapi-schema/returns-dict.out
  TEST  tests/qapi-schema/returns-unknown.out
  TEST  tests/qapi-schema/returns-whitelist.out
  TEST  tests/qapi-schema/struct-base-clash-deep.out
  TEST  tests/qapi-schema/struct-base-clash.out
  TEST  tests/qapi-schema/struct-data-invalid.out
  TEST  tests/qapi-schema/struct-member-invalid.out
  TEST  tests/qapi-schema/trailing-comma-list.out
  TEST  tests/qapi-schema/trailing-comma-object.out
  TEST  tests/qapi-schema/type-bypass-bad-gen.out
  TEST  tests/qapi-schema/unclosed-list.out
  TEST  tests/qapi-schema/unclosed-object.out
  TEST  tests/qapi-schema/unclosed-string.out
  TEST  tests/qapi-schema/unicode-str.out
  TEST  tests/qapi-schema/union-base-no-discriminator.out
  TEST  tests/qapi-schema/union-branch-case.out
  TEST  tests/qapi-schema/union-clash-branches.out
  TEST  tests/qapi-schema/union-empty.out
  TEST  tests/qapi-schema/union-invalid-base.out
  TEST  tests/qapi-schema/union-optional-branch.out
  TEST  tests/qapi-schema/union-unknown.out
  TEST  tests/qapi-schema/unknown-escape.out
  TEST  tests/qapi-schema/unknown-expr-key.out
  CC    tests/check-qdict.o
  CC    tests/check-qfloat.o
  CC    tests/check-qint.o
  CC    tests/check-qstring.o
  CC    tests/check-qlist.o
  CC    tests/check-qnull.o
  CC    tests/check-qjson.o
  CC    tests/test-qmp-output-visitor.o
  GEN   tests/test-qapi-visit.c
  GEN   tests/test-qapi-types.c
  GEN   tests/test-qapi-event.c
  GEN   tests/test-qmp-introspect.c
  CC    tests/test-clone-visitor.o
  CC    tests/test-qmp-input-visitor.o
  CC    tests/test-qmp-input-strict.o
  CC    tests/test-qmp-commands.o
  GEN   tests/test-qmp-marshal.c
  CC    tests/test-string-input-visitor.o
  CC    tests/test-string-output-visitor.o
  CC    tests/test-qmp-event.o
  CC    tests/test-opts-visitor.o
  CC    tests/test-coroutine.o
  CC    tests/test-visitor-serialization.o
  CC    tests/test-iov.o
  CC    tests/test-aio.o
  CC    tests/test-rfifolock.o
  CC    tests/test-throttle.o
  CC    tests/test-thread-pool.o
  CC    tests/test-hbitmap.o
  CC    tests/test-blockjob.o
  CC    tests/test-blockjob-txn.o
  CC    tests/test-x86-cpuid.o
  CC    tests/test-xbzrle.o
  CC    tests/test-vmstate.o
  CC    tests/test-cutils.o
  CC    tests/test-mul64.o
  CC    tests/test-int128.o
  CC    tests/rcutorture.o
  CC    tests/test-rcu-list.o
  CC    tests/test-qdist.o
  CC    tests/test-qht.o
  CC    tests/test-qht-par.o
  CC    tests/qht-bench.o
  CC    tests/test-bitops.o
  CC    tests/check-qom-interface.o
  CC    tests/check-qom-proplist.o
  CC    tests/test-qemu-opts.o
  CC    tests/test-write-threshold.o
  CC    tests/test-crypto-hash.o
  CC    tests/test-crypto-cipher.o
  CC    tests/test-crypto-secret.o
  CC    tests/test-qga.o
  CC    tests/libqtest.o
  CC    tests/test-timed-average.o
  CC    tests/test-io-task.o
  CC    tests/test-io-channel-socket.o
  CC    tests/io-channel-helpers.o
  CC    tests/test-io-channel-file.o
  CC    tests/test-io-channel-command.o
  CC    tests/test-io-channel-buffer.o
  CC    tests/test-base64.o
  CC    tests/test-crypto-ivgen.o
  CC    tests/test-crypto-afsplit.o
  CC    tests/test-crypto-xts.o
  CC    tests/test-crypto-block.o
  CC    tests/test-logging.o
  CC    tests/vhost-user-test.o
  CC    tests/endianness-test.o
  CC    tests/fdc-test.o
  CC    tests/ide-test.o
  CC    tests/libqos/pci.o
  CC    tests/libqos/fw_cfg.o
  CC    tests/libqos/malloc.o
/tmp/qemu-test/src/tests/ide-test.c: In function ‘cdrom_pio_impl’:
/tmp/qemu-test/src/tests/ide-test.c:739: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
/tmp/qemu-test/src/tests/ide-test.c: In function ‘test_cdrom_dma’:
/tmp/qemu-test/src/tests/ide-test.c:832: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
  CC    tests/libqos/i2c.o
  CC    tests/libqos/libqos.o
  CC    tests/libqos/pci-pc.o
  CC    tests/libqos/malloc-pc.o
  CC    tests/libqos/libqos-pc.o
  CC    tests/libqos/ahci.o
  CC    tests/ahci-test.o
  CC    tests/hd-geo-test.o
  CC    tests/boot-order-test.o
  CC    tests/bios-tables-test.o
  CC    tests/boot-sector.o
  CC    tests/pxe-test.o
  CC    tests/rtc-test.o
  CC    tests/ipmi-kcs-test.o
  CC    tests/ipmi-bt-test.o
/tmp/qemu-test/src/tests/boot-sector.c: In function ‘boot_sector_init’:
/tmp/qemu-test/src/tests/boot-sector.c:80: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
  CC    tests/i440fx-test.o
  CC    tests/fw_cfg-test.o
  CC    tests/drive_del-test.o
  CC    tests/wdt_ib700-test.o
  CC    tests/tco-test.o
  CC    tests/e1000-test.o
  CC    tests/e1000e-test.o
  CC    tests/rtl8139-test.o
  CC    tests/pcnet-test.o
  CC    tests/eepro100-test.o
  CC    tests/ne2000-test.o
/tmp/qemu-test/src/tests/test-int128.c: In function ‘expand’:
/tmp/qemu-test/src/tests/test-int128.c:44: warning: implicit declaration of function ‘int128_make128’
/tmp/qemu-test/src/tests/test-int128.c:44: warning: nested extern declaration of ‘int128_make128’
/tmp/qemu-test/src/tests/test-int128.c:44: error: conversion to non-scalar type requested
/tmp/qemu-test/src/tests/test-int128.c: In function ‘test_and’:
/tmp/qemu-test/src/tests/test-int128.c:57: warning: implicit declaration of function ‘int128_getlo’
/tmp/qemu-test/src/tests/test-int128.c:57: warning: nested extern declaration of ‘int128_getlo’
/tmp/qemu-test/src/tests/test-int128.c:58: warning: implicit declaration of function ‘int128_gethi’
/tmp/qemu-test/src/tests/test-int128.c:58: warning: nested extern declaration of ‘int128_gethi’
/tmp/qemu-test/src/tests/test-int128.c: At top level:
/tmp/qemu-test/src/tests/test-int128.c:180: warning: ‘__noclone__’ attribute directive ignored
make: *** [tests/test-int128.o] Error 1
make: *** Waiting for unfinished jobs....
tests/docker/Makefile.include:104: recipe for target 'docker-run-test-quick@centos6' failed
make: *** [docker-run-test-quick@centos6] Error 1
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC Richard Henderson
@ 2016-09-08  8:38   ` Alex Bennée
  2016-09-08 16:26     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bennée @ 2016-09-08  8:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> When we cannot emulate an atomic operation within a parallel
> context, this exception allows us to stop the world and try
> again in a serial context.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  cpu-exec-common.c       |  6 +++++
>  cpu-exec.c              | 23 +++++++++++++++++++
>  cpus.c                  |  6 +++++
>  include/exec/cpu-all.h  |  1 +
>  include/exec/exec-all.h |  1 +
>  include/qemu-common.h   |  1 +
>  linux-user/main.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  tcg/tcg.h               |  1 +
>  translate-all.c         |  1 +
>  9 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 0cb4ae6..767d9c6 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -77,3 +77,9 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      }
>      siglongjmp(cpu->jmp_env, 1);
>  }
> +
> +void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
> +{
> +    cpu->exception_index = EXCP_ATOMIC;
> +    cpu_loop_exit_restore(cpu, pc);
> +}
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b840e1d..041f8b7 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -225,6 +225,29 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>  }
>  #endif
>
> +void cpu_exec_step(CPUState *cpu)
> +{
> +    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +    bool old_tb_flushed;
> +
> +    old_tb_flushed = cpu->tb_flushed;
> +    cpu->tb_flushed = false;
> +
> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb = tb_gen_code(cpu, pc, cs_base, flags,
> +                     1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
> +    tb->orig_tb = NULL;
> +    cpu->tb_flushed |= old_tb_flushed;
> +    /* execute the generated code */
> +    trace_exec_tb_nocache(tb, pc);
> +    cpu_tb_exec(cpu, tb);
> +    tb_phys_invalidate(tb, -1);
> +    tb_free(tb);
> +}
> +
>  struct tb_desc {
>      target_ulong pc;
>      target_ulong cs_base;
> diff --git a/cpus.c b/cpus.c
> index 84c3520..a01bbbd 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1575,6 +1575,12 @@ static void tcg_exec_all(void)
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>                  break;
> +            } else if (r == EXCP_ATOMIC) {
> +                /* ??? When we begin running cpus in parallel, we should
> +                   stop all cpus, clear parallel_cpus, and interpret a
> +                   single insn with cpu_exec_step.  In the meantime,
> +                   we should never get here.  */
> +                abort();

Pranith has been hitting this abort in the latest merged tree with MTTCG
but I'm a little unclear how it got here. So is the plan the MTTCG
thread function should do a step_atomic a-la user mode but we'll never
get here in the single threaded case?

>              }
>          } else if (cpu->stop || cpu->stopped) {
>              if (cpu->unplug) {
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 8007abb..d2aac4b 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -31,6 +31,7 @@
>  #define EXCP_DEBUG      0x10002 /* cpu stopped after a breakpoint or singlestep */
>  #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
>  #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
> +#define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */
>
>  /* some important defines:
>   *
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index c1f59fa..ec72c5a 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -59,6 +59,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  void cpu_exec_init(CPUState *cpu, Error **errp);
>  void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
>  void QEMU_NORETURN cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc);
> +void QEMU_NORETURN cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc);
>
>  #if !defined(CONFIG_USER_ONLY)
>  void cpu_reloading_memory_map(void);
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 1f2cb94..77e379d 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -76,6 +76,7 @@ void tcg_exec_init(unsigned long tb_size);
>  bool tcg_enabled(void);
>
>  void cpu_exec_init_all(void);
> +void cpu_exec_step(CPUState *cpu);
>
>  /**
>   * Sends a (part of) iovec down a socket, yielding when the socket is full, or
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 78d8d04..54df300 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -179,13 +179,25 @@ static inline void start_exclusive(void)
>  }
>
>  /* Finish an exclusive operation.  */
> -static inline void __attribute__((unused)) end_exclusive(void)
> +static inline void end_exclusive(void)
>  {
>      pending_cpus = 0;
>      pthread_cond_broadcast(&exclusive_resume);
>      pthread_mutex_unlock(&exclusive_lock);
>  }
>
> +static void step_atomic(CPUState *cpu)
> +{
> +    start_exclusive();
> +
> +    /* Since we got here, we know that parallel_cpus must be true.  */
> +    parallel_cpus = false;
> +    cpu_exec_step(cpu);
> +    parallel_cpus = true;
> +
> +    end_exclusive();
> +}
> +
>  /* Wait for exclusive ops to finish, and begin cpu execution.  */
>  static inline void cpu_exec_start(CPUState *cpu)
>  {
> @@ -437,6 +449,9 @@ void cpu_loop(CPUX86State *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              pc = env->segs[R_CS].base + env->eip;
>              EXCP_DUMP(env, "qemu: 0x%08lx: unhandled CPU exception 0x%x - aborting\n",
> @@ -929,6 +944,9 @@ void cpu_loop(CPUARMState *env)
>          case EXCP_YIELD:
>              /* nothing to do here for user-mode, just resume guest code */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>          error:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
> @@ -1128,6 +1146,9 @@ void cpu_loop(CPUARMState *env)
>          case EXCP_YIELD:
>              /* nothing to do here for user-mode, just resume guest code */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
>              abort();
> @@ -1217,6 +1238,9 @@ void cpu_loop(CPUUniCore32State *env)
>                  }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              goto error;
>          }
> @@ -1489,6 +1513,9 @@ void cpu_loop (CPUSPARCState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -2037,6 +2064,9 @@ void cpu_loop(CPUPPCState *env)
>          case EXCP_INTERRUPT:
>              /* just indicate that signals should be handled asap */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              cpu_abort(cs, "Unknown exception 0x%d. Aborting\n", trapnr);
>              break;
> @@ -2710,6 +2740,9 @@ done_syscall:
>                  }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>  error:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
> @@ -2796,6 +2829,9 @@ void cpu_loop(CPUOpenRISCState *env)
>          case EXCP_NR:
>              qemu_log_mask(CPU_LOG_INT, "\nNR\n");
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
>                       trapnr);
> @@ -2871,6 +2907,9 @@ void cpu_loop(CPUSH4State *env)
>              queue_signal(env, info.si_signo, &info);
>  	    break;
>
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -2936,6 +2975,9 @@ void cpu_loop(CPUCRISState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3050,6 +3092,9 @@ void cpu_loop(CPUMBState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3151,6 +3196,9 @@ void cpu_loop(CPUM68KState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
>              abort();
> @@ -3386,6 +3434,9 @@ void cpu_loop(CPUAlphaState *env)
>          case EXCP_INTERRUPT:
>              /* Just indicate that signals should be handled asap.  */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3513,6 +3564,9 @@ void cpu_loop(CPUS390XState *env)
>              queue_signal(env, info.si_signo, &info);
>              break;
>
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3765,6 +3819,9 @@ void cpu_loop(CPUTLGState *env)
>          case TILEGX_EXCP_REG_UDN_ACCESS:
>              gen_sigill_reg(env);
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;


Off topic but this just reminds me linux-user could do with some
clean-up to move the common code into helpers for the cpu loop handling.

>          default:
>              fprintf(stderr, "trapnr is %d[0x%x].\n", trapnr, trapnr);
>              g_assert_not_reached();
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 66ae0c7..ab67537 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -691,6 +691,7 @@ struct TCGContext {
>  };
>
>  extern TCGContext tcg_ctx;
> +extern bool parallel_cpus;
>
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
>  {
> diff --git a/translate-all.c b/translate-all.c
> index eaa95e4..99ae7f9 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -119,6 +119,7 @@ static void *l1_map[V_L1_SIZE];
>
>  /* code generation context */
>  TCGContext tcg_ctx;
> +bool parallel_cpus;
>
>  /* translation block context */
>  #ifdef CONFIG_USER_ONLY


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus Richard Henderson
@ 2016-09-08  8:39   ` Alex Bennée
  2016-09-08 16:22     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bennée @ 2016-09-08  8:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> This is really just a placeholder for an actual
> command-line switch for mttcg.
> ---
>  translate-all.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 99ae7f9..a10fa06 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -119,7 +119,7 @@ static void *l1_map[V_L1_SIZE];
>
>  /* code generation context */
>  TCGContext tcg_ctx;
> -bool parallel_cpus;
> +bool parallel_cpus = 1;

I appreciate this is currently a hack but for CONFIG_USER it should
always be true anyway.

>
>  /* translation block context */
>  #ifdef CONFIG_USER_ONLY


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers
  2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers Richard Henderson
@ 2016-09-08 13:43   ` Alex Bennée
  2016-09-08 16:08     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bennée @ 2016-09-08 13:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> Add all of cmpxchg, op_fetch, fetch_op, and xchg.
> Handle both endian-ness, and sizes up to 8.
> Handle expanding non-atomically, when emulating in serial.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
<snip>
>  /* For the benefit of TCG generated code, we want to avoid the complication
>     of ABI-specific return type promotion and always return a value extended
>     to the register size of the host.  This is tcg_target_long, except in the
> @@ -508,11 +507,184 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>      }
>  }
>  #endif
> +
> +#if DATA_SIZE == 1
> +# define HE_SUFFIX  _mmu
> +#elif defined(HOST_WORDS_BIGENDIAN)
> +# define HE_SUFFIX  _be_mmu
> +# define RE_SUFFIX  _le_mmu
> +#else
> +# define HE_SUFFIX  _le_mmu
> +# define RE_SUFFIX  _be_mmu
> +#endif
> +
> +#define ATOMIC_MMU_BODY                                                 \
> +    DATA_TYPE *haddr;                                                   \
> +    do {                                                                \
> +        unsigned mmu_idx = get_mmuidx(oi);                              \
> +        int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);    \
> +        CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];            \
> +        target_ulong tlb_addr = tlbe->addr_write;                       \
> +        int a_bits = get_alignment_bits(get_memop(oi));                 \
> +                                                                        \
> +        /* Adjust the given return address.  */                         \
> +        retaddr -= GETPC_ADJ;                                           \
> +                                                                        \
> +        /* Enforce guest required alignment.  */                        \
> +        if (unlikely(a_bits > 0 && (addr & ((1 << a_bits) - 1)))) {     \
> +            /* ??? Maybe indicate atomic op to cpu_unaligned_access */  \
> +            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE, \
> +                                 mmu_idx, retaddr);                     \
> +        }                                                               \
> +        /* Enforce qemu required alignment.  */                         \
> +        if (unlikely(addr & ((1 << SHIFT) - 1))) {                      \
> +            /* We get here if guest alignment was not requested,        \
> +               or was not enforced by cpu_unaligned_access above.       \
> +               We might widen the access and emulate, but for now       \
> +               mark an exception and exit the cpu loop.  */             \
> +            cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);            \
> +        }                                                               \
> +                                                                        \
> +        /* Check TLB entry and enforce page permissions.  */            \
> +        if ((addr & TARGET_PAGE_MASK)                                   \
> +            != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {    \
> +            if (!VICTIM_TLB_HIT(addr_write)) {                          \
> +                tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE,        \
> +                         mmu_idx, retaddr);                             \
> +            }                                                           \
> +            tlb_addr = tlbe->addr_write;                                \
> +        } else if (unlikely(tlbe->addr_read != tlb_addr)) {             \
> +            /* Let the guest notice RMW on a write-only page.  */       \
> +            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_LOAD,             \
> +                     mmu_idx, retaddr);                                 \
> +        }                                                               \
> +                                                                        \
> +        /* Notice an IO access.  */                                     \
> +        if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {                   \
> +            /* There's really nothing that can be done to               \
> +               support this apart from stop-the-world.  */              \
> +            cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
>  \

Hmm what are we doing here? I don't think there is a reason to stop the
world because atomicity should be guaranteed by the BQL. Although I
suspect anything doing atomic access to an I/O address (especially with
LL/SC primitives) is going to be disappointed.

> +        }                                                               \
> +        haddr = (DATA_TYPE *)((uintptr_t)addr + tlbe->addend);          \
> +    } while (0)
> +
> +DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    ATOMIC_MMU_BODY;
> +    return atomic_cmpxchg(haddr, cmpv, newv);
> +}
> +
> +#define GEN_ATOMIC_HELPER(NAME)                                         \
> +DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)     \
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
> +     TCGMemOpIdx oi, uintptr_t retaddr)                                 \
> +{                                                                       \
> +    ATOMIC_MMU_BODY;                                                    \
> +    return glue(atomic_, NAME)(haddr, val);                             \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_add)
> +GEN_ATOMIC_HELPER(fetch_and)
> +GEN_ATOMIC_HELPER(fetch_or)
> +GEN_ATOMIC_HELPER(fetch_xor)
> +
> +GEN_ATOMIC_HELPER(add_fetch)
> +GEN_ATOMIC_HELPER(and_fetch)
> +GEN_ATOMIC_HELPER(or_fetch)
> +GEN_ATOMIC_HELPER(xor_fetch)
> +
> +GEN_ATOMIC_HELPER(xchg)
> +
> +#undef GEN_ATOMIC_HELPER
> +
> +#if DATA_SIZE > 1
> +DATA_TYPE glue(glue(helper_atomic_cmpxchg, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE cmpv, DATA_TYPE newv,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE retv;
> +    cmpv = BSWAP(cmpv);
> +    newv = BSWAP(newv);
> +    retv = (glue(glue(helper_atomic_cmpxchg, SUFFIX), HE_SUFFIX)
> +            (env, addr, cmpv, newv, oi, retaddr));
> +    return BSWAP(retv);
> +}
> +
> +#define GEN_ATOMIC_HELPER(NAME)                                         \
> +DATA_TYPE glue(glue(glue(helper_atomic_, NAME), SUFFIX), RE_SUFFIX)     \
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,               \
> +     TCGMemOpIdx oi, uintptr_t retaddr)                                 \
> +{                                                                       \
> +    DATA_TYPE ret;                                                      \
> +    val = BSWAP(val);                                                   \
> +    ret = (glue(glue(glue(helper_atomic_, NAME), SUFFIX), HE_SUFFIX)    \
> +           (env, addr, val, oi, retaddr));                              \
> +    return BSWAP(ret);                                                  \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_and)
> +GEN_ATOMIC_HELPER(fetch_or)
> +GEN_ATOMIC_HELPER(fetch_xor)
> +
> +GEN_ATOMIC_HELPER(and_fetch)
> +GEN_ATOMIC_HELPER(or_fetch)
> +GEN_ATOMIC_HELPER(xor_fetch)
> +
> +GEN_ATOMIC_HELPER(xchg)
> +
> +#undef GEN_ATOMIC_HELPER
> +
> +/* Note that for addition, we need to use a separate cmpxchg loop instead
> +   of bswaps around the host-endian helpers.  */
> +DATA_TYPE glue(glue(helper_atomic_fetch_add, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE ldo, ldn, ret, sto;
> +    ATOMIC_MMU_BODY;
> +
> +    ldo = *haddr;
> +    while (1) {
> +        ret = BSWAP(ldo);
> +        sto = BSWAP(ret + val);
> +        ldn = atomic_cmpxchg(haddr, ldo, sto);
> +        if (ldn == ldo) {
> +            return ret;
> +        }
> +        ldo = ldn;
> +    }
> +}
> +
> +DATA_TYPE glue(glue(helper_atomic_add_fetch, SUFFIX), RE_SUFFIX)
> +    (CPUArchState *env, target_ulong addr, DATA_TYPE val,
> +     TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    DATA_TYPE ldo, ldn, ret, sto;
> +    ATOMIC_MMU_BODY;
> +
> +    ldo = *haddr;
> +    while (1) {
> +        ret = BSWAP(ldo) + val;
> +        sto = BSWAP(ret);
> +        ldn = atomic_cmpxchg(haddr, ldo, sto);
> +        if (ldn == ldo) {
> +            return ret;
> +        }
> +        ldo = ldn;
> +    }
> +}
> +#endif /* DATA_SIZE > 1 */
> +
> +#undef ATOMIC_MMU_BODY
> +
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>
>  #undef READ_ACCESS_TYPE
>  #undef SHIFT
>  #undef DATA_TYPE
> +#undef DATAX_TYPE

Accidental typo?

>  #undef SUFFIX
>  #undef LSUFFIX
>  #undef DATA_SIZE
> @@ -524,8 +696,6 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>  #undef BSWAP
>  #undef TGT_BE
>  #undef TGT_LE
> -#undef CPU_BE
> -#undef CPU_LE

I suspect these belong in a separate clean-up patch?

>  #undef helper_le_ld_name
>  #undef helper_be_ld_name
>  #undef helper_le_lds_name
> @@ -534,3 +704,5 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>  #undef helper_be_st_name
>  #undef helper_te_ld_name
>  #undef helper_te_st_name
> +#undef HE_SUFFIX
> +#undef RE_SUFFIX
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index ea2ad64..63f78fc 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -23,17 +23,10 @@
>   */
>  #include "qemu/osdep.h"
>  #include "qemu/host-utils.h"
> -
> -/* This file is compiled once, and thus we can't include the standard
> -   "exec/helper-proto.h", which has includes that are target specific.  */
> -
> -#include "exec/helper-head.h"
> -
> -#define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2) \
> -  dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2));
> -
> -#include "tcg-runtime.h"
> -
> +#include "cpu.h"
> +#include "exec/helper-proto.h"
> +#include "exec/cpu_ldst.h"
> +#include "exec/exec-all.h"
>
>  /* 32-bit helpers */
>
> @@ -107,3 +100,15 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
>      muls64(&l, &h, arg1, arg2);
>      return h;
>  }
> +
> +#define SHIFT 0
> +#include "atomic_template.h"
> +
> +#define SHIFT 1
> +#include "atomic_template.h"
> +
> +#define SHIFT 2
> +#include "atomic_template.h"
> +
> +#define SHIFT 3
> +#include "atomic_template.h"
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 293b854..bc72c17 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1958,3 +1958,327 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
>                                 addr, trace_mem_get_info(memop, 1));
>      gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
>  }
> +
> +static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, TCGMemOp opc)
> +{
> +    switch (opc & MO_SSIZE) {
> +    case MO_SB:
> +        tcg_gen_ext8s_i32(ret, val);
> +        break;
> +    case MO_UB:
> +        tcg_gen_ext8u_i32(ret, val);
> +        break;
> +    case MO_SW:
> +        tcg_gen_ext16s_i32(ret, val);
> +        break;
> +    case MO_UW:
> +        tcg_gen_ext16u_i32(ret, val);
> +        break;
> +    default:
> +        tcg_gen_mov_i32(ret, val);
> +        break;
> +    }
> +}
> +
> +static void tcg_gen_ext_i64(TCGv_i64 ret, TCGv_i64 val, TCGMemOp opc)
> +{
> +    switch (opc & MO_SSIZE) {
> +    case MO_SB:
> +        tcg_gen_ext8s_i64(ret, val);
> +        break;
> +    case MO_UB:
> +        tcg_gen_ext8u_i64(ret, val);
> +        break;
> +    case MO_SW:
> +        tcg_gen_ext16s_i64(ret, val);
> +        break;
> +    case MO_UW:
> +        tcg_gen_ext16u_i64(ret, val);
> +        break;
> +    case MO_SL:
> +        tcg_gen_ext32s_i64(ret, val);
> +        break;
> +    case MO_UL:
> +        tcg_gen_ext32u_i64(ret, val);
> +        break;
> +    default:
> +        tcg_gen_mov_i64(ret, val);
> +        break;
> +    }
> +}

Should these two be separate patches?

> +
> +#ifdef CONFIG_USER_ONLY
> +typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv, TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv, TCGv_i64, TCGv_i64);
> +typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv, TCGv_i32);
> +typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv, TCGv_i64);
> +#else
> +typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
> +                                  TCGv_i32, TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
> +                                  TCGv_i64, TCGv_i64, TCGv_i32);
> +typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64, TCGv_i32);
> +#endif
> +
> +static void * const table_cmpxchg[16] = {
> +    [MO_8] = gen_helper_atomic_cmpxchgb,
> +    [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
> +    [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
> +    [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
> +    [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
> +    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
> +    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
> +};
> +
> +void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
> +                                TCGv_i32 newv, TCGArg idx, TCGMemOp memop)
> +{
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    if (!parallel_cpus) {
> +        TCGv_i32 t1 = tcg_temp_new_i32();
> +        TCGv_i32 t2 = tcg_temp_new_i32();
> +
> +        tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
> +
> +        tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
> +        tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
> +        tcg_gen_qemu_st_i32(t2, addr, idx, memop);
> +        tcg_temp_free_i32(t2);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i32(retv, t1, memop);
> +        } else {
> +            tcg_gen_mov_i32(retv, t1);
> +        }
> +        tcg_temp_free_i32(t1);
> +    } else {
> +        gen_atomic_cx_i32 gen;
> +
> +        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_USER_ONLY
> +        gen(retv, addr, cmpv, newv);
> +#else
> +        {
> +            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
> +            tcg_temp_free_i32(oi);
> +        }
> +#endif
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i32(retv, retv, memop);
> +        }
> +    }
> +}
> +
> +void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
> +                                TCGv_i64 newv, TCGArg idx, TCGMemOp memop)
> +{
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    if (!parallel_cpus) {
> +        TCGv_i64 t1 = tcg_temp_new_i64();
> +        TCGv_i64 t2 = tcg_temp_new_i64();
> +
> +        tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
> +
> +        tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
> +        tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
> +        tcg_gen_qemu_st_i64(t2, addr, idx, memop);
> +        tcg_temp_free_i64(t2);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(retv, t1, memop);
> +        } else {
> +            tcg_gen_mov_i64(retv, t1);
> +        }
> +        tcg_temp_free_i64(t1);
> +    } else if ((memop & MO_SIZE) == MO_64) {
> +        gen_atomic_cx_i64 gen;
> +
> +        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_USER_ONLY
> +        gen(retv, addr, cmpv, newv);
> +#else
> +        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
> +        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
> +        tcg_temp_free_i32(oi);
> +#endif
> +    } else {
> +        TCGv_i32 c32 = tcg_temp_new_i32();
> +        TCGv_i32 n32 = tcg_temp_new_i32();
> +        TCGv_i32 r32 = tcg_temp_new_i32();
> +
> +        tcg_gen_extrl_i64_i32(c32, cmpv);
> +        tcg_gen_extrl_i64_i32(n32, newv);
> +        tcg_gen_atomic_cmpxchg_i32(r32, addr, c32, n32, idx, memop & ~MO_SIGN);
> +        tcg_temp_free_i32(c32);
> +        tcg_temp_free_i32(n32);
> +
> +        tcg_gen_extu_i32_i64(retv, r32);
> +        tcg_temp_free_i32(r32);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(retv, retv, memop);
> +        }
> +    }
> +}
> +
> +static void do_nonatomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
> +                                TCGArg idx, TCGMemOp memop, bool new_val,
> +                                void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
> +{
> +    TCGv_i32 t1 = tcg_temp_new_i32();
> +    TCGv_i32 t2 = tcg_temp_new_i32();
> +
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
> +    gen(t2, t1, val);
> +    tcg_gen_qemu_st_i32(t2, addr, idx, memop);
> +
> +    tcg_gen_ext_i32(ret, (new_val ? t2 : t1), memop);
> +    tcg_temp_free_i32(t1);
> +    tcg_temp_free_i32(t2);
> +}
> +
> +static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
> +                             TCGArg idx, TCGMemOp memop, void * const table[])
> +{
> +    gen_atomic_op_i32 gen;
> +
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    gen = table[memop & (MO_SIZE | MO_BSWAP)];
> +    tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_USER_ONLY
> +    gen(ret, addr, val);
> +#else
> +    {
> +        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +        gen(ret, tcg_ctx.tcg_env, addr, val, oi);
> +        tcg_temp_free_i32(oi);
> +    }
> +#endif
> +
> +    if (memop & MO_SIGN) {
> +        tcg_gen_ext_i32(ret, ret, memop);
> +    }
> +}
> +
> +static void do_nonatomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
> +                                TCGArg idx, TCGMemOp memop, bool new_val,
> +                                void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
> +{
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
> +    gen(t2, t1, val);
> +    tcg_gen_qemu_st_i64(t2, addr, idx, memop);
> +
> +    tcg_gen_ext_i64(ret, (new_val ? t2 : t1), memop);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +}
> +
> +static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
> +                             TCGArg idx, TCGMemOp memop, void * const table[])
> +{
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    if ((memop & MO_SIZE) == MO_64) {
> +        gen_atomic_op_i64 gen;
> +
> +        gen = table[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_USER_ONLY
> +        gen(ret, addr, val);
> +#else
> +        {
> +            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +            gen(ret, tcg_ctx.tcg_env, addr, val, oi);
> +            tcg_temp_free_i32(oi);
> +        }
> +#endif
> +    } else {
> +        TCGv_i32 v32 = tcg_temp_new_i32();
> +        TCGv_i32 r32 = tcg_temp_new_i32();
> +
> +        tcg_gen_extrl_i64_i32(v32, val);
> +        do_atomic_op_i32(r32, addr, v32, idx, memop & ~MO_SIGN, table);
> +        tcg_temp_free_i32(v32);
> +
> +        tcg_gen_extu_i32_i64(ret, r32);
> +        tcg_temp_free_i32(r32);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(ret, ret, memop);
> +        }
> +    }
> +}
> +
> +#define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
> +static void * const table_##NAME[16] = {                                \
> +    [MO_8] = gen_helper_atomic_##NAME##b,                               \
> +    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
> +    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
> +    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
> +    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
> +    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
> +    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
> +};                                                                      \
> +void tcg_gen_atomic_##NAME##_i32                                        \
> +    (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i32);                        \
> +    }                                                                   \
> +}                                                                       \
> +void tcg_gen_atomic_##NAME##_i64                                        \
> +    (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i64);                        \
> +    }                                                                   \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_add, add, 0)
> +GEN_ATOMIC_HELPER(fetch_and, and, 0)
> +GEN_ATOMIC_HELPER(fetch_or, or, 0)
> +GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
> +
> +GEN_ATOMIC_HELPER(add_fetch, add, 1)
> +GEN_ATOMIC_HELPER(and_fetch, and, 1)
> +GEN_ATOMIC_HELPER(or_fetch, or, 1)
> +GEN_ATOMIC_HELPER(xor_fetch, xor, 1)
> +
> +static void tcg_gen_mov2_i32(TCGv_i32 r, TCGv_i32 a, TCGv_i32 b)
> +{
> +    tcg_gen_mov_i32(r, b);
> +}
> +
> +static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
> +{
> +    tcg_gen_mov_i64(r, b);
> +}
> +
> +GEN_ATOMIC_HELPER(xchg, mov2, 0)
> +
> +#undef GEN_ATOMIC_HELPER
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index f217e80..2a845ae 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -852,6 +852,30 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>      tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
>  }
>
> +void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
> +                                TCGArg, TCGMemOp);
> +void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
> +                                TCGArg, TCGMemOp);
> +
> +void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_add_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_add_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_and_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_and_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_or_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_or_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_xor_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_xor_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_add_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_add_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_and_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_and_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_or_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +
>  #if TARGET_LONG_BITS == 64
>  #define tcg_gen_movi_tl tcg_gen_movi_i64
>  #define tcg_gen_mov_tl tcg_gen_mov_i64
> @@ -930,6 +954,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_sub2_tl tcg_gen_sub2_i64
>  #define tcg_gen_mulu2_tl tcg_gen_mulu2_i64
>  #define tcg_gen_muls2_tl tcg_gen_muls2_i64
> +#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i64
> +#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i64
> +#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i64
> +#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i64
> +#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i64
> +#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i64
> +#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i64
> +#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64
> +#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64
> +#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64
>  #else
>  #define tcg_gen_movi_tl tcg_gen_movi_i32
>  #define tcg_gen_mov_tl tcg_gen_mov_i32
> @@ -1007,6 +1041,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_sub2_tl tcg_gen_sub2_i32
>  #define tcg_gen_mulu2_tl tcg_gen_mulu2_i32
>  #define tcg_gen_muls2_tl tcg_gen_muls2_i32
> +#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i32
> +#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i32
> +#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i32
> +#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i32
> +#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i32
> +#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i32
> +#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i32
> +#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32
> +#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32
> +#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32
>  #endif
>
>  #if UINTPTR_MAX == UINT32_MAX
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 23a0c37..b3accf4 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -14,3 +14,78 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>
>  DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +
> +#ifdef CONFIG_USER_ONLY
> +
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, tl, i32, i32)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, tl, i32, i32)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, tl, i32, i32)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, tl, i64, i64)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, tl, i32, i32)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, tl, i32, i32)
> +DEF_HELPER_FLAGS_3(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, tl, i64, i64)
> +
> +#define GEN_ATOMIC_HELPERS(NAME)                        \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), b),    \
> +                       TCG_CALL_NO_WG, i32, tl, i32)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), w_le), \
> +                       TCG_CALL_NO_WG, i32, tl, i32)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), w_be), \
> +                       TCG_CALL_NO_WG, i32, tl, i32)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), l_le), \
> +                       TCG_CALL_NO_WG, i32, tl, i32)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), l_be), \
> +                       TCG_CALL_NO_WG, i32, tl, i32)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), q_le), \
> +                       TCG_CALL_NO_WG, i64, tl, i64)    \
> +    DEF_HELPER_FLAGS_2(glue(glue(atomic_, NAME), q_be), \
> +                       TCG_CALL_NO_WG, i64, tl, i64)
> +
> +#else
> +
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env,
> +                   tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env,
> +                   tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env,
> +                   tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env,
> +                   tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env,
> +                   tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env,
> +                   tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env,
> +                   tl, i64, i64, i32)
> +
> +#define GEN_ATOMIC_HELPERS(NAME)                                \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),            \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_le),         \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),         \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)
> +
> +#endif
> +
> +GEN_ATOMIC_HELPERS(fetch_add)
> +GEN_ATOMIC_HELPERS(fetch_and)
> +GEN_ATOMIC_HELPERS(fetch_or)
> +GEN_ATOMIC_HELPERS(fetch_xor)
> +
> +GEN_ATOMIC_HELPERS(add_fetch)
> +GEN_ATOMIC_HELPERS(and_fetch)
> +GEN_ATOMIC_HELPERS(or_fetch)
> +GEN_ATOMIC_HELPERS(xor_fetch)
> +
> +GEN_ATOMIC_HELPERS(xchg)
> +
> +#undef GEN_ATOMIC_HELPERS
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index ab67537..4e60498 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1163,6 +1163,59 @@ uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>  # define helper_ret_ldq_cmmu  helper_le_ldq_cmmu
>  #endif
>
> +uint8_t helper_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
> +                                   uint8_t cmpv, uint8_t newv,
> +                                   TCGMemOpIdx oi, uintptr_t retaddr);
> +uint16_t helper_atomic_cmpxchgw_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint16_t cmpv, uint16_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint16_t helper_atomic_cmpxchgw_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint16_t cmpv, uint16_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#define GEN_ATOMIC_HELPER(NAME, TYPE, SUFFIX)         \
> +TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
> +    (CPUArchState *env, target_ulong addr, TYPE val,  \
> +     TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#define GEN_ATOMIC_HELPER_ALL(NAME)          \
> +    GEN_ATOMIC_HELPER(NAME, uint8_t, b)      \
> +    GEN_ATOMIC_HELPER(NAME, uint16_t, w_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint16_t, w_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
> +
> +GEN_ATOMIC_HELPER_ALL(fetch_add)
> +GEN_ATOMIC_HELPER_ALL(fetch_sub)
> +GEN_ATOMIC_HELPER_ALL(fetch_and)
> +GEN_ATOMIC_HELPER_ALL(fetch_or)
> +GEN_ATOMIC_HELPER_ALL(fetch_xor)
> +
> +GEN_ATOMIC_HELPER_ALL(add_fetch)
> +GEN_ATOMIC_HELPER_ALL(sub_fetch)
> +GEN_ATOMIC_HELPER_ALL(and_fetch)
> +GEN_ATOMIC_HELPER_ALL(or_fetch)
> +GEN_ATOMIC_HELPER_ALL(xor_fetch)
> +
> +GEN_ATOMIC_HELPER_ALL(xchg)
> +
> +#undef GEN_ATOMIC_HELPER_ALL
> +#undef GEN_ATOMIC_HELPER
> +
>  #endif /* CONFIG_SOFTMMU */
>
>  #endif /* TCG_H */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers
  2016-09-08 13:43   ` Alex Bennée
@ 2016-09-08 16:08     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-09-08 16:08 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv

On 09/08/2016 06:43 AM, Alex Bennée wrote:
>>  DATA_TYPE
>> > +#undef DATAX_TYPE
> Accidental typo?

It wasn't, actually, but...

>> >  #undef SUFFIX
>> >  #undef LSUFFIX
>> >  #undef DATA_SIZE
>> > @@ -524,8 +696,6 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>> >  #undef BSWAP
>> >  #undef TGT_BE
>> >  #undef TGT_LE
>> > -#undef CPU_BE
>> > -#undef CPU_LE
> I suspect these belong in a separate clean-up patch?
> 

Please look at the v3 patch set instead.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus
  2016-09-08  8:39   ` Alex Bennée
@ 2016-09-08 16:22     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-09-08 16:22 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv

On 09/08/2016 01:39 AM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> This is really just a placeholder for an actual
>> command-line switch for mttcg.
>> ---
>>  translate-all.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index 99ae7f9..a10fa06 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -119,7 +119,7 @@ static void *l1_map[V_L1_SIZE];
>>
>>  /* code generation context */
>>  TCGContext tcg_ctx;
>> -bool parallel_cpus;
>> +bool parallel_cpus = 1;
> 
> I appreciate this is currently a hack but for CONFIG_USER it should
> always be true anyway.

One could delay setting parallel_cpus until one of a number of syscalls occur:

(1) clone, with CLONE_VM
(2) mmap, with MAP_SHARED
(3) shmat

which would allow single-threaded programs to run without the atomic overhead.

r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC
  2016-09-08  8:38   ` Alex Bennée
@ 2016-09-08 16:26     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2016-09-08 16:26 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota, pbonzini, peter.maydell, serge.fdrv

On 09/08/2016 01:38 AM, Alex Bennée wrote:
>> +            } else if (r == EXCP_ATOMIC) {
>> > +                /* ??? When we begin running cpus in parallel, we should
>> > +                   stop all cpus, clear parallel_cpus, and interpret a
>> > +                   single insn with cpu_exec_step.  In the meantime,
>> > +                   we should never get here.  */
>> > +                abort();
> Pranith has been hitting this abort in the latest merged tree with MTTCG
> but I'm a little unclear how it got here. So is the plan the MTTCG
> thread function should do a step_atomic a-la user mode but we'll never
> get here in the single threaded case?
> 

Yes, that's the plan.  I guess I could have filled in that blank, but I see
that I haven't even done that in the v3 patchset.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2016-09-08 16:26 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-01 17:04 [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 01/27] atomics: add atomic_xor Richard Henderson
2016-08-11 17:19   ` Alex Bennée
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 02/27] atomics: add atomic_op_fetch variants Richard Henderson
2016-08-11 17:20   ` Alex Bennée
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 03/27] exec: Avoid direct references to Int128 parts Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 04/27] int128: Use __int128 if available Richard Henderson
2016-08-11 10:45   ` Alex Bennée
2016-08-25 19:09   ` [Qemu-devel] [PATCH] fixup! " Alex Bennée
2016-08-26 12:48     ` no-reply
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 06/27] int128: Use complex numbers if advisable Richard Henderson
2016-07-04 11:51   ` Paolo Bonzini
2016-07-04 12:07   ` Peter Maydell
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 07/27] tcg: Add EXCP_ATOMIC Richard Henderson
2016-09-08  8:38   ` Alex Bennée
2016-09-08 16:26     ` Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 08/27] HACK: Always enable parallel_cpus Richard Henderson
2016-09-08  8:39   ` Alex Bennée
2016-09-08 16:22     ` Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 09/27] tcg: Add atomic helpers Richard Henderson
2016-09-08 13:43   ` Alex Bennée
2016-09-08 16:08     ` Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 10/27] tcg: Add atomic128 helpers Richard Henderson
2016-07-08  3:00   ` Emilio G. Cota
2016-07-08  5:26     ` Richard Henderson
2016-08-11 10:02   ` Alex Bennée
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 11/27] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
2016-07-08  3:08   ` Emilio G. Cota
2016-07-08  3:19     ` Emilio G. Cota
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 12/27] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 13/27] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 14/27] target-i386: emulate LOCK'ed NOT " Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 15/27] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 16/27] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 17/27] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 18/27] target-i386: emulate XCHG using atomic helper Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 19/27] target-i386: remove helper_lock() Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 20/27] tests: add atomic_add-bench Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 21/27] target-arm: Rearrange aa32 load and store functions Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 22/27] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 23/27] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 24/27] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
2016-07-08  3:34   ` Emilio G. Cota
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 25/27] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 26/27] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
2016-07-01 17:04 ` [Qemu-devel] [PATCH v2 27/27] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
2016-07-01 17:23 ` [Qemu-devel] [PATCH v2 00/27] cmpxchg-based emulation of atomics Richard Henderson
2016-07-08  2:53 ` Emilio G. Cota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.