All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
@ 2016-09-03 20:39 Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 01/34] atomics: add atomic_xor Richard Henderson
                   ` (37 more replies)
  0 siblings, 38 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Changes since v2:
  * Fix build for 32-bit host without 64-bit atomics.
    Tested with --extra-cflags='-march=-i486'.
    This is patch 15, which might ought to get folded back into
    patch 13 for bisection, but is ugly for review.

  * Move a lot of stuff out of softmmu_template.h, and into cputlb.c.
    This is both for code size reduction and readability.

  * Fold in Alex's test-int128 fix.


r~


Emilio G. Cota (18):
  atomics: add atomic_xor
  atomics: add atomic_op_fetch variants
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate XCHG using atomic helper
  target-i386: remove helper_lock()
  tests: add atomic_add-bench
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  linux-user: remove handling of ARM's EXCP_STREX
  linux-user: remove handling of aarch64's EXCP_STREX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}

Richard Henderson (16):
  exec: Avoid direct references to Int128 parts
  int128: Use __int128 if available
  int128: Add int128_make128
  tcg: Add EXCP_ATOMIC
  HACK: Always enable parallel_cpus
  cputlb: Replace SHIFT with DATA_SIZE
  cputlb: Move probe_write out of softmmu_template.h
  cputlb: Remove includes from softmmu_template.h
  cputlb: Move most of iotlb code out of line
  cputlb: Tidy some macros
  tcg: Add atomic helpers
  tcg: Add atomic128 helpers
  tcg: Add CONFIG_ATOMIC64
  target-arm: Rearrange aa32 load and store functions
  target-alpha: Introduce MMU_PHYS_IDX
  target-alpha: Emulate LL/SC using cmpxchg helpers

 Makefile.objs              |   1 -
 Makefile.target            |   1 +
 atomic_template.h          | 211 +++++++++++++++++++++++++
 configure                  |  62 +++++++-
 cpu-exec-common.c          |   6 +
 cpu-exec.c                 |  23 +++
 cpus.c                     |   6 +
 cputlb.c                   | 203 ++++++++++++++++++++++--
 exec.c                     |   4 +-
 include/exec/cpu-all.h     |   1 +
 include/exec/exec-all.h    |   1 +
 include/qemu-common.h      |   1 +
 include/qemu/atomic.h      |  42 ++++-
 include/qemu/int128.h      | 171 +++++++++++++++++++-
 linux-user/main.c          | 326 +++++++-------------------------------
 softmmu_template.h         | 104 ++----------
 target-alpha/cpu.h         |  22 +--
 target-alpha/helper.c      |  16 +-
 target-alpha/helper.h      |   9 --
 target-alpha/machine.c     |   2 -
 target-alpha/mem_helper.c  |  73 ---------
 target-alpha/translate.c   | 148 +++++++++--------
 target-arm/cpu.h           |  17 +-
 target-arm/helper-a64.c    | 113 +++++++++++++
 target-arm/helper-a64.h    |   2 +
 target-arm/internals.h     |   4 +-
 target-arm/translate-a64.c | 106 ++++++-------
 target-arm/translate.c     | 342 ++++++++++++++-------------------------
 target-arm/translate.h     |   4 -
 target-i386/helper.h       |   4 +-
 target-i386/mem_helper.c   | 153 ++++++++++++------
 target-i386/translate.c    | 386 +++++++++++++++++++++++++++++----------------
 tcg-runtime.c              |  74 +++++++--
 tcg/tcg-op.c               | 342 +++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h               |  44 ++++++
 tcg/tcg-runtime.h          | 109 +++++++++++++
 tcg/tcg.h                  |  85 ++++++++++
 tests/.gitignore           |   1 +
 tests/Makefile.include     |   4 +-
 tests/atomic_add-bench.c   | 180 +++++++++++++++++++++
 tests/test-int128.c        |  22 +--
 translate-all.c            |   1 +
 42 files changed, 2346 insertions(+), 1080 deletions(-)
 create mode 100644 atomic_template.h
 create mode 100644 tests/atomic_add-bench.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 01/34] atomics: add atomic_xor
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 02/34] atomics: add atomic_op_fetch variants Richard Henderson
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

This paves the way for upcoming work.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-8-git-send-email-cota@braap.org>
---
 include/qemu/atomic.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 43b0645..f0c1db0 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -201,6 +201,7 @@
 #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
@@ -209,6 +210,7 @@
 #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
 #define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
 #define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_xor(ptr, n) ((void) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST))
 
 #else /* __ATOMIC_RELAXED */
 
@@ -395,6 +397,7 @@
 #define atomic_fetch_sub       __sync_fetch_and_sub
 #define atomic_fetch_and       __sync_fetch_and_and
 #define atomic_fetch_or        __sync_fetch_and_or
+#define atomic_fetch_xor       __sync_fetch_and_xor
 #define atomic_cmpxchg         __sync_val_compare_and_swap
 
 /* And even shorter names that return void.  */
@@ -404,6 +407,7 @@
 #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
 #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
 #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
+#define atomic_xor(ptr, n)     ((void) __sync_fetch_and_xor(ptr, n))
 
 #endif /* __ATOMIC_RELAXED */
 #endif /* QEMU_ATOMIC_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 02/34] atomics: add atomic_op_fetch variants
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 01/34] atomics: add atomic_xor Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts Richard Henderson
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

This paves the way for upcoming work.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-9-git-send-email-cota@braap.org>
---
 include/qemu/atomic.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index f0c1db0..6e5cdd5 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -203,6 +203,14 @@
 #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 
+#define atomic_inc_fetch(ptr)    __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_dec_fetch(ptr)    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
+
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
 #define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
@@ -398,6 +406,15 @@
 #define atomic_fetch_and       __sync_fetch_and_and
 #define atomic_fetch_or        __sync_fetch_and_or
 #define atomic_fetch_xor       __sync_fetch_and_xor
+
+#define atomic_inc_fetch(ptr)  __sync_add_and_fetch(ptr, 1)
+#define atomic_dec_fetch(ptr)  __sync_add_and_fetch(ptr, -1)
+#define atomic_add_fetch       __sync_add_and_fetch
+#define atomic_sub_fetch       __sync_sub_and_fetch
+#define atomic_and_fetch       __sync_and_and_fetch
+#define atomic_or_fetch        __sync_or_and_fetch
+#define atomic_xor_fetch       __sync_xor_and_fetch
+
 #define atomic_cmpxchg         __sync_val_compare_and_swap
 
 /* And even shorter names that return void.  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 01/34] atomics: add atomic_xor Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 02/34] atomics: add atomic_op_fetch variants Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-09 17:14   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available Richard Henderson
                   ` (34 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 exec.c                |  4 ++--
 include/qemu/int128.h | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 8ffde75..373313d 100644
--- a/exec.c
+++ b/exec.c
@@ -320,9 +320,9 @@ static inline bool section_covers_addr(const MemoryRegionSection *section,
     /* Memory topology clips a memory region to [0, 2^64); size.hi > 0 means
      * the section must cover the entire address space.
      */
-    return section->size.hi ||
+    return int128_gethi(section->size) ||
            range_covers_byte(section->offset_within_address_space,
-                             section->size.lo, addr);
+                             int128_getlo(section->size), addr);
 }
 
 static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr,
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index c598881..52aaf99 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -20,6 +20,16 @@ static inline uint64_t int128_get64(Int128 a)
     return a.lo;
 }
 
+static inline uint64_t int128_getlo(Int128 a)
+{
+    return a.lo;
+}
+
+static inline int64_t int128_gethi(Int128 a)
+{
+    return a.hi;
+}
+
 static inline Int128 int128_zero(void)
 {
     return int128_make64(0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (2 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-09 17:19   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128 Richard Henderson
                   ` (33 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/int128.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++-
 tests/test-int128.c   |  22 ++++----
 2 files changed, 145 insertions(+), 12 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 52aaf99..08f1db1 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -1,6 +1,138 @@
 #ifndef INT128_H
 #define INT128_H
 
+#ifdef CONFIG_INT128
+
+typedef __int128 Int128;
+
+static inline Int128 int128_make64(uint64_t a)
+{
+    return a;
+}
+
+static inline uint64_t int128_get64(Int128 a)
+{
+    uint64_t r = a;
+    assert(r == a);
+    return r;
+}
+
+static inline uint64_t int128_getlo(Int128 a)
+{
+    return a;
+}
+
+static inline int64_t int128_gethi(Int128 a)
+{
+    return a >> 64;
+}
+
+static inline Int128 int128_zero(void)
+{
+    return 0;
+}
+
+static inline Int128 int128_one(void)
+{
+    return 1;
+}
+
+static inline Int128 int128_2_64(void)
+{
+    return (Int128)1 << 64;
+}
+
+static inline Int128 int128_exts64(int64_t a)
+{
+    return a;
+}
+
+static inline Int128 int128_and(Int128 a, Int128 b)
+{
+    return a & b;
+}
+
+static inline Int128 int128_rshift(Int128 a, int n)
+{
+    return a >> n;
+}
+
+static inline Int128 int128_add(Int128 a, Int128 b)
+{
+    return a + b;
+}
+
+static inline Int128 int128_neg(Int128 a)
+{
+    return -a;
+}
+
+static inline Int128 int128_sub(Int128 a, Int128 b)
+{
+    return a - b;
+}
+
+static inline bool int128_nonneg(Int128 a)
+{
+    return a >= 0;
+}
+
+static inline bool int128_eq(Int128 a, Int128 b)
+{
+    return a == b;
+}
+
+static inline bool int128_ne(Int128 a, Int128 b)
+{
+    return a != b;
+}
+
+static inline bool int128_ge(Int128 a, Int128 b)
+{
+    return a >= b;
+}
+
+static inline bool int128_lt(Int128 a, Int128 b)
+{
+    return a < b;
+}
+
+static inline bool int128_le(Int128 a, Int128 b)
+{
+    return a <= b;
+}
+
+static inline bool int128_gt(Int128 a, Int128 b)
+{
+    return a > b;
+}
+
+static inline bool int128_nz(Int128 a)
+{
+    return a != 0;
+}
+
+static inline Int128 int128_min(Int128 a, Int128 b)
+{
+    return a < b ? a : b;
+}
+
+static inline Int128 int128_max(Int128 a, Int128 b)
+{
+    return a > b ? a : b;
+}
+
+static inline void int128_addto(Int128 *a, Int128 b)
+{
+    *a += b;
+}
+
+static inline void int128_subfrom(Int128 *a, Int128 b)
+{
+    *a -= b;
+}
+
+#else /* !CONFIG_INT128 */
 
 typedef struct Int128 Int128;
 
@@ -153,4 +285,5 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
     *a = int128_sub(*a, b);
 }
 
-#endif
+#endif /* CONFIG_INT128 */
+#endif /* INT128_H */
diff --git a/tests/test-int128.c b/tests/test-int128.c
index 4390123..b86a3c7 100644
--- a/tests/test-int128.c
+++ b/tests/test-int128.c
@@ -41,7 +41,7 @@ static Int128 expand(uint32_t x)
     uint64_t l, h;
     l = expand16(x & 65535);
     h = expand16(x >> 16);
-    return (Int128) {l, h};
+    return (Int128) int128_make128(l, h);
 };
 
 static void test_and(void)
@@ -54,8 +54,8 @@ static void test_and(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] & tests[j]);
             Int128 s = int128_and(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -70,8 +70,8 @@ static void test_add(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] + tests[j]);
             Int128 s = int128_add(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -86,8 +86,8 @@ static void test_sub(void)
             Int128 b = expand(tests[j]);
             Int128 r = expand(tests[i] - tests[j]);
             Int128 s = int128_sub(a, b);
-            g_assert_cmpuint(r.lo, ==, s.lo);
-            g_assert_cmpuint(r.hi, ==, s.hi);
+            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
         }
     }
 }
@@ -100,8 +100,8 @@ static void test_neg(void)
         Int128 a = expand(tests[i]);
         Int128 r = expand(-tests[i]);
         Int128 s = int128_neg(a);
-        g_assert_cmpuint(r.lo, ==, s.lo);
-        g_assert_cmpuint(r.hi, ==, s.hi);
+        g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
+        g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
     }
 }
 
@@ -180,8 +180,8 @@ test_rshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
 {
     Int128 a = expand(x);
     Int128 r = int128_rshift(a, n);
-    g_assert_cmpuint(r.lo, ==, l);
-    g_assert_cmpuint(r.hi, ==, h);
+    g_assert_cmpuint(int128_getlo(r), ==, l);
+    g_assert_cmpuint(int128_gethi(r), ==, h);
 }
 
 static void test_rshift(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (3 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-09 13:01   ` Leon Alrae
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC Richard Henderson
                   ` (32 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Allows Int128 to be used more generally, rather than having to
begin with 64-bit inputs and accumulate.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/int128.h | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 08f1db1..67440fa 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -10,6 +10,11 @@ static inline Int128 int128_make64(uint64_t a)
     return a;
 }
 
+static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
+{
+    return (unsigned __int128)hi << 64 | lo;
+}
+
 static inline uint64_t int128_get64(Int128 a)
 {
     uint64_t r = a;
@@ -146,6 +151,11 @@ static inline Int128 int128_make64(uint64_t a)
     return (Int128) { a, 0 };
 }
 
+static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
+{
+    return (Int128) { lo, hi };
+}
+
 static inline uint64_t int128_get64(Int128 a)
 {
     assert(!a.hi);
@@ -195,9 +205,9 @@ static inline Int128 int128_rshift(Int128 a, int n)
     }
     h = a.hi >> (n & 63);
     if (n >= 64) {
-        return (Int128) { h, h >> 63 };
+        return int128_make128(h, h >> 63);
     } else {
-        return (Int128) { (a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h };
+        return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h);
     }
 }
 
@@ -211,18 +221,18 @@ static inline Int128 int128_add(Int128 a, Int128 b)
      *
      * So the carry is lo < a.lo.
      */
-    return (Int128) { lo, (uint64_t)a.hi + b.hi + (lo < a.lo) };
+    return int128_make128(lo, (uint64_t)a.hi + b.hi + (lo < a.lo));
 }
 
 static inline Int128 int128_neg(Int128 a)
 {
     uint64_t lo = -a.lo;
-    return (Int128) { lo, ~(uint64_t)a.hi + !lo };
+    return int128_make128(lo, ~(uint64_t)a.hi + !lo);
 }
 
 static inline Int128 int128_sub(Int128 a, Int128 b)
 {
-    return (Int128){ a.lo - b.lo, (uint64_t)a.hi - b.hi - (a.lo < b.lo) };
+    return int128_make128(a.lo - b.lo, (uint64_t)a.hi - b.hi - (a.lo < b.lo));
 }
 
 static inline bool int128_nonneg(Int128 a)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (4 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128 Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 14:16   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus Richard Henderson
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cpu-exec-common.c       |  6 +++++
 cpu-exec.c              | 23 +++++++++++++++++++
 cpus.c                  |  6 +++++
 include/exec/cpu-all.h  |  1 +
 include/exec/exec-all.h |  1 +
 include/qemu-common.h   |  1 +
 linux-user/main.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
 tcg/tcg.h               |  1 +
 translate-all.c         |  1 +
 9 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0cb4ae6..767d9c6 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -77,3 +77,9 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     }
     siglongjmp(cpu->jmp_env, 1);
 }
+
+void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
+{
+    cpu->exception_index = EXCP_ATOMIC;
+    cpu_loop_exit_restore(cpu, pc);
+}
diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..8d77516 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -225,6 +225,29 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
+void cpu_exec_step(CPUState *cpu)
+{
+    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
+    TranslationBlock *tb;
+    target_ulong cs_base, pc;
+    uint32_t flags;
+    bool old_tb_flushed;
+
+    old_tb_flushed = cpu->tb_flushed;
+    cpu->tb_flushed = false;
+
+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    tb = tb_gen_code(cpu, pc, cs_base, flags,
+                     1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
+    tb->orig_tb = NULL;
+    cpu->tb_flushed |= old_tb_flushed;
+    /* execute the generated code */
+    trace_exec_tb_nocache(tb, pc);
+    cpu_tb_exec(cpu, tb);
+    tb_phys_invalidate(tb, -1);
+    tb_free(tb);
+}
+
 struct tb_desc {
     target_ulong pc;
     target_ulong cs_base;
diff --git a/cpus.c b/cpus.c
index 84c3520..a01bbbd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1575,6 +1575,12 @@ static void tcg_exec_all(void)
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(cpu);
                 break;
+            } else if (r == EXCP_ATOMIC) {
+                /* ??? When we begin running cpus in parallel, we should
+                   stop all cpus, clear parallel_cpus, and interpret a
+                   single insn with cpu_exec_step.  In the meantime,
+                   we should never get here.  */
+                abort();
             }
         } else if (cpu->stop || cpu->stopped) {
             if (cpu->unplug) {
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index b6a7059..1f7e9d8 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -31,6 +31,7 @@
 #define EXCP_DEBUG      0x10002 /* cpu stopped after a breakpoint or singlestep */
 #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
 #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
+#define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */
 
 /* some important defines:
  *
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d008296..08424ae 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -71,6 +71,7 @@ static inline void cpu_list_lock(void)
 void cpu_exec_init(CPUState *cpu, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
 void QEMU_NORETURN cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc);
+void QEMU_NORETURN cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc);
 
 #if !defined(CONFIG_USER_ONLY)
 void cpu_reloading_memory_map(void);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 9e8b0bd..f814317 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -80,6 +80,7 @@ void tcg_exec_init(unsigned long tb_size);
 bool tcg_enabled(void);
 
 void cpu_exec_init_all(void);
+void cpu_exec_step(CPUState *cpu);
 
 /**
  * Sends a (part of) iovec down a socket, yielding when the socket is full, or
diff --git a/linux-user/main.c b/linux-user/main.c
index f2f4d2f..8d5c948 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -182,13 +182,25 @@ static inline void start_exclusive(void)
 }
 
 /* Finish an exclusive operation.  */
-static inline void __attribute__((unused)) end_exclusive(void)
+static inline void end_exclusive(void)
 {
     pending_cpus = 0;
     pthread_cond_broadcast(&exclusive_resume);
     pthread_mutex_unlock(&exclusive_lock);
 }
 
+static void step_atomic(CPUState *cpu)
+{
+    start_exclusive();
+
+    /* Since we got here, we know that parallel_cpus must be true.  */
+    parallel_cpus = false;
+    cpu_exec_step(cpu);
+    parallel_cpus = true;
+
+    end_exclusive();
+}
+
 /* Wait for exclusive ops to finish, and begin cpu execution.  */
 static inline void cpu_exec_start(CPUState *cpu)
 {
@@ -440,6 +452,9 @@ void cpu_loop(CPUX86State *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             pc = env->segs[R_CS].base + env->eip;
             EXCP_DUMP(env, "qemu: 0x%08lx: unhandled CPU exception 0x%x - aborting\n",
@@ -932,6 +947,9 @@ void cpu_loop(CPUARMState *env)
         case EXCP_YIELD:
             /* nothing to do here for user-mode, just resume guest code */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
         error:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
@@ -1131,6 +1149,9 @@ void cpu_loop(CPUARMState *env)
         case EXCP_YIELD:
             /* nothing to do here for user-mode, just resume guest code */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
             abort();
@@ -1220,6 +1241,9 @@ void cpu_loop(CPUUniCore32State *env)
                 }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             goto error;
         }
@@ -1492,6 +1516,9 @@ void cpu_loop (CPUSPARCState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -2040,6 +2067,9 @@ void cpu_loop(CPUPPCState *env)
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             cpu_abort(cs, "Unknown exception 0x%d. Aborting\n", trapnr);
             break;
@@ -2713,6 +2743,9 @@ done_syscall:
                 }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
 error:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
@@ -2799,6 +2832,9 @@ void cpu_loop(CPUOpenRISCState *env)
         case EXCP_NR:
             qemu_log_mask(CPU_LOG_INT, "\nNR\n");
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
                      trapnr);
@@ -2874,6 +2910,9 @@ void cpu_loop(CPUSH4State *env)
             queue_signal(env, info.si_signo, &info);
 	    break;
 
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -2939,6 +2978,9 @@ void cpu_loop(CPUCRISState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3053,6 +3095,9 @@ void cpu_loop(CPUMBState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3154,6 +3199,9 @@ void cpu_loop(CPUM68KState *env)
                   }
             }
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
             abort();
@@ -3389,6 +3437,9 @@ void cpu_loop(CPUAlphaState *env)
         case EXCP_INTERRUPT:
             /* Just indicate that signals should be handled asap.  */
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             printf ("Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3516,6 +3567,9 @@ void cpu_loop(CPUS390XState *env)
             queue_signal(env, info.si_signo, &info);
             break;
 
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
             cpu_dump_state(cs, stderr, fprintf, 0);
@@ -3768,6 +3822,9 @@ void cpu_loop(CPUTLGState *env)
         case TILEGX_EXCP_REG_UDN_ACCESS:
             gen_sigill_reg(env);
             break;
+        case EXCP_ATOMIC:
+            step_atomic(cs);
+            break;
         default:
             fprintf(stderr, "trapnr is %d[0x%x].\n", trapnr, trapnr);
             g_assert_not_reached();
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1bcabca..00498fc 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -700,6 +700,7 @@ struct TCGContext {
 };
 
 extern TCGContext tcg_ctx;
+extern bool parallel_cpus;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
diff --git a/translate-all.c b/translate-all.c
index 0dd6466..f97fc1e 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -119,6 +119,7 @@ static void *l1_map[V_L1_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+bool parallel_cpus;
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (5 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 14:20   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE Richard Henderson
                   ` (30 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

This is really just a placeholder for an actual
command-line switch for mttcg.
---
 translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/translate-all.c b/translate-all.c
index f97fc1e..d6d879c 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -119,7 +119,7 @@ static void *l1_map[V_L1_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
-bool parallel_cpus;
+bool parallel_cpus = 1;
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (6 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 14:22   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h Richard Henderson
                   ` (29 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c           | 16 ++++++++--------
 softmmu_template.h |  7 ++-----
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index d068ee5..cf68211 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -529,16 +529,16 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 
 #define MMUSUFFIX _mmu
 
-#define SHIFT 0
+#define DATA_SIZE 1
 #include "softmmu_template.h"
 
-#define SHIFT 1
+#define DATA_SIZE 2
 #include "softmmu_template.h"
 
-#define SHIFT 2
+#define DATA_SIZE 4
 #include "softmmu_template.h"
 
-#define SHIFT 3
+#define DATA_SIZE 8
 #include "softmmu_template.h"
 #undef MMUSUFFIX
 
@@ -549,14 +549,14 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 #define GETRA() ((uintptr_t)0)
 #define SOFTMMU_CODE_ACCESS
 
-#define SHIFT 0
+#define DATA_SIZE 1
 #include "softmmu_template.h"
 
-#define SHIFT 1
+#define DATA_SIZE 2
 #include "softmmu_template.h"
 
-#define SHIFT 2
+#define DATA_SIZE 4
 #include "softmmu_template.h"
 
-#define SHIFT 3
+#define DATA_SIZE 8
 #include "softmmu_template.h"
diff --git a/softmmu_template.h b/softmmu_template.h
index 284ab2c..3c56df1 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -25,8 +25,6 @@
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 
-#define DATA_SIZE (1 << SHIFT)
-
 #if DATA_SIZE == 8
 #define SUFFIX q
 #define LSUFFIX q
@@ -134,7 +132,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     }
 
     cpu->mem_io_vaddr = addr;
-    memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
+    memory_region_dispatch_read(mr, physaddr, &val, DATA_SIZE,
                                 iotlbentry->attrs);
     return val;
 }
@@ -321,7 +319,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
-    memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
+    memory_region_dispatch_write(mr, physaddr, val, DATA_SIZE,
                                  iotlbentry->attrs);
 }
 
@@ -512,7 +510,6 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
 #undef READ_ACCESS_TYPE
-#undef SHIFT
 #undef DATA_TYPE
 #undef SUFFIX
 #undef LSUFFIX
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (7 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 14:35   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h Richard Henderson
                   ` (28 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c           | 21 +++++++++++++++++++++
 softmmu_template.h | 23 -----------------------
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index cf68211..8a021ab 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -527,6 +527,27 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
   victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
                  (ADDR) & TARGET_PAGE_MASK)
 
+/* Probe for whether the specified guest write access is permitted.
+ * If it is not permitted then an exception will be taken in the same
+ * way as if this were a real write access (and we will not return).
+ * Otherwise the function will return, and there will be a valid
+ * entry in the TLB for this access.
+ */
+void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
+                 uintptr_t retaddr)
+{
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+
+    if ((addr & TARGET_PAGE_MASK)
+        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+        /* TLB entry is for a different page */
+        if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
+        }
+    }
+}
+
 #define MMUSUFFIX _mmu
 
 #define DATA_SIZE 1
diff --git a/softmmu_template.h b/softmmu_template.h
index 3c56df1..6805028 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -484,29 +484,6 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
 }
 #endif /* DATA_SIZE > 1 */
-
-#if DATA_SIZE == 1
-/* Probe for whether the specified guest write access is permitted.
- * If it is not permitted then an exception will be taken in the same
- * way as if this were a real write access (and we will not return).
- * Otherwise the function will return, and there will be a valid
- * entry in the TLB for this access.
- */
-void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
-                 uintptr_t retaddr)
-{
-    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-
-    if ((addr & TARGET_PAGE_MASK)
-        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
-        /* TLB entry is for a different page */
-        if (!VICTIM_TLB_HIT(addr_write, addr)) {
-            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
-        }
-    }
-}
-#endif
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
 #undef READ_ACCESS_TYPE
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (8 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 14:38   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line Richard Henderson
                   ` (27 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 softmmu_template.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 6805028..a75d299 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -21,10 +21,6 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
-#include "qemu/timer.h"
-#include "exec/address-spaces.h"
-#include "exec/memory.h"
-
 #if DATA_SIZE == 8
 #define SUFFIX q
 #define LSUFFIX q
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (9 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 15:26   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros Richard Henderson
                   ` (26 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Saves 2k code size off of a cold path.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c           | 37 +++++++++++++++++++++++++++++++++++++
 softmmu_template.h | 52 ++++++++++------------------------------------------
 2 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 8a021ab..eba78a9 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -498,6 +498,43 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
     return qemu_ram_addr_from_host_nofail(p);
 }
 
+static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
+                         target_ulong addr, uintptr_t retaddr, int size)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    hwaddr physaddr = iotlbentry->addr;
+    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    uint64_t val;
+
+    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
+    cpu->mem_io_pc = retaddr;
+    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+        cpu_io_recompile(cpu, retaddr);
+    }
+
+    cpu->mem_io_vaddr = addr;
+    memory_region_dispatch_read(mr, physaddr, &val, size, iotlbentry->attrs);
+    return val;
+}
+
+static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
+                      uint64_t val, target_ulong addr,
+                      uintptr_t retaddr, int size)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    hwaddr physaddr = iotlbentry->addr;
+    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+
+    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
+    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+        cpu_io_recompile(cpu, retaddr);
+    }
+
+    cpu->mem_io_vaddr = addr;
+    cpu->mem_io_pc = retaddr;
+    memory_region_dispatch_write(mr, physaddr, val, size, iotlbentry->attrs);
+}
+
 /* Return true if ADDR is present in the victim tlb, and has been copied
    back to the main tlb.  */
 static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
diff --git a/softmmu_template.h b/softmmu_template.h
index a75d299..c513813 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -112,25 +112,12 @@
 
 #ifndef SOFTMMU_CODE_ACCESS
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
-                                              CPUIOTLBEntry *iotlbentry,
+                                              size_t mmu_idx, size_t index,
                                               target_ulong addr,
                                               uintptr_t retaddr)
 {
-    uint64_t val;
-    CPUState *cpu = ENV_GET_CPU(env);
-    hwaddr physaddr = iotlbentry->addr;
-    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
-
-    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
-    cpu->mem_io_pc = retaddr;
-    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
-        cpu_io_recompile(cpu, retaddr);
-    }
-
-    cpu->mem_io_vaddr = addr;
-    memory_region_dispatch_read(mr, physaddr, &val, DATA_SIZE,
-                                iotlbentry->attrs);
-    return val;
+    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+    return io_readx(env, iotlbentry, addr, retaddr, DATA_SIZE);
 }
 #endif
 
@@ -164,15 +151,13 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
             goto do_unaligned_access;
         }
-        iotlbentry = &env->iotlb[mmu_idx][index];
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, iotlbentry, addr, retaddr);
+        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
         res = TGT_LE(res);
         return res;
     }
@@ -238,15 +223,13 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
             goto do_unaligned_access;
         }
-        iotlbentry = &env->iotlb[mmu_idx][index];
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, iotlbentry, addr, retaddr);
+        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
         res = TGT_BE(res);
         return res;
     }
@@ -299,24 +282,13 @@ WORD_TYPE helper_be_lds_name(CPUArchState *env, target_ulong addr,
 #endif
 
 static inline void glue(io_write, SUFFIX)(CPUArchState *env,
-                                          CPUIOTLBEntry *iotlbentry,
+                                          size_t mmu_idx, size_t index,
                                           DATA_TYPE val,
                                           target_ulong addr,
                                           uintptr_t retaddr)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
-    hwaddr physaddr = iotlbentry->addr;
-    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
-
-    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
-    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
-        cpu_io_recompile(cpu, retaddr);
-    }
-
-    cpu->mem_io_vaddr = addr;
-    cpu->mem_io_pc = retaddr;
-    memory_region_dispatch_write(mr, physaddr, val, DATA_SIZE,
-                                 iotlbentry->attrs);
+    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+    return io_writex(env, iotlbentry, val, addr, retaddr, DATA_SIZE);
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
@@ -347,16 +319,14 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
             goto do_unaligned_access;
         }
-        iotlbentry = &env->iotlb[mmu_idx][index];
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
         val = TGT_LE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
         return;
     }
 
@@ -430,16 +400,14 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
             goto do_unaligned_access;
         }
-        iotlbentry = &env->iotlb[mmu_idx][index];
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
         val = TGT_BE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
         return;
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (10 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-12 15:28   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
                   ` (25 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

TGT_LE and TGT_BE are not size dependent and do not need to be
redefined.  The others are no longer used at all.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c           |  8 ++++++++
 softmmu_template.h | 22 ----------------------
 2 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index eba78a9..d710cc1 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -585,6 +585,14 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
     }
 }
 
+#ifdef TARGET_WORDS_BIGENDIAN
+# define TGT_BE(X)  (X)
+# define TGT_LE(X)  BSWAP(X)
+#else
+# define TGT_BE(X)  BSWAP(X)
+# define TGT_LE(X)  (X)
+#endif
+
 #define MMUSUFFIX _mmu
 
 #define DATA_SIZE 1
diff --git a/softmmu_template.h b/softmmu_template.h
index c513813..efcfcd8 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -78,14 +78,6 @@
 # define BSWAP(X)  (X)
 #endif
 
-#ifdef TARGET_WORDS_BIGENDIAN
-# define TGT_BE(X)  (X)
-# define TGT_LE(X)  BSWAP(X)
-#else
-# define TGT_BE(X)  BSWAP(X)
-# define TGT_LE(X)  (X)
-#endif
-
 #if DATA_SIZE == 1
 # define helper_le_ld_name  glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
 # define helper_be_ld_name  helper_le_ld_name
@@ -102,14 +94,6 @@
 # define helper_be_st_name  glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
 #endif
 
-#ifdef TARGET_WORDS_BIGENDIAN
-# define helper_te_ld_name  helper_be_ld_name
-# define helper_te_st_name  helper_be_st_name
-#else
-# define helper_te_ld_name  helper_le_ld_name
-# define helper_te_st_name  helper_le_st_name
-#endif
-
 #ifndef SOFTMMU_CODE_ACCESS
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               size_t mmu_idx, size_t index,
@@ -461,15 +445,9 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 #undef USUFFIX
 #undef SSUFFIX
 #undef BSWAP
-#undef TGT_BE
-#undef TGT_LE
-#undef CPU_BE
-#undef CPU_LE
 #undef helper_le_ld_name
 #undef helper_be_ld_name
 #undef helper_le_lds_name
 #undef helper_be_lds_name
 #undef helper_le_st_name
 #undef helper_be_st_name
-#undef helper_te_ld_name
-#undef helper_te_st_name
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (11 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-09 13:11   ` Leon Alrae
                     ` (3 more replies)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers Richard Henderson
                   ` (24 subsequent siblings)
  37 siblings, 4 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 Makefile.objs         |   1 -
 Makefile.target       |   1 +
 atomic_template.h     | 173 ++++++++++++++++++++++++++
 cputlb.c              | 112 ++++++++++++++++-
 include/qemu/atomic.h |  21 +++-
 tcg-runtime.c         |  49 ++++++--
 tcg/tcg-op.c          | 328 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h          |  44 +++++++
 tcg/tcg-runtime.h     |  75 ++++++++++++
 tcg/tcg.h             |  53 ++++++++
 10 files changed, 836 insertions(+), 21 deletions(-)
 create mode 100644 atomic_template.h

diff --git a/Makefile.objs b/Makefile.objs
index 6d5ddcf..df8796e 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -88,7 +88,6 @@ endif
 
 #######################################################################
 # Target-independent parts used in system and user emulation
-common-obj-y += tcg-runtime.o
 common-obj-y += hw/
 common-obj-y += qom/
 common-obj-y += disas/
diff --git a/Makefile.target b/Makefile.target
index a440bcb..1d16d10 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -94,6 +94,7 @@ obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
 obj-y += target-$(TARGET_BASE_ARCH)/
 obj-y += disas.o
+obj-y += tcg-runtime.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 
diff --git a/atomic_template.h b/atomic_template.h
new file mode 100644
index 0000000..d2c8a08
--- /dev/null
+++ b/atomic_template.h
@@ -0,0 +1,173 @@
+/*
+ * Atomic helper templates
+ * Included from tcg-runtime.c and cputlb.c.
+ *
+ * Copyright (c) 2016 Red Hat, Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#if DATA_SIZE == 8
+# define SUFFIX     q
+# define DATA_TYPE  uint64_t
+# define BSWAP      bswap64
+#elif DATA_SIZE == 4
+# define SUFFIX     l
+# define DATA_TYPE  uint32_t
+# define BSWAP      bswap32
+#elif DATA_SIZE == 2
+# define SUFFIX     w
+# define DATA_TYPE  uint16_t
+# define BSWAP      bswap16
+#elif DATA_SIZE == 1
+# define SUFFIX     b
+# define DATA_TYPE  uint8_t
+# define BSWAP
+#else
+# error unsupported data size
+#endif
+
+#if DATA_SIZE >= 4
+# define ABI_TYPE  DATA_TYPE
+#else
+# define ABI_TYPE  uint32_t
+#endif
+
+#if DATA_SIZE == 1
+# define END
+#elif defined(HOST_WORDS_BIGENDIAN)
+# define END  _be
+#else
+# define END  _le
+#endif
+
+ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
+                              ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    return atomic_cmpxchg__nocheck(haddr, cmpv, newv);
+}
+
+ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
+                           ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    return atomic_xchg__nocheck(haddr, val);
+}
+
+#define GEN_ATOMIC_HELPER(X)                                        \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,       \
+                 ABI_TYPE val EXTRA_ARGS)                           \
+{                                                                   \
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;                           \
+    return atomic_##X(haddr, val);                                  \
+}                                                                   \
+
+GEN_ATOMIC_HELPER(fetch_add)
+GEN_ATOMIC_HELPER(fetch_and)
+GEN_ATOMIC_HELPER(fetch_or)
+GEN_ATOMIC_HELPER(fetch_xor)
+GEN_ATOMIC_HELPER(add_fetch)
+GEN_ATOMIC_HELPER(and_fetch)
+GEN_ATOMIC_HELPER(or_fetch)
+GEN_ATOMIC_HELPER(xor_fetch)
+
+#undef GEN_ATOMIC_HELPER
+#undef END
+
+#if DATA_SIZE > 1
+
+#ifdef HOST_WORDS_BIGENDIAN
+# define END  _le
+#else
+# define END  _be
+#endif
+
+ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
+                              ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    return BSWAP(atomic_cmpxchg__nocheck(haddr, BSWAP(cmpv), BSWAP(newv)));
+}
+
+ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
+                           ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    return BSWAP(atomic_xchg__nocheck(haddr, BSWAP(val)));
+}
+
+#define GEN_ATOMIC_HELPER(X)                                        \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,       \
+                 ABI_TYPE val EXTRA_ARGS)                           \
+{                                                                   \
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;                           \
+    return BSWAP(atomic_##X(haddr, BSWAP(val)));                    \
+}
+
+GEN_ATOMIC_HELPER(fetch_and)
+GEN_ATOMIC_HELPER(fetch_or)
+GEN_ATOMIC_HELPER(fetch_xor)
+GEN_ATOMIC_HELPER(and_fetch)
+GEN_ATOMIC_HELPER(or_fetch)
+GEN_ATOMIC_HELPER(xor_fetch)
+
+#undef GEN_ATOMIC_HELPER
+
+/* Note that for addition, we need to use a separate cmpxchg loop instead
+   of bswaps for the reverse-host-endian helpers.  */
+ABI_TYPE ATOMIC_NAME(fetch_add)(CPUArchState *env, target_ulong addr,
+                         ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    DATA_TYPE ldo, ldn, ret, sto;
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo);
+        sto = BSWAP(ret + val);
+        ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+
+ABI_TYPE ATOMIC_NAME(add_fetch)(CPUArchState *env, target_ulong addr,
+                         ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    DATA_TYPE ldo, ldn, ret, sto;
+
+    ldo = *haddr;
+    while (1) {
+        ret = BSWAP(ldo) + val;
+        sto = BSWAP(ret);
+        ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
+        if (ldn == ldo) {
+            return ret;
+        }
+        ldo = ldn;
+    }
+}
+
+#undef END
+#endif /* DATA_SIZE > 1 */
+
+#undef BSWAP
+#undef ABI_TYPE
+#undef DATA_TYPE
+#undef SUFFIX
+#undef DATA_SIZE
diff --git a/cputlb.c b/cputlb.c
index d710cc1..a54afb4 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -23,15 +23,15 @@
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
 #include "exec/cpu_ldst.h"
-
 #include "exec/cputlb.h"
-
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 #include "qemu/error-report.h"
 #include "exec/log.h"
+#include "exec/helper-proto.h"
+#include "qemu/atomic.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
@@ -585,6 +585,69 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
     }
 }
 
+/* Probe for a read-modify-write atomic operation.  Do not allow unaligned
+ * operations, or io operations to proceed.  Return the host address.  */
+static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
+                               TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    size_t mmu_idx = get_mmuidx(oi);
+    size_t index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];
+    target_ulong tlb_addr = tlbe->addr_write;
+    TCGMemOp mop = get_memop(oi);
+    int a_bits = get_alignment_bits(mop);
+    int s_bits = mop & MO_SIZE;
+
+    /* Adjust the given return address.  */
+    retaddr -= GETPC_ADJ;
+
+    /* Enforce guest required alignment.  */
+    if (unlikely(a_bits > 0 && (addr & ((1 << a_bits) - 1)))) {
+        /* ??? Maybe indicate atomic op to cpu_unaligned_access */
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+
+    /* Enforce qemu required alignment.  */
+    if (unlikely(addr & ((1 << s_bits) - 1))) {
+        /* We get here if guest alignment was not requested,
+           or was not enforced by cpu_unaligned_access above.
+           We might widen the access and emulate, but for now
+           mark an exception and exit the cpu loop.  */
+        goto stop_the_world;
+    }
+
+    /* Check TLB entry and enforce page permissions.  */
+    if ((addr & TARGET_PAGE_MASK)
+        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+        if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
+        }
+        tlb_addr = tlbe->addr_write;
+    }
+
+    /* Notice an IO access, or a notdirty page.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        /* There's really nothing that can be done to
+           support this apart from stop-the-world.  */
+        goto stop_the_world;
+    }
+
+    /* Let the guest notice RMW on a write-only page.  */
+    if (unlikely(tlbe->addr_read != tlb_addr)) {
+        tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_LOAD, mmu_idx, retaddr);
+        /* Since we don't support reads and writes to different addresses,
+           and we do have the proper page loaded for write, this shouldn't
+           ever return.  But just in case, handle via stop-the-world.  */
+        goto stop_the_world;
+    }
+
+    return (void *)((uintptr_t)addr + tlbe->addend);
+
+ stop_the_world:
+    cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
+}
+
 #ifdef TARGET_WORDS_BIGENDIAN
 # define TGT_BE(X)  (X)
 # define TGT_LE(X)  BSWAP(X)
@@ -606,8 +669,51 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
 
 #define DATA_SIZE 8
 #include "softmmu_template.h"
-#undef MMUSUFFIX
 
+/* First set of helpers allows passing in of OI and RETADDR.  This makes
+   them callable from other helpers.  */
+
+#define EXTRA_ARGS     , TCGMemOpIdx oi, uintptr_t retaddr
+#define ATOMIC_NAME(X) \
+    HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
+#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, retaddr)
+
+#define DATA_SIZE 1
+#include "atomic_template.h"
+
+#define DATA_SIZE 2
+#include "atomic_template.h"
+
+#define DATA_SIZE 4
+#include "atomic_template.h"
+
+#define DATA_SIZE 8
+#include "atomic_template.h"
+
+/* Second set of helpers are directly callable from TCG as helpers.  */
+
+#undef EXTRA_ARGS
+#undef ATOMIC_NAME
+#undef ATOMIC_MMU_LOOKUP
+#define EXTRA_ARGS         , TCGMemOpIdx oi
+#define ATOMIC_NAME(X)     HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
+#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, GETRA())
+
+#define DATA_SIZE 1
+#include "atomic_template.h"
+
+#define DATA_SIZE 2
+#include "atomic_template.h"
+
+#define DATA_SIZE 4
+#include "atomic_template.h"
+
+#define DATA_SIZE 8
+#include "atomic_template.h"
+
+/* Code access functions.  */
+
+#undef MMUSUFFIX
 #define MMUSUFFIX _cmmu
 #undef GETPC_ADJ
 #define GETPC_ADJ 0
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 6e5cdd5..5542416 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -177,22 +177,29 @@
 
 /* All the remaining operations are fully sequentially consistent */
 
-#define atomic_xchg(ptr, i)    ({                           \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
+#define atomic_xchg__nocheck(ptr, i)    ({                  \
     typeof_strip_qual(*ptr) _new = (i), _old;               \
     __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
     _old;                                                   \
 })
 
+#define atomic_xchg(ptr, i)    ({                           \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
+    atomic_xchg__nocheck(ptr, i);                           \
+})
+
 /* Returns the eventual value, failed or not */
-#define atomic_cmpxchg(ptr, old, new)                                   \
-    ({                                                                  \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
+#define atomic_cmpxchg__nocheck(ptr, old, new)    ({                    \
     typeof_strip_qual(*ptr) _old = (old), _new = (new);                 \
     __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
     _old;                                                               \
-    })
+})
+
+#define atomic_cmpxchg(ptr, old, new)    ({                             \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
+    atomic_cmpxchg__nocheck(ptr, old, new);                             \
+})
 
 /* Provide shorter names for GCC atomic builtins, return old value */
 #define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
@@ -397,6 +404,7 @@
 #define atomic_xchg(ptr, i)    (smp_mb(), __sync_lock_test_and_set(ptr, i))
 #endif
 #endif
+#define atomic_xchg__nocheck  atomic_xchg
 
 /* Provide shorter names for GCC atomic builtins.  */
 #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
@@ -416,6 +424,7 @@
 #define atomic_xor_fetch       __sync_xor_and_fetch
 
 #define atomic_cmpxchg         __sync_val_compare_and_swap
+#define atomic_cmpxchg__nocheck  atomic_cmpxchg
 
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
diff --git a/tcg-runtime.c b/tcg-runtime.c
index ea2ad64..aa55d12 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -23,17 +23,10 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/host-utils.h"
-
-/* This file is compiled once, and thus we can't include the standard
-   "exec/helper-proto.h", which has includes that are target specific.  */
-
-#include "exec/helper-head.h"
-
-#define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2) \
-  dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2));
-
-#include "tcg-runtime.h"
-
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
 
 /* 32-bit helpers */
 
@@ -107,3 +100,37 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
     muls64(&l, &h, arg1, arg2);
     return h;
 }
+
+#ifndef CONFIG_SOFTMMU
+/* The softmmu versions of these helpers are in cputlb.c.  */
+
+/* Do not allow unaligned operations to proceed.  Return the host address.  */
+static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
+                               int size, uintptr_t retaddr)
+{
+    /* Enforce qemu required alignment.  */
+    if (unlikely(addr & (size - 1))) {
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
+    }
+    return g2h(addr);
+}
+
+/* Macro to call the above, with local variables from the use context.  */
+#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, GETPC())
+
+#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
+#define EXTRA_ARGS
+
+#define DATA_SIZE 1
+#include "atomic_template.h"
+
+#define DATA_SIZE 2
+#include "atomic_template.h"
+
+#define DATA_SIZE 4
+#include "atomic_template.h"
+
+#define DATA_SIZE 8
+#include "atomic_template.h"
+
+#endif /* !CONFIG_SOFTMMU */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0243c99..e146ad4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1958,3 +1958,331 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
                                addr, trace_mem_get_info(memop, 1));
     gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
 }
+
+static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, TCGMemOp opc)
+{
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
+        tcg_gen_ext8s_i32(ret, val);
+        break;
+    case MO_UB:
+        tcg_gen_ext8u_i32(ret, val);
+        break;
+    case MO_SW:
+        tcg_gen_ext16s_i32(ret, val);
+        break;
+    case MO_UW:
+        tcg_gen_ext16u_i32(ret, val);
+        break;
+    default:
+        tcg_gen_mov_i32(ret, val);
+        break;
+    }
+}
+
+static void tcg_gen_ext_i64(TCGv_i64 ret, TCGv_i64 val, TCGMemOp opc)
+{
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
+        tcg_gen_ext8s_i64(ret, val);
+        break;
+    case MO_UB:
+        tcg_gen_ext8u_i64(ret, val);
+        break;
+    case MO_SW:
+        tcg_gen_ext16s_i64(ret, val);
+        break;
+    case MO_UW:
+        tcg_gen_ext16u_i64(ret, val);
+        break;
+    case MO_SL:
+        tcg_gen_ext32s_i64(ret, val);
+        break;
+    case MO_UL:
+        tcg_gen_ext32u_i64(ret, val);
+        break;
+    default:
+        tcg_gen_mov_i64(ret, val);
+        break;
+    }
+}
+
+#ifdef CONFIG_SOFTMMU
+typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
+                                  TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
+                                  TCGv_i64, TCGv_i64, TCGv_i32);
+typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv,
+                                  TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
+                                  TCGv_i64, TCGv_i32);
+#else
+typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32, TCGv_i32);
+typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64, TCGv_i64);
+typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32);
+typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
+#endif
+
+static void * const table_cmpxchg[16] = {
+    [MO_8] = gen_helper_atomic_cmpxchgb,
+    [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
+    [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
+    [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
+    [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
+    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
+    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
+};
+
+void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
+                                TCGv_i32 newv, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    if (!parallel_cpus) {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        TCGv_i32 t2 = tcg_temp_new_i32();
+
+        tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
+
+        tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+        tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
+        tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+        tcg_temp_free_i32(t2);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i32(retv, t1, memop);
+        } else {
+            tcg_gen_mov_i32(retv, t1);
+        }
+        tcg_temp_free_i32(t1);
+    } else {
+        gen_atomic_cx_i32 gen;
+
+        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_SOFTMMU
+        {
+            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+            tcg_temp_free_i32(oi);
+        }
+#else
+        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
+#endif
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i32(retv, retv, memop);
+        }
+    }
+}
+
+void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
+                                TCGv_i64 newv, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    if (!parallel_cpus) {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
+
+        tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
+
+        tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+        tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
+        tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+        tcg_temp_free_i64(t2);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(retv, t1, memop);
+        } else {
+            tcg_gen_mov_i64(retv, t1);
+        }
+        tcg_temp_free_i64(t1);
+    } else if ((memop & MO_SIZE) == MO_64) {
+        gen_atomic_cx_i64 gen;
+
+        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_SOFTMMU
+        {
+            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
+            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+            tcg_temp_free_i32(oi);
+        }
+#else
+        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
+#endif
+    } else {
+        TCGv_i32 c32 = tcg_temp_new_i32();
+        TCGv_i32 n32 = tcg_temp_new_i32();
+        TCGv_i32 r32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(c32, cmpv);
+        tcg_gen_extrl_i64_i32(n32, newv);
+        tcg_gen_atomic_cmpxchg_i32(r32, addr, c32, n32, idx, memop & ~MO_SIGN);
+        tcg_temp_free_i32(c32);
+        tcg_temp_free_i32(n32);
+
+        tcg_gen_extu_i32_i64(retv, r32);
+        tcg_temp_free_i32(r32);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(retv, retv, memop);
+        }
+    }
+}
+
+static void do_nonatomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
+                                TCGArg idx, TCGMemOp memop, bool new_val,
+                                void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+    gen(t2, t1, val);
+    tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+
+    tcg_gen_ext_i32(ret, (new_val ? t2 : t1), memop);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
+static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
+                             TCGArg idx, TCGMemOp memop, void * const table[])
+{
+    gen_atomic_op_i32 gen;
+
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    gen = table[memop & (MO_SIZE | MO_BSWAP)];
+    tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_SOFTMMU
+    {
+        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+        gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+        tcg_temp_free_i32(oi);
+    }
+#else
+    gen(ret, tcg_ctx.tcg_env, addr, val);
+#endif
+
+    if (memop & MO_SIGN) {
+        tcg_gen_ext_i32(ret, ret, memop);
+    }
+}
+
+static void do_nonatomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
+                                TCGArg idx, TCGMemOp memop, bool new_val,
+                                void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+    gen(t2, t1, val);
+    tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+
+    tcg_gen_ext_i64(ret, (new_val ? t2 : t1), memop);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
+                             TCGArg idx, TCGMemOp memop, void * const table[])
+{
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+    if ((memop & MO_SIZE) == MO_64) {
+        gen_atomic_op_i64 gen;
+
+        gen = table[memop & (MO_SIZE | MO_BSWAP)];
+        tcg_debug_assert(gen != NULL);
+
+#ifdef CONFIG_SOFTMMU
+        {
+            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
+            gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+            tcg_temp_free_i32(oi);
+        }
+#else
+        gen(ret, tcg_ctx.tcg_env, addr, val);
+#endif
+    } else {
+        TCGv_i32 v32 = tcg_temp_new_i32();
+        TCGv_i32 r32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(v32, val);
+        do_atomic_op_i32(r32, addr, v32, idx, memop & ~MO_SIGN, table);
+        tcg_temp_free_i32(v32);
+
+        tcg_gen_extu_i32_i64(ret, r32);
+        tcg_temp_free_i32(r32);
+
+        if (memop & MO_SIGN) {
+            tcg_gen_ext_i64(ret, ret, memop);
+        }
+    }
+}
+
+#define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
+static void * const table_##NAME[16] = {                                \
+    [MO_8] = gen_helper_atomic_##NAME##b,                               \
+    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
+    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
+    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
+    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
+    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
+    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
+};                                                                      \
+void tcg_gen_atomic_##NAME##_i32                                        \
+    (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
+{                                                                       \
+    if (parallel_cpus) {                                                \
+        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i32);                        \
+    }                                                                   \
+}                                                                       \
+void tcg_gen_atomic_##NAME##_i64                                        \
+    (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
+{                                                                       \
+    if (parallel_cpus) {                                                \
+        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i64);                        \
+    }                                                                   \
+}
+
+GEN_ATOMIC_HELPER(fetch_add, add, 0)
+GEN_ATOMIC_HELPER(fetch_and, and, 0)
+GEN_ATOMIC_HELPER(fetch_or, or, 0)
+GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
+
+GEN_ATOMIC_HELPER(add_fetch, add, 1)
+GEN_ATOMIC_HELPER(and_fetch, and, 1)
+GEN_ATOMIC_HELPER(or_fetch, or, 1)
+GEN_ATOMIC_HELPER(xor_fetch, xor, 1)
+
+static void tcg_gen_mov2_i32(TCGv_i32 r, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_mov_i32(r, b);
+}
+
+static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_mov_i64(r, b);
+}
+
+GEN_ATOMIC_HELPER(xchg, mov2, 0)
+
+#undef GEN_ATOMIC_HELPER
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index f217e80..2a845ae 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -852,6 +852,30 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
     tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
 }
 
+void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
+                                TCGArg, TCGMemOp);
+void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
+                                TCGArg, TCGMemOp);
+
+void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_add_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_add_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_and_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_and_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_or_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_or_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_xor_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_fetch_xor_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_add_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_add_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_and_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_and_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_or_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
+void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
+
 #if TARGET_LONG_BITS == 64
 #define tcg_gen_movi_tl tcg_gen_movi_i64
 #define tcg_gen_mov_tl tcg_gen_mov_i64
@@ -930,6 +954,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sub2_tl tcg_gen_sub2_i64
 #define tcg_gen_mulu2_tl tcg_gen_mulu2_i64
 #define tcg_gen_muls2_tl tcg_gen_muls2_i64
+#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i64
+#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i64
+#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i64
+#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i64
+#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i64
+#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i64
+#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i64
+#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64
+#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64
+#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64
 #else
 #define tcg_gen_movi_tl tcg_gen_movi_i32
 #define tcg_gen_mov_tl tcg_gen_mov_i32
@@ -1007,6 +1041,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sub2_tl tcg_gen_sub2_i32
 #define tcg_gen_mulu2_tl tcg_gen_mulu2_i32
 #define tcg_gen_muls2_tl tcg_gen_muls2_i32
+#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i32
+#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i32
+#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i32
+#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i32
+#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i32
+#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i32
+#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i32
+#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32
+#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32
+#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32
 #endif
 
 #if UINTPTR_MAX == UINT32_MAX
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 23a0c37..22367aa 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -14,3 +14,78 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 
 DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+
+#ifdef CONFIG_SOFTMMU
+
+DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
+                   i64, env, tl, i64, i64, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
+                   i64, env, tl, i64, i64, i32)
+
+#define GEN_ATOMIC_HELPERS(NAME)                                  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_le),           \
+                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),           \
+                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)
+
+#else
+
+DEF_HELPER_FLAGS_4(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
+
+#define GEN_ATOMIC_HELPERS(NAME)                             \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_le),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_be),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_le),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_be),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_le),      \
+                       TCG_CALL_NO_WG, i64, env, tl, i64)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_be),      \
+                       TCG_CALL_NO_WG, i64, env, tl, i64)
+
+#endif /* CONFIG_SOFTMMU */
+
+GEN_ATOMIC_HELPERS(fetch_add)
+GEN_ATOMIC_HELPERS(fetch_and)
+GEN_ATOMIC_HELPERS(fetch_or)
+GEN_ATOMIC_HELPERS(fetch_xor)
+
+GEN_ATOMIC_HELPERS(add_fetch)
+GEN_ATOMIC_HELPERS(and_fetch)
+GEN_ATOMIC_HELPERS(or_fetch)
+GEN_ATOMIC_HELPERS(xor_fetch)
+
+GEN_ATOMIC_HELPERS(xchg)
+
+#undef GEN_ATOMIC_HELPERS
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 00498fc..c91b8c6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1175,6 +1175,59 @@ uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
 # define helper_ret_ldq_cmmu  helper_le_ldq_cmmu
 #endif
 
+uint32_t helper_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
+                                    uint32_t cmpv, uint32_t newv,
+                                    TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgw_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgl_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_atomic_cmpxchgq_le_mmu(CPUArchState *env, target_ulong addr,
+                                       uint64_t cmpv, uint64_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgw_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint32_t helper_atomic_cmpxchgl_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint32_t cmpv, uint32_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_atomic_cmpxchgq_be_mmu(CPUArchState *env, target_ulong addr,
+                                       uint64_t cmpv, uint64_t newv,
+                                       TCGMemOpIdx oi, uintptr_t retaddr);
+
+#define GEN_ATOMIC_HELPER(NAME, TYPE, SUFFIX)         \
+TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
+    (CPUArchState *env, target_ulong addr, TYPE val,  \
+     TCGMemOpIdx oi, uintptr_t retaddr);
+
+#define GEN_ATOMIC_HELPER_ALL(NAME)          \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, b)      \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
+
+GEN_ATOMIC_HELPER_ALL(fetch_add)
+GEN_ATOMIC_HELPER_ALL(fetch_sub)
+GEN_ATOMIC_HELPER_ALL(fetch_and)
+GEN_ATOMIC_HELPER_ALL(fetch_or)
+GEN_ATOMIC_HELPER_ALL(fetch_xor)
+
+GEN_ATOMIC_HELPER_ALL(add_fetch)
+GEN_ATOMIC_HELPER_ALL(sub_fetch)
+GEN_ATOMIC_HELPER_ALL(and_fetch)
+GEN_ATOMIC_HELPER_ALL(or_fetch)
+GEN_ATOMIC_HELPER_ALL(xor_fetch)
+
+GEN_ATOMIC_HELPER_ALL(xchg)
+
+#undef GEN_ATOMIC_HELPER_ALL
+#undef GEN_ATOMIC_HELPER
+
 #endif /* CONFIG_SOFTMMU */
 
 #endif /* TCG_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (12 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-13 11:18   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64 Richard Henderson
                   ` (23 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 atomic_template.h     | 40 +++++++++++++++++++++++++++++++++++++++-
 configure             | 29 ++++++++++++++++++++++++++++-
 cputlb.c              |  5 +++++
 include/qemu/int128.h |  6 ++++++
 tcg-runtime.c         | 20 +++++++++++++++++++-
 tcg/tcg.h             | 24 +++++++++++++++++++++++-
 6 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/atomic_template.h b/atomic_template.h
index d2c8a08..4fdf722 100644
--- a/atomic_template.h
+++ b/atomic_template.h
@@ -18,7 +18,11 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#if DATA_SIZE == 8
+#if DATA_SIZE == 16
+# define SUFFIX     o
+# define DATA_TYPE  Int128
+# define BSWAP      bswap128
+#elif DATA_SIZE == 8
 # define SUFFIX     q
 # define DATA_TYPE  uint64_t
 # define BSWAP      bswap64
@@ -59,6 +63,21 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
     return atomic_cmpxchg__nocheck(haddr, cmpv, newv);
 }
 
+#if DATA_SIZE >= 16
+ABI_TYPE ATOMIC_NAME(ld)(CPUArchState *env, target_ulong addr EXTRA_ARGS)
+{
+    DATA_TYPE val, *haddr = ATOMIC_MMU_LOOKUP;
+    __atomic_load(haddr, &val, __ATOMIC_RELAXED);
+    return val;
+}
+
+void ATOMIC_NAME(st)(CPUArchState *env, target_ulong addr,
+                     ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    __atomic_store(haddr, &val, __ATOMIC_RELAXED);
+}
+#else
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
                            ABI_TYPE val EXTRA_ARGS)
 {
@@ -84,6 +103,8 @@ GEN_ATOMIC_HELPER(or_fetch)
 GEN_ATOMIC_HELPER(xor_fetch)
 
 #undef GEN_ATOMIC_HELPER
+#endif /* DATA SIZE >= 16 */
+
 #undef END
 
 #if DATA_SIZE > 1
@@ -101,6 +122,22 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
     return BSWAP(atomic_cmpxchg__nocheck(haddr, BSWAP(cmpv), BSWAP(newv)));
 }
 
+#if DATA_SIZE >= 16
+ABI_TYPE ATOMIC_NAME(ld)(CPUArchState *env, target_ulong addr EXTRA_ARGS)
+{
+    DATA_TYPE val, *haddr = ATOMIC_MMU_LOOKUP;
+    __atomic_load(haddr, &val, __ATOMIC_RELAXED);
+    return BSWAP(val);
+}
+
+void ATOMIC_NAME(st)(CPUArchState *env, target_ulong addr,
+                     ABI_TYPE val EXTRA_ARGS)
+{
+    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
+    val = BSWAP(val);
+    __atomic_store(haddr, &val, __ATOMIC_RELAXED);
+}
+#else
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
                            ABI_TYPE val EXTRA_ARGS)
 {
@@ -162,6 +199,7 @@ ABI_TYPE ATOMIC_NAME(add_fetch)(CPUArchState *env, target_ulong addr,
         ldo = ldn;
     }
 }
+#endif /* DATA_SIZE >= 16 */
 
 #undef END
 #endif /* DATA_SIZE > 1 */
diff --git a/configure b/configure
index 4b808f9..e500652 100755
--- a/configure
+++ b/configure
@@ -1204,7 +1204,10 @@ case "$cpu" in
            cc_i386='$(CC) -m32'
            ;;
     x86_64)
-           CPU_CFLAGS="-m64"
+           # ??? Only extremely old AMD cpus do not have cmpxchg16b.
+           # If we truly care, we should simply detect this case at
+           # runtime and generate the fallback to serial emulation.
+           CPU_CFLAGS="-m64 -mcx16"
            LDFLAGS="-m64 $LDFLAGS"
            cc_i386='$(CC) -m32'
            ;;
@@ -4439,6 +4442,26 @@ if compile_prog "" "" ; then
     int128=yes
 fi
 
+#########################################
+# See if 128-bit atomic operations are supported.
+
+atomic128=no
+if test "$int128" = "yes"; then
+  cat > $TMPC << EOF
+int main(void)
+{
+  unsigned __int128 x = 0, y = 0;
+  y = __atomic_load_16(&x, 0);
+  __atomic_store_16(&x, y, 0);
+  __atomic_compare_exchange_16(&x, &y, x, 0, 0, 0);
+  return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    atomic128=yes
+  fi
+fi
+
 ########################################
 # check if getauxval is available.
 
@@ -5388,6 +5411,10 @@ if test "$int128" = "yes" ; then
   echo "CONFIG_INT128=y" >> $config_host_mak
 fi
 
+if test "$atomic128" = "yes" ; then
+  echo "CONFIG_ATOMIC128=y" >> $config_host_mak
+fi
+
 if test "$getauxval" = "yes" ; then
   echo "CONFIG_GETAUXVAL=y" >> $config_host_mak
 fi
diff --git a/cputlb.c b/cputlb.c
index a54afb4..773cc85 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -690,6 +690,11 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 8
 #include "atomic_template.h"
 
+#ifdef CONFIG_ATOMIC128
+#define DATA_SIZE 16
+#include "atomic_template.h"
+#endif
+
 /* Second set of helpers are directly callable from TCG as helpers.  */
 
 #undef EXTRA_ARGS
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 67440fa..261b55f 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -2,6 +2,7 @@
 #define INT128_H
 
 #ifdef CONFIG_INT128
+#include "qemu/bswap.h"
 
 typedef __int128 Int128;
 
@@ -137,6 +138,11 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
     *a -= b;
 }
 
+static inline Int128 bswap128(Int128 a)
+{
+    return int128_make128(bswap64(int128_gethi(a)), bswap64(int128_getlo(a)));
+}
+
 #else /* !CONFIG_INT128 */
 
 typedef struct Int128 Int128;
diff --git a/tcg-runtime.c b/tcg-runtime.c
index aa55d12..0c97cdf 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -118,8 +118,8 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 /* Macro to call the above, with local variables from the use context.  */
 #define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, GETPC())
 
-#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
 #define EXTRA_ARGS
+#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
 
 #define DATA_SIZE 1
 #include "atomic_template.h"
@@ -133,4 +133,22 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 8
 #include "atomic_template.h"
 
+/* The following is only callable from other helpers, and matches up
+   with the softmmu version.  */
+
+#ifdef CONFIG_ATOMIC128
+
+#undef EXTRA_ARGS
+#undef ATOMIC_NAME
+#undef ATOMIC_MMU_LOOKUP
+
+#define EXTRA_ARGS     , TCGMemOpIdx oi, uintptr_t retaddr
+#define ATOMIC_NAME(X) \
+    HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
+#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, retaddr)
+
+#define DATA_SIZE 16
+#include "atomic_template.h"
+#endif /* CONFIG_ATOMIC128 */
+
 #endif /* !CONFIG_SOFTMMU */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c91b8c6..5a94cec 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1227,7 +1227,29 @@ GEN_ATOMIC_HELPER_ALL(xchg)
 
 #undef GEN_ATOMIC_HELPER_ALL
 #undef GEN_ATOMIC_HELPER
-
 #endif /* CONFIG_SOFTMMU */
 
+#ifdef CONFIG_ATOMIC128
+#include "qemu/int128.h"
+
+/* These aren't really a "proper" helpers because TCG cannot manage Int128.
+   However, use the same format as the others, for use by the backends. */
+Int128 helper_atomic_cmpxchgo_le_mmu(CPUArchState *env, target_ulong addr,
+                                     Int128 cmpv, Int128 newv,
+                                     TCGMemOpIdx oi, uintptr_t retaddr);
+Int128 helper_atomic_cmpxchgo_be_mmu(CPUArchState *env, target_ulong addr,
+                                     Int128 cmpv, Int128 newv,
+                                     TCGMemOpIdx oi, uintptr_t retaddr);
+
+Int128 helper_atomic_ldo_le_mmu(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr);
+Int128 helper_atomic_ldo_be_mmu(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr);
+void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                              TCGMemOpIdx oi, uintptr_t retaddr);
+void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
+                              TCGMemOpIdx oi, uintptr_t retaddr);
+
+#endif /* CONFIG_ATOMIC128 */
+
 #endif /* TCG_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (13 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 10:12   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 16/34] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
                   ` (22 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation.  Do so by dropping back to single-step.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure         | 33 +++++++++++++++++++++++++++++++++
 cputlb.c          |  4 ++++
 tcg-runtime.c     |  7 +++++++
 tcg/tcg-op.c      | 22 ++++++++++++++++++----
 tcg/tcg-runtime.h | 46 ++++++++++++++++++++++++++++++++++++++++------
 tcg/tcg.h         | 15 ++++++++++++---
 6 files changed, 114 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index e500652..519de5d 100755
--- a/configure
+++ b/configure
@@ -4462,6 +4462,35 @@ EOF
   fi
 fi
 
+#########################################
+# See if 64-bit atomic operations are supported.
+# Note that without __atomic builtins, we can only
+# assume atomic loads/stores max at pointer size.
+
+cat > $TMPC << EOF
+#include <stdint.h>
+int main(void)
+{
+  uint64_t x = 0, y = 0;
+#ifdef __ATOMIC_RELAXED
+  y = __atomic_load_8(&x, 0);
+  __atomic_store_8(&x, y, 0);
+  __atomic_compare_exchange_8(&x, &y, x, 0, 0, 0);
+  __atomic_exchange_8(&x, y, 0);
+  __atomic_fetch_add_8(&x, y, 0);
+#else
+  char is_host64[sizeof(void *) >= sizeof(uint64_t) ? 1 : -1];
+  __sync_lock_test_and_set(&x, y);
+  __sync_val_compare_and_swap(&x, y, 0);
+  __sync_fetch_and_add(&x, y);
+#endif
+  return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  atomic64=yes
+fi
+
 ########################################
 # check if getauxval is available.
 
@@ -5415,6 +5444,10 @@ if test "$atomic128" = "yes" ; then
   echo "CONFIG_ATOMIC128=y" >> $config_host_mak
 fi
 
+if test "$atomic64" = "yes" ; then
+  echo "CONFIG_ATOMIC64=y" >> $config_host_mak
+fi
+
 if test "$getauxval" = "yes" ; then
   echo "CONFIG_GETAUXVAL=y" >> $config_host_mak
 fi
diff --git a/cputlb.c b/cputlb.c
index 773cc85..8cdbf6c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -687,8 +687,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 4
 #include "atomic_template.h"
 
+#ifdef CONFIG_ATOMIC64
 #define DATA_SIZE 8
 #include "atomic_template.h"
+#endif
 
 #ifdef CONFIG_ATOMIC128
 #define DATA_SIZE 16
@@ -713,8 +715,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 4
 #include "atomic_template.h"
 
+#ifdef CONFIG_ATOMIC64
 #define DATA_SIZE 8
 #include "atomic_template.h"
+#endif
 
 /* Code access functions.  */
 
diff --git a/tcg-runtime.c b/tcg-runtime.c
index 0c97cdf..d7704d4 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -101,6 +101,11 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
     return h;
 }
 
+void HELPER(exit_atomic)(CPUArchState *env)
+{
+    cpu_loop_exit_atomic(ENV_GET_CPU(env), GETRA());
+}
+
 #ifndef CONFIG_SOFTMMU
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
@@ -130,8 +135,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 4
 #include "atomic_template.h"
 
+#ifdef CONFIG_ATOMIC64
 #define DATA_SIZE 8
 #include "atomic_template.h"
+#endif
 
 /* The following is only callable from other helpers, and matches up
    with the softmmu version.  */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index e146ad4..1c09025 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2023,14 +2023,20 @@ typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32);
 typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
 #endif
 
+#ifdef CONFIG_ATOMIC64
+# define WITH_ATOMIC64(X) X,
+#else
+# define WITH_ATOMIC64(X)
+#endif
+
 static void * const table_cmpxchg[16] = {
     [MO_8] = gen_helper_atomic_cmpxchgb,
     [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
     [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
     [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
     [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
-    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
-    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
+    WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le)
+    WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be)
 };
 
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
@@ -2100,6 +2106,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
         }
         tcg_temp_free_i64(t1);
     } else if ((memop & MO_SIZE) == MO_64) {
+#ifdef CONFIG_ATOMIC64
         gen_atomic_cx_i64 gen;
 
         gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
@@ -2114,6 +2121,9 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 #else
         gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
 #endif
+#else
+        gen_helper_exit_atomic(tcg_ctx.tcg_env);
+#endif /* CONFIG_ATOMIC64 */
     } else {
         TCGv_i32 c32 = tcg_temp_new_i32();
         TCGv_i32 n32 = tcg_temp_new_i32();
@@ -2201,6 +2211,7 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
     memop = tcg_canonicalize_memop(memop, 1, 0);
 
     if ((memop & MO_SIZE) == MO_64) {
+#ifdef CONFIG_ATOMIC64
         gen_atomic_op_i64 gen;
 
         gen = table[memop & (MO_SIZE | MO_BSWAP)];
@@ -2215,6 +2226,9 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
 #else
         gen(ret, tcg_ctx.tcg_env, addr, val);
 #endif
+#else
+        gen_helper_exit_atomic(tcg_ctx.tcg_env);
+#endif /* CONFIG_ATOMIC64 */
     } else {
         TCGv_i32 v32 = tcg_temp_new_i32();
         TCGv_i32 r32 = tcg_temp_new_i32();
@@ -2239,8 +2253,8 @@ static void * const table_##NAME[16] = {                                \
     [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
     [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
     [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
-    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
-    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
+    WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le)     \
+    WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be)     \
 };                                                                      \
 void tcg_gen_atomic_##NAME##_i32                                        \
     (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 22367aa..1deb86a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -15,23 +15,28 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
+
 #ifdef CONFIG_SOFTMMU
 
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
                    i32, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
                    i32, env, tl, i32, i32, i32)
-DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
-                   i32, env, tl, i32, i32, i32)
-DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
-                   i64, env, tl, i64, i64, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG,
                    i32, env, tl, i32, i32, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
+                   i32, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG,
                    i32, env, tl, i32, i32, i32)
+#ifdef CONFIG_ATOMIC64
+DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
+                   i64, env, tl, i64, i64, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
                    i64, env, tl, i64, i64, i32)
+#endif
 
+#ifdef CONFIG_ATOMIC64
 #define GEN_ATOMIC_HELPERS(NAME)                                  \
     DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
                        TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
@@ -47,17 +52,33 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
                        TCG_CALL_NO_WG, i64, env, tl, i64, i32)    \
     DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),           \
                        TCG_CALL_NO_WG, i64, env, tl, i64, i32)
+#else
+#define GEN_ATOMIC_HELPERS(NAME)                                  \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
+    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),           \
+                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+#endif /* CONFIG_ATOMIC64 */
 
 #else
 
 DEF_HELPER_FLAGS_4(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
 DEF_HELPER_FLAGS_4(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
-DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
-DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
 DEF_HELPER_FLAGS_4(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
 DEF_HELPER_FLAGS_4(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
+#ifdef CONFIG_ATOMIC64
+DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
 DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
+#endif
 
+#ifdef CONFIG_ATOMIC64
 #define GEN_ATOMIC_HELPERS(NAME)                             \
     DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
                        TCG_CALL_NO_WG, i32, env, tl, i32)    \
@@ -73,6 +94,19 @@ DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
                        TCG_CALL_NO_WG, i64, env, tl, i64)    \
     DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_be),      \
                        TCG_CALL_NO_WG, i64, env, tl, i64)
+#else
+#define GEN_ATOMIC_HELPERS(NAME)                             \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_le),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_be),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_le),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
+    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_be),      \
+                       TCG_CALL_NO_WG, i32, env, tl, i32)
+#endif /* CONFIG_ATOMIC64 */
 
 #endif /* CONFIG_SOFTMMU */
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5a94cec..8bcf32a 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1202,14 +1202,23 @@ TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
     (CPUArchState *env, target_ulong addr, TYPE val,  \
      TCGMemOpIdx oi, uintptr_t retaddr);
 
+#ifdef CONFIG_ATOMIC64
 #define GEN_ATOMIC_HELPER_ALL(NAME)          \
-    GEN_ATOMIC_HELPER(NAME, uint32_t, b)      \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, b)     \
     GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
-    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
-    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
     GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
     GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
     GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
+#else
+#define GEN_ATOMIC_HELPER_ALL(NAME)          \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, b)     \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
+    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)
+#endif
 
 GEN_ATOMIC_HELPER_ALL(fetch_add)
 GEN_ATOMIC_HELPER_ALL(fetch_sub)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 16/34] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (14 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64 Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 17/34] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/helper.h     |   2 +
 target-i386/mem_helper.c | 134 +++++++++++++++++++++++++++++++++++++++--------
 target-i386/translate.c  |  99 ++++++++++++++++++----------------
 3 files changed, 169 insertions(+), 66 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 1320edc..729d4b6 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -74,8 +74,10 @@ DEF_HELPER_3(boundw, void, env, tl, int)
 DEF_HELPER_3(boundl, void, env, tl, int)
 DEF_HELPER_1(rsm, void, env)
 DEF_HELPER_2(into, void, env, int)
+DEF_HELPER_2(cmpxchg8b_unlocked, void, env, tl)
 DEF_HELPER_2(cmpxchg8b, void, env, tl)
 #ifdef TARGET_X86_64
+DEF_HELPER_2(cmpxchg16b_unlocked, void, env, tl)
 DEF_HELPER_2(cmpxchg16b, void, env, tl)
 #endif
 DEF_HELPER_1(single_step, void, env)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 5bc0594..8b72fda 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -22,6 +22,8 @@
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
+#include "qemu/int128.h"
+#include "tcg.h"
 
 /* broken thread support */
 
@@ -56,53 +58,143 @@ void helper_lock_init(void)
 }
 #endif
 
+void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
+{
+    uintptr_t ra = GETPC();
+    uint64_t oldv, cmpv, newv;
+    int eflags;
+
+    eflags = cpu_cc_compute_all(env, CC_OP);
+
+    cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
+    newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
+
+    oldv = cpu_ldq_data_ra(env, a0, ra);
+    newv = (cmpv == oldv ? newv : oldv);
+    /* always do the store */
+    cpu_stq_data_ra(env, a0, newv, ra);
+
+    if (oldv == cmpv) {
+        eflags |= CC_Z;
+    } else {
+        env->regs[R_EAX] = (uint32_t)oldv;
+        env->regs[R_EDX] = (uint32_t)(oldv >> 32);
+        eflags &= ~CC_Z;
+    }
+    CC_SRC = eflags;
+}
+
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
-    uint64_t d;
+#ifdef CONFIG_ATOMIC64
+    uint64_t oldv, cmpv, newv;
     int eflags;
 
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d = cpu_ldq_data_ra(env, a0, GETPC());
-    if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
-        cpu_stq_data_ra(env, a0, ((uint64_t)env->regs[R_ECX] << 32)
-                                  | (uint32_t)env->regs[R_EBX], GETPC());
+
+    cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
+    newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
+
+#ifdef CONFIG_USER_ONLY
+    {
+        uint64_t *haddr = g2h(a0);
+        cmpv = cpu_to_le64(cmpv);
+        newv = cpu_to_le64(newv);
+        oldv = atomic_cmpxchg__nocheck(haddr, cmpv, newv);
+        oldv = le64_to_cpu(oldv);
+    }
+#else
+    {
+        uintptr_t ra = GETPC();
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_TEQ, mem_idx);
+        oldv = helper_atomic_cmpxchgq_le_mmu(env, a0, cmpv, newv, oi, ra);
+    }
+#endif
+
+    if (oldv == cmpv) {
         eflags |= CC_Z;
     } else {
-        /* always do the store */
-        cpu_stq_data_ra(env, a0, d, GETPC());
-        env->regs[R_EDX] = (uint32_t)(d >> 32);
-        env->regs[R_EAX] = (uint32_t)d;
+        env->regs[R_EAX] = (uint32_t)oldv;
+        env->regs[R_EDX] = (uint32_t)(oldv >> 32);
         eflags &= ~CC_Z;
     }
     CC_SRC = eflags;
+#else
+    cpu_loop_exit_atomic(ENV_GET_CPU(env), GETRA());
+#endif /* CONFIG_ATOMIC64 */
 }
 
 #ifdef TARGET_X86_64
-void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
+void helper_cmpxchg16b_unlocked(CPUX86State *env, target_ulong a0)
 {
-    uint64_t d0, d1;
+    uintptr_t ra = GETPC();
+    Int128 oldv, cmpv, newv;
+    uint64_t o0, o1;
     int eflags;
+    bool success;
 
     if ((a0 & 0xf) != 0) {
         raise_exception_ra(env, EXCP0D_GPF, GETPC());
     }
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d0 = cpu_ldq_data_ra(env, a0, GETPC());
-    d1 = cpu_ldq_data_ra(env, a0 + 8, GETPC());
-    if (d0 == env->regs[R_EAX] && d1 == env->regs[R_EDX]) {
-        cpu_stq_data_ra(env, a0, env->regs[R_EBX], GETPC());
-        cpu_stq_data_ra(env, a0 + 8, env->regs[R_ECX], GETPC());
+
+    cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
+    newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
+
+    o0 = cpu_ldq_data_ra(env, a0 + 0, ra);
+    o1 = cpu_ldq_data_ra(env, a0 + 8, ra);
+
+    oldv = int128_make128(o0, o1);
+    success = int128_eq(oldv, cmpv);
+    if (!success) {
+        newv = oldv;
+    }
+
+    cpu_stq_data_ra(env, a0 + 0, int128_getlo(newv), ra);
+    cpu_stq_data_ra(env, a0 + 8, int128_gethi(newv), ra);
+
+    if (success) {
         eflags |= CC_Z;
     } else {
-        /* always do the store */
-        cpu_stq_data_ra(env, a0, d0, GETPC());
-        cpu_stq_data_ra(env, a0 + 8, d1, GETPC());
-        env->regs[R_EDX] = d1;
-        env->regs[R_EAX] = d0;
+        env->regs[R_EAX] = int128_getlo(oldv);
+        env->regs[R_EDX] = int128_gethi(oldv);
         eflags &= ~CC_Z;
     }
     CC_SRC = eflags;
 }
+
+void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
+{
+    uintptr_t ra = GETPC();
+
+    if ((a0 & 0xf) != 0) {
+        raise_exception_ra(env, EXCP0D_GPF, ra);
+    } else {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#else
+        int eflags = cpu_cc_compute_all(env, CC_OP);
+
+        Int128 cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
+        Int128 newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
+
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_TEQ | MO_ALIGN_16, mem_idx);
+        Int128 oldv = helper_atomic_cmpxchgo_le_mmu(env, a0, cmpv,
+                                                    newv, oi, ra);
+
+        if (int128_eq(oldv, cmpv)) {
+            eflags |= CC_Z;
+        } else {
+            env->regs[R_EAX] = int128_getlo(oldv);
+            env->regs[R_EDX] = int128_gethi(oldv);
+            eflags &= ~CC_Z;
+        }
+        CC_SRC = eflags;
+#endif
+    }
+}
 #endif
 
 void helper_boundw(CPUX86State *env, target_ulong a0, int v)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index fa2ac48..cc73d62 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5070,57 +5070,58 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     case 0x1b0:
     case 0x1b1: /* cmpxchg Ev, Gv */
         {
-            TCGLabel *label1, *label2;
-            TCGv t0, t1, t2, a0;
+            TCGv oldv, newv, cmpv;
 
             ot = mo_b_d(b, dflag);
             modrm = cpu_ldub_code(env, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
-            t0 = tcg_temp_local_new();
-            t1 = tcg_temp_local_new();
-            t2 = tcg_temp_local_new();
-            a0 = tcg_temp_local_new();
-            gen_op_mov_v_reg(ot, t1, reg);
-            if (mod == 3) {
-                rm = (modrm & 7) | REX_B(s);
-                gen_op_mov_v_reg(ot, t0, rm);
-            } else {
+            oldv = tcg_temp_new();
+            newv = tcg_temp_new();
+            cmpv = tcg_temp_new();
+            gen_op_mov_v_reg(ot, newv, reg);
+            tcg_gen_mov_tl(cmpv, cpu_regs[R_EAX]);
+
+            if (s->prefix & PREFIX_LOCK) {
+                if (mod == 3) {
+                    goto illegal_op;
+                }
                 gen_lea_modrm(env, s, modrm);
-                tcg_gen_mov_tl(a0, cpu_A0);
-                gen_op_ld_v(s, ot, t0, a0);
-                rm = 0; /* avoid warning */
-            }
-            label1 = gen_new_label();
-            tcg_gen_mov_tl(t2, cpu_regs[R_EAX]);
-            gen_extu(ot, t0);
-            gen_extu(ot, t2);
-            tcg_gen_brcond_tl(TCG_COND_EQ, t2, t0, label1);
-            label2 = gen_new_label();
-            if (mod == 3) {
-                gen_op_mov_reg_v(ot, R_EAX, t0);
-                tcg_gen_br(label2);
-                gen_set_label(label1);
-                gen_op_mov_reg_v(ot, rm, t1);
+                tcg_gen_atomic_cmpxchg_tl(oldv, cpu_A0, cmpv, newv,
+                                          s->mem_index, ot | MO_LE);
+                gen_op_mov_reg_v(ot, R_EAX, oldv);
             } else {
-                /* perform no-op store cycle like physical cpu; must be
-                   before changing accumulator to ensure idempotency if
-                   the store faults and the instruction is restarted */
-                gen_op_st_v(s, ot, t0, a0);
-                gen_op_mov_reg_v(ot, R_EAX, t0);
-                tcg_gen_br(label2);
-                gen_set_label(label1);
-                gen_op_st_v(s, ot, t1, a0);
-            }
-            gen_set_label(label2);
-            tcg_gen_mov_tl(cpu_cc_src, t0);
-            tcg_gen_mov_tl(cpu_cc_srcT, t2);
-            tcg_gen_sub_tl(cpu_cc_dst, t2, t0);
+                if (mod == 3) {
+                    rm = (modrm & 7) | REX_B(s);
+                    gen_op_mov_v_reg(ot, oldv, rm);
+                } else {
+                    gen_lea_modrm(env, s, modrm);
+                    gen_op_ld_v(s, ot, oldv, cpu_A0);
+                    rm = 0; /* avoid warning */
+                }
+                gen_extu(ot, oldv);
+                gen_extu(ot, cmpv);
+                /* store value = (old == cmp ? new : old);  */
+                tcg_gen_movcond_tl(TCG_COND_EQ, newv, oldv, cmpv, newv, oldv);
+                if (mod == 3) {
+                    gen_op_mov_reg_v(ot, R_EAX, oldv);
+                    gen_op_mov_reg_v(ot, rm, newv);
+                } else {
+                    /* Perform an unconditional store cycle like physical cpu;
+                       must be before changing accumulator to ensure
+                       idempotency if the store faults and the instruction
+                       is restarted */
+                    gen_op_st_v(s, ot, newv, cpu_A0);
+                    gen_op_mov_reg_v(ot, R_EAX, oldv);
+                }
+            }
+            tcg_gen_mov_tl(cpu_cc_src, oldv);
+            tcg_gen_mov_tl(cpu_cc_srcT, cmpv);
+            tcg_gen_sub_tl(cpu_cc_dst, cmpv, oldv);
             set_cc_op(s, CC_OP_SUBB + ot);
-            tcg_temp_free(t0);
-            tcg_temp_free(t1);
-            tcg_temp_free(t2);
-            tcg_temp_free(a0);
+            tcg_temp_free(oldv);
+            tcg_temp_free(newv);
+            tcg_temp_free(cmpv);
         }
         break;
     case 0x1c7: /* cmpxchg8b */
@@ -5133,14 +5134,22 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             if (!(s->cpuid_ext_features & CPUID_EXT_CX16))
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_helper_cmpxchg16b(cpu_env, cpu_A0);
+            if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+                gen_helper_cmpxchg16b(cpu_env, cpu_A0);
+            } else {
+                gen_helper_cmpxchg16b_unlocked(cpu_env, cpu_A0);
+            }
         } else
 #endif        
         {
             if (!(s->cpuid_features & CPUID_CX8))
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_helper_cmpxchg8b(cpu_env, cpu_A0);
+            if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+                gen_helper_cmpxchg8b(cpu_env, cpu_A0);
+            } else {
+                gen_helper_cmpxchg8b_unlocked(cpu_env, cpu_A0);
+            }
         }
         set_cc_op(s, CC_OP_EFLAGS);
         break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 17/34] target-i386: emulate LOCK'ed OP instructions using atomic helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (15 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 16/34] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 18/34] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Eliminate some unnecessary temporaries.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 76 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 58 insertions(+), 18 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index cc73d62..b999a3a 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1258,55 +1258,95 @@ static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d)
 {
     if (d != OR_TMP0) {
         gen_op_mov_v_reg(ot, cpu_T0, d);
-    } else {
+    } else if (!(s1->prefix & PREFIX_LOCK)) {
         gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
     }
     switch(op) {
     case OP_ADCL:
         gen_compute_eflags_c(s1, cpu_tmp4);
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_tmp4);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_add_tl(cpu_T0, cpu_tmp4, cpu_T1);
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_tmp4);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update3_cc(cpu_tmp4);
         set_cc_op(s1, CC_OP_ADCB + ot);
         break;
     case OP_SBBL:
         gen_compute_eflags_c(s1, cpu_tmp4);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_tmp4);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_add_tl(cpu_T0, cpu_T1, cpu_tmp4);
+            tcg_gen_neg_tl(cpu_T0, cpu_T0);
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_tmp4);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update3_cc(cpu_tmp4);
         set_cc_op(s1, CC_OP_SBBB + ot);
         break;
     case OP_ADDL:
-        tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update2_cc();
         set_cc_op(s1, CC_OP_ADDB + ot);
         break;
     case OP_SUBL:
-        tcg_gen_mov_tl(cpu_cc_srcT, cpu_T0);
-        tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_neg_tl(cpu_T0, cpu_T1);
+            tcg_gen_atomic_fetch_add_tl(cpu_cc_srcT, cpu_A0, cpu_T0,
+                                        s1->mem_index, ot | MO_LE);
+            tcg_gen_sub_tl(cpu_T0, cpu_cc_srcT, cpu_T1);
+        } else {
+            tcg_gen_mov_tl(cpu_cc_srcT, cpu_T0);
+            tcg_gen_sub_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update2_cc();
         set_cc_op(s1, CC_OP_SUBB + ot);
         break;
     default:
     case OP_ANDL:
-        tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_and_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_and_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
     case OP_ORL:
-        tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_or_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                       s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
     case OP_XORL:
-        tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
-        gen_op_st_rm_T0_A0(s1, ot, d);
+        if (s1->prefix & PREFIX_LOCK) {
+            tcg_gen_atomic_xor_fetch_tl(cpu_T0, cpu_A0, cpu_T1,
+                                        s1->mem_index, ot | MO_LE);
+        } else {
+            tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_T1);
+            gen_op_st_rm_T0_A0(s1, ot, d);
+        }
         gen_op_update1_cc();
         set_cc_op(s1, CC_OP_LOGICB + ot);
         break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 18/34] target-i386: emulate LOCK'ed INC using atomic helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (16 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 17/34] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 19/34] target-i386: emulate LOCK'ed NOT " Richard Henderson
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Merge gen_inc_locked back into gen_inc to share cc update.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index b999a3a..8cdf3cb 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1362,21 +1362,23 @@ static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d)
 /* if d == OR_TMP0, it means memory operand (address in A0) */
 static void gen_inc(DisasContext *s1, TCGMemOp ot, int d, int c)
 {
-    if (d != OR_TMP0) {
-        gen_op_mov_v_reg(ot, cpu_T0, d);
+    if (s1->prefix & PREFIX_LOCK) {
+        tcg_gen_movi_tl(cpu_T0, c > 0 ? 1 : -1);
+        tcg_gen_atomic_add_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                    s1->mem_index, ot | MO_LE);
     } else {
-        gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
+        if (d != OR_TMP0) {
+            gen_op_mov_v_reg(ot, cpu_T0, d);
+        } else {
+            gen_op_ld_v(s1, ot, cpu_T0, cpu_A0);
+        }
+        tcg_gen_addi_tl(cpu_T0, cpu_T0, (c > 0 ? 1 : -1));
+        gen_op_st_rm_T0_A0(s1, ot, d);
     }
+
     gen_compute_eflags_c(s1, cpu_cc_src);
-    if (c > 0) {
-        tcg_gen_addi_tl(cpu_T0, cpu_T0, 1);
-        set_cc_op(s1, CC_OP_INCB + ot);
-    } else {
-        tcg_gen_addi_tl(cpu_T0, cpu_T0, -1);
-        set_cc_op(s1, CC_OP_DECB + ot);
-    }
-    gen_op_st_rm_T0_A0(s1, ot, d);
     tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
+    set_cc_op(s1, (c > 0 ? CC_OP_INCB : CC_OP_DECB) + ot);
 }
 
 static void gen_shift_flags(DisasContext *s, TCGMemOp ot, TCGv result,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 19/34] target-i386: emulate LOCK'ed NOT using atomic helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (17 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 18/34] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 20/34] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Avoid qemu_load that's redundant with the atomic op.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 8cdf3cb..d409a44 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4675,10 +4675,15 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
         if (mod != 3) {
-            if (op == 0)
+            if (op == 0) {
                 s->rip_offset = insn_const_size(ot);
+            }
             gen_lea_modrm(env, s, modrm);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            /* For those below that handle locked memory, don't load here.  */
+            if (!(s->prefix & PREFIX_LOCK)
+                || op != 2) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
@@ -4691,11 +4696,20 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             set_cc_op(s, CC_OP_LOGICB + ot);
             break;
         case 2: /* not */
-            tcg_gen_not_tl(cpu_T0, cpu_T0);
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                if (mod == 3) {
+                    goto illegal_op;
+                }
+                tcg_gen_movi_tl(cpu_T0, ~0);
+                tcg_gen_atomic_xor_fetch_tl(cpu_T0, cpu_A0, cpu_T0,
+                                            s->mem_index, ot | MO_LE);
             } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+                tcg_gen_not_tl(cpu_T0, cpu_T0);
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
             break;
         case 3: /* neg */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 20/34] target-i386: emulate LOCK'ed NEG using cmpxchg helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (18 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 19/34] target-i386: emulate LOCK'ed NOT " Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 21/34] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Move redundant qemu_load out of cmpxchg loop.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index d409a44..686dab5 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4713,11 +4713,41 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             break;
         case 3: /* neg */
-            tcg_gen_neg_tl(cpu_T0, cpu_T0);
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                TCGLabel *label1;
+                TCGv a0, t0, t1, t2;
+
+                if (mod == 3) {
+                    goto illegal_op;
+                }
+                a0 = tcg_temp_local_new();
+                t0 = tcg_temp_local_new();
+                label1 = gen_new_label();
+
+                tcg_gen_mov_tl(a0, cpu_A0);
+                tcg_gen_mov_tl(t0, cpu_T0);
+
+                gen_set_label(label1);
+                t1 = tcg_temp_new();
+                t2 = tcg_temp_new();
+                tcg_gen_mov_tl(t2, t0);
+                tcg_gen_neg_tl(t1, t0);
+                tcg_gen_atomic_cmpxchg_tl(t0, a0, t0, t1,
+                                          s->mem_index, ot | MO_LE);
+                tcg_temp_free(t1);
+                tcg_gen_brcond_tl(TCG_COND_NE, t0, t2, label1);
+
+                tcg_temp_free(t2);
+                tcg_temp_free(a0);
+                tcg_gen_mov_tl(cpu_T0, t0);
+                tcg_temp_free(t0);
             } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+                tcg_gen_neg_tl(cpu_T0, cpu_T0);
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
             gen_op_update_neg_cc();
             set_cc_op(s, CC_OP_SUBB + ot);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 21/34] target-i386: emulate LOCK'ed XADD using atomic helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (19 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 20/34] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 22/34] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Move load of reg value to common location.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 686dab5..f2f5030 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5135,19 +5135,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         modrm = cpu_ldub_code(env, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
+        gen_op_mov_v_reg(ot, cpu_T0, reg);
         if (mod == 3) {
             rm = (modrm & 7) | REX_B(s);
-            gen_op_mov_v_reg(ot, cpu_T0, reg);
             gen_op_mov_v_reg(ot, cpu_T1, rm);
             tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
             gen_op_mov_reg_v(ot, reg, cpu_T1);
             gen_op_mov_reg_v(ot, rm, cpu_T0);
         } else {
             gen_lea_modrm(env, s, modrm);
-            gen_op_mov_v_reg(ot, cpu_T0, reg);
-            gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
-            tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
-            gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            if (s->prefix & PREFIX_LOCK) {
+                tcg_gen_atomic_fetch_add_tl(cpu_T1, cpu_A0, cpu_T0,
+                                            s->mem_index, ot | MO_LE);
+                tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+            } else {
+                gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
+                tcg_gen_add_tl(cpu_T0, cpu_T0, cpu_T1);
+                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+            }
             gen_op_mov_reg_v(ot, reg, cpu_T1);
         }
         gen_op_update2_cc();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 22/34] target-i386: emulate LOCK'ed BTX ops using atomic helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (20 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 21/34] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 23/34] target-i386: emulate XCHG using atomic helper Richard Henderson
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

[rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 87 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 30 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index f2f5030..2e7cefe 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6655,7 +6655,9 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         if (mod != 3) {
             s->rip_offset = 1;
             gen_lea_modrm(env, s, modrm);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            if (!(s->prefix & PREFIX_LOCK)) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
@@ -6685,44 +6687,69 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         rm = (modrm & 7) | REX_B(s);
         gen_op_mov_v_reg(MO_32, cpu_T1, reg);
         if (mod != 3) {
-            gen_lea_modrm(env, s, modrm);
+            AddressParts a = gen_lea_modrm_0(env, s, modrm);
             /* specific case: we need to add a displacement */
             gen_exts(ot, cpu_T1);
             tcg_gen_sari_tl(cpu_tmp0, cpu_T1, 3 + ot);
             tcg_gen_shli_tl(cpu_tmp0, cpu_tmp0, ot);
-            tcg_gen_add_tl(cpu_A0, cpu_A0, cpu_tmp0);
-            gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            tcg_gen_add_tl(cpu_A0, gen_lea_modrm_1(a), cpu_tmp0);
+            gen_lea_v_seg(s, s->aflag, cpu_A0, a.def_seg, s->override);
+            if (!(s->prefix & PREFIX_LOCK)) {
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+            }
         } else {
             gen_op_mov_v_reg(ot, cpu_T0, rm);
         }
     bt_op:
         tcg_gen_andi_tl(cpu_T1, cpu_T1, (1 << (3 + ot)) - 1);
-        tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
-        switch(op) {
-        case 0:
-            break;
-        case 1:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        case 2:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_andc_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        default:
-        case 3:
-            tcg_gen_movi_tl(cpu_tmp0, 1);
-            tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
-            tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_tmp0);
-            break;
-        }
-        if (op != 0) {
-            if (mod != 3) {
-                gen_op_st_v(s, ot, cpu_T0, cpu_A0);
-            } else {
-                gen_op_mov_reg_v(ot, rm, cpu_T0);
+        tcg_gen_movi_tl(cpu_tmp0, 1);
+        tcg_gen_shl_tl(cpu_tmp0, cpu_tmp0, cpu_T1);
+        if (s->prefix & PREFIX_LOCK) {
+            switch (op) {
+            case 0: /* bt */
+                /* Needs no atomic ops; we surpressed the normal
+                   memory load for LOCK above so do it now.  */
+                gen_op_ld_v(s, ot, cpu_T0, cpu_A0);
+                break;
+            case 1: /* bts */
+                tcg_gen_atomic_fetch_or_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                           s->mem_index, ot | MO_LE);
+                break;
+            case 2: /* btr */
+                tcg_gen_not_tl(cpu_tmp0, cpu_tmp0);
+                tcg_gen_atomic_fetch_and_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                            s->mem_index, ot | MO_LE);
+                break;
+            default:
+            case 3: /* btc */
+                tcg_gen_atomic_fetch_xor_tl(cpu_T0, cpu_A0, cpu_tmp0,
+                                            s->mem_index, ot | MO_LE);
+                break;
+            }
+            tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
+        } else {
+            tcg_gen_shr_tl(cpu_tmp4, cpu_T0, cpu_T1);
+            switch (op) {
+            case 0: /* bt */
+                /* Data already loaded; nothing to do.  */
+                break;
+            case 1: /* bts */
+                tcg_gen_or_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            case 2: /* btr */
+                tcg_gen_andc_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            default:
+            case 3: /* btc */
+                tcg_gen_xor_tl(cpu_T0, cpu_T0, cpu_tmp0);
+                break;
+            }
+            if (op != 0) {
+                if (mod != 3) {
+                    gen_op_st_v(s, ot, cpu_T0, cpu_A0);
+                } else {
+                    gen_op_mov_reg_v(ot, rm, cpu_T0);
+                }
             }
         }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 23/34] target-i386: emulate XCHG using atomic helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (21 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 22/34] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock() Richard Henderson
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 2e7cefe..b8c5dde 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -5564,12 +5564,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_lea_modrm(env, s, modrm);
             gen_op_mov_v_reg(ot, cpu_T0, reg);
             /* for xchg, lock is implicit */
-            if (!(prefixes & PREFIX_LOCK))
-                gen_helper_lock();
-            gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
-            gen_op_st_v(s, ot, cpu_T0, cpu_A0);
-            if (!(prefixes & PREFIX_LOCK))
-                gen_helper_unlock();
+            tcg_gen_atomic_xchg_tl(cpu_T1, cpu_A0, cpu_T0,
+                                   s->mem_index, ot | MO_LE);
             gen_op_mov_reg_v(ot, reg, cpu_T1);
         }
         break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock()
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (22 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 23/34] target-i386: emulate XCHG using atomic helper Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 11:14   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench Richard Henderson
                   ` (13 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/helper.h     |  2 --
 target-i386/mem_helper.c | 33 ---------------------------------
 target-i386/translate.c  | 15 ---------------
 3 files changed, 50 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 729d4b6..4e859eb 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,6 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 8b72fda..f8d2a1e 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -25,39 +25,6 @@
 #include "qemu/int128.h"
 #include "tcg.h"
 
-/* broken thread support */
-
-#if defined(CONFIG_USER_ONLY)
-QemuMutex global_cpu_lock;
-
-void helper_lock(void)
-{
-    qemu_mutex_lock(&global_cpu_lock);
-}
-
-void helper_unlock(void)
-{
-    qemu_mutex_unlock(&global_cpu_lock);
-}
-
-void helper_lock_init(void)
-{
-    qemu_mutex_init(&global_cpu_lock);
-}
-#else
-void helper_lock(void)
-{
-}
-
-void helper_unlock(void)
-{
-}
-
-void helper_lock_init(void)
-{
-}
-#endif
-
 void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
 {
     uintptr_t ra = GETPC();
diff --git a/target-i386/translate.c b/target-i386/translate.c
index b8c5dde..755a18f 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4537,10 +4537,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     s->aflag = aflag;
     s->dflag = dflag;
 
-    /* lock generation */
-    if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
-
     /* now check op code */
  reswitch:
     switch(b) {
@@ -8203,20 +8199,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     default:
         goto unknown_op;
     }
-    /* lock generation */
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
     return s->pc;
  illegal_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_illegal_opcode(s);
     return s->pc;
  unknown_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_unknown_opcode(env, s);
     return s->pc;
 }
@@ -8308,8 +8295,6 @@ void tcg_x86_init(void)
                                      offsetof(CPUX86State, bnd_regs[i].ub),
                                      bnd_regu_names[i]);
     }
-
-    helper_lock_init();
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (23 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock() Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 13:53   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions Richard Henderson
                   ` (12 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
---
 tests/.gitignore         |   1 +
 tests/Makefile.include   |   4 +-
 tests/atomic_add-bench.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 tests/atomic_add-bench.c

diff --git a/tests/.gitignore b/tests/.gitignore
index dbb5263..ec3137a 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -1,3 +1,4 @@
+atomic_add-bench
 check-qdict
 check-qfloat
 check-qint
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 14be491..e1957ed 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -421,7 +421,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
 	tests/test-opts-visitor.o tests/test-qmp-event.o \
 	tests/rcutorture.o tests/test-rcu-list.o \
 	tests/test-qdist.o \
-	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o
+	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
+	tests/atomic_add-bench.o
 
 $(test-obj-y): QEMU_INCLUDES += -Itests
 QEMU_CFLAGS += -I$(SRC_PATH)/tests
@@ -465,6 +466,7 @@ tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
 tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
 tests/test-qht-par$(EXESUF): tests/test-qht-par.o tests/qht-bench$(EXESUF) $(test-util-obj-y)
 tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
+tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o $(test-util-obj-y)
 
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
new file mode 100644
index 0000000..5bbecf6
--- /dev/null
+++ b/tests/atomic_add-bench.c
@@ -0,0 +1,180 @@
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+#include "qemu/host-utils.h"
+#include "qemu/processor.h"
+
+struct thread_info {
+    uint64_t r;
+} QEMU_ALIGNED(64);
+
+struct count {
+    unsigned long val;
+} QEMU_ALIGNED(64);
+
+static QemuThread *threads;
+static struct thread_info *th_info;
+static unsigned int n_threads = 1;
+static unsigned int n_ready_threads;
+static struct count *counts;
+static unsigned long n_ops = 10000;
+static double duration;
+static unsigned int range = 1;
+static bool test_start;
+
+static const char commands_string[] =
+    " -n = number of threads\n"
+    " -o = number of ops per thread\n"
+    " -r = range (will be rounded up to pow2)";
+
+static void usage_complete(char *argv[])
+{
+    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+    fprintf(stderr, "options:\n%s\n", commands_string);
+}
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+    x ^= x >> 12; /* a */
+    x ^= x << 25; /* b */
+    x ^= x >> 27; /* c */
+    return x * UINT64_C(2685821657736338717);
+}
+
+static void *thread_func(void *arg)
+{
+    struct thread_info *info = arg;
+    unsigned long i;
+
+    atomic_inc(&n_ready_threads);
+    while (!atomic_mb_read(&test_start)) {
+        cpu_relax();
+    }
+
+    for (i = 0; i < n_ops; i++) {
+        unsigned int index;
+
+        info->r = xorshift64star(info->r);
+        index = info->r & (range - 1);
+        atomic_inc(&counts[index].val);
+    }
+    return NULL;
+}
+
+static inline
+uint64_t ts_subtract(const struct timespec *a, const struct timespec *b)
+{
+    uint64_t ns;
+
+    ns = (b->tv_sec - a->tv_sec) * 1000000000ULL;
+    ns += (b->tv_nsec - a->tv_nsec);
+    return ns;
+}
+
+static void run_test(void)
+{
+    unsigned int i;
+    struct timespec ts_start, ts_end;
+
+    while (atomic_read(&n_ready_threads) != n_threads) {
+        cpu_relax();
+    }
+    atomic_mb_set(&test_start, true);
+
+    clock_gettime(CLOCK_MONOTONIC, &ts_start);
+    for (i = 0; i < n_threads; i++) {
+        qemu_thread_join(&threads[i]);
+    }
+    clock_gettime(CLOCK_MONOTONIC, &ts_end);
+    duration = ts_subtract(&ts_start, &ts_end) / 1e9;
+}
+
+static void create_threads(void)
+{
+    unsigned int i;
+
+    threads = g_new(QemuThread, n_threads);
+    th_info = g_new(struct thread_info, n_threads);
+    counts = qemu_memalign(64, sizeof(*counts) * range);
+
+    for (i = 0; i < n_threads; i++) {
+        struct thread_info *info = &th_info[i];
+
+        info->r = (i + 1) ^ time(NULL);
+        qemu_thread_create(&threads[i], NULL, thread_func, info,
+                           QEMU_THREAD_JOINABLE);
+    }
+}
+
+static void pr_params(void)
+{
+    printf("Parameters:\n");
+    printf(" # of threads:      %u\n", n_threads);
+    printf(" n_ops:             %lu\n", n_ops);
+    printf(" ops' range:        %u\n", range);
+}
+
+static void pr_stats(void)
+{
+    unsigned long long val = 0;
+    unsigned int i;
+    double tx;
+
+    for (i = 0; i < range; i++) {
+        val += counts[i].val;
+    }
+    assert(val == n_threads * n_ops);
+    tx = val / duration / 1e6;
+
+    printf("Results:\n");
+    printf("Duration:            %.2f s\n", duration);
+    printf(" Throughput:         %.2f Mops/s\n", tx);
+    printf(" Throughput/thread:  %.2f Mops/s/thread\n", tx / n_threads);
+}
+
+static void parse_args(int argc, char *argv[])
+{
+    unsigned long long n_ops_ull;
+    int c;
+
+    for (;;) {
+        c = getopt(argc, argv, "hn:o:r:");
+        if (c < 0) {
+            break;
+        }
+        switch (c) {
+        case 'h':
+            usage_complete(argv);
+            exit(0);
+        case 'n':
+            n_threads = atoi(optarg);
+            break;
+        case 'o':
+            n_ops_ull = atoll(optarg);
+            if (n_ops_ull > ULONG_MAX) {
+                fprintf(stderr,
+                        "fatal: -o cannot be greater than %lu\n", ULONG_MAX);
+                exit(1);
+            }
+            n_ops = n_ops_ull;
+            break;
+        case 'r':
+            range = pow2ceil(atoi(optarg));
+            break;
+        }
+    }
+}
+
+int main(int argc, char *argv[])
+{
+    parse_args(argc, argv);
+    pr_params();
+    create_threads();
+    run_test();
+    pr_stats();
+    return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (24 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 15:58   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel

Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 171 +++++++++++++++++++------------------------------
 1 file changed, 66 insertions(+), 105 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bd5d5cb..1b5bf87 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -926,145 +926,106 @@ static inline void store_reg_from_load(DisasContext *s, int reg, TCGv_i32 var)
  * These functions work like tcg_gen_qemu_{ld,st}* except
  * that the address argument is TCGv_i32 rather than TCGv.
  */
-#if TARGET_LONG_BITS == 32
 
-#define DO_GEN_LD(SUFF, OPC, BE32_XOR)                                   \
-static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
-{                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        TCGv addr_be = tcg_temp_new();                                   \
-        tcg_gen_xori_i32(addr_be, addr, BE32_XOR);                       \
-        tcg_gen_qemu_ld_i32(val, addr_be, index, opc);                   \
-        tcg_temp_free(addr_be);                                          \
-        return;                                                          \
-    }                                                                    \
-    tcg_gen_qemu_ld_i32(val, addr, index, opc);                          \
-}
-
-#define DO_GEN_ST(SUFF, OPC, BE32_XOR)                                   \
-static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
-{                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        TCGv addr_be = tcg_temp_new();                                   \
-        tcg_gen_xori_i32(addr_be, addr, BE32_XOR);                       \
-        tcg_gen_qemu_st_i32(val, addr_be, index, opc);                   \
-        tcg_temp_free(addr_be);                                          \
-        return;                                                          \
-    }                                                                    \
-    tcg_gen_qemu_st_i32(val, addr, index, opc);                          \
-}
-
-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static inline TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, TCGMemOp op)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    tcg_gen_qemu_ld_i64(val, addr, index, opc);
+    TCGv addr = tcg_temp_new();
+    tcg_gen_extu_i32_tl(addr, a32);
+
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
-    if (!IS_USER_ONLY && s->sctlr_b) {
-        tcg_gen_rotri_i64(val, val, 32);
+    if (!IS_USER_ONLY && s->sctlr_b && (op & MO_SIZE) < MO_32) {
+        tcg_gen_xori_tl(addr, addr, 4 - (1 << (op & MO_SIZE)));
     }
+    return addr;
 }
 
-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */
-    if (!IS_USER_ONLY && s->sctlr_b) {
-        TCGv_i64 tmp = tcg_temp_new_i64();
-        tcg_gen_rotri_i64(tmp, val, 32);
-        tcg_gen_qemu_st_i64(tmp, addr, index, opc);
-        tcg_temp_free_i64(tmp);
-        return;
-    }
-    tcg_gen_qemu_st_i64(val, addr, index, opc);
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_ld_i32(val, addr, index, opc);
+    tcg_temp_free(addr);
 }
 
-#else
+static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
+{
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_st_i32(val, addr, index, opc);
+    tcg_temp_free(addr);
+}
 
-#define DO_GEN_LD(SUFF, OPC, BE32_XOR)                                   \
+#define DO_GEN_LD(SUFF, OPC)                                             \
 static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
+                                     TCGv_i32 a32, int index)            \
 {                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    TCGv addr64 = tcg_temp_new();                                        \
-    tcg_gen_extu_i32_i64(addr64, addr);                                  \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        tcg_gen_xori_i64(addr64, addr64, BE32_XOR);                      \
-    }                                                                    \
-    tcg_gen_qemu_ld_i32(val, addr64, index, opc);                        \
-    tcg_temp_free(addr64);                                               \
-}
-
-#define DO_GEN_ST(SUFF, OPC, BE32_XOR)                                   \
+    gen_aa32_ld_i32(s, val, a32, index, OPC | s->be_data);               \
+}
+
+#define DO_GEN_ST(SUFF, OPC)                                             \
 static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val,      \
-                                     TCGv_i32 addr, int index)           \
+                                     TCGv_i32 a32, int index)            \
 {                                                                        \
-    TCGMemOp opc = (OPC) | s->be_data;                                   \
-    TCGv addr64 = tcg_temp_new();                                        \
-    tcg_gen_extu_i32_i64(addr64, addr);                                  \
-    /* Not needed for user-mode BE32, where we use MO_BE instead.  */    \
-    if (!IS_USER_ONLY && s->sctlr_b && BE32_XOR) {                       \
-        tcg_gen_xori_i64(addr64, addr64, BE32_XOR);                      \
-    }                                                                    \
-    tcg_gen_qemu_st_i32(val, addr64, index, opc);                        \
-    tcg_temp_free(addr64);                                               \
+    gen_aa32_st_i32(s, val, a32, index, OPC | s->be_data);               \
 }
 
-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static inline void gen_aa32_frob64(DisasContext *s, TCGv_i64 val)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    TCGv addr64 = tcg_temp_new();
-    tcg_gen_extu_i32_i64(addr64, addr);
-    tcg_gen_qemu_ld_i64(val, addr64, index, opc);
-
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
     if (!IS_USER_ONLY && s->sctlr_b) {
         tcg_gen_rotri_i64(val, val, 32);
     }
-    tcg_temp_free(addr64);
 }
 
-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 addr, int index)
+static void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
 {
-    TCGMemOp opc = MO_Q | s->be_data;
-    TCGv addr64 = tcg_temp_new();
-    tcg_gen_extu_i32_i64(addr64, addr);
+    TCGv addr = gen_aa32_addr(s, a32, opc);
+    tcg_gen_qemu_ld_i64(val, addr, index, opc);
+    gen_aa32_frob64(s, val);
+    tcg_temp_free(addr);
+}
+
+static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_ld_i64(s, val, a32, index, MO_Q | s->be_data);
+}
+
+static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                            int index, TCGMemOp opc)
+{
+    TCGv addr = gen_aa32_addr(s, a32, opc);
 
     /* Not needed for user-mode BE32, where we use MO_BE instead.  */
     if (!IS_USER_ONLY && s->sctlr_b) {
-        TCGv tmp = tcg_temp_new();
+        TCGv_i64 tmp = tcg_temp_new_i64();
         tcg_gen_rotri_i64(tmp, val, 32);
-        tcg_gen_qemu_st_i64(tmp, addr64, index, opc);
-        tcg_temp_free(tmp);
+        tcg_gen_qemu_st_i64(tmp, addr, index, opc);
+        tcg_temp_free_i64(tmp);
     } else {
-        tcg_gen_qemu_st_i64(val, addr64, index, opc);
+        tcg_gen_qemu_st_i64(val, addr, index, opc);
     }
-    tcg_temp_free(addr64);
+    tcg_temp_free(addr);
 }
 
-#endif
+static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_st_i64(s, val, a32, index, MO_Q | s->be_data);
+}
 
-DO_GEN_LD(8s, MO_SB, 3)
-DO_GEN_LD(8u, MO_UB, 3)
-DO_GEN_LD(16s, MO_SW, 2)
-DO_GEN_LD(16u, MO_UW, 2)
-DO_GEN_LD(32u, MO_UL, 0)
+DO_GEN_LD(8s, MO_SB)
+DO_GEN_LD(8u, MO_UB)
+DO_GEN_LD(16s, MO_SW)
+DO_GEN_LD(16u, MO_UW)
+DO_GEN_LD(32u, MO_UL)
 /* 'a' variants include an alignment check */
-DO_GEN_LD(16ua, MO_UW | MO_ALIGN, 2)
-DO_GEN_LD(32ua, MO_UL | MO_ALIGN, 0)
-DO_GEN_ST(8, MO_UB, 3)
-DO_GEN_ST(16, MO_UW, 2)
-DO_GEN_ST(32, MO_UL, 0)
+DO_GEN_LD(16ua, MO_UW | MO_ALIGN)
+DO_GEN_LD(32ua, MO_UL | MO_ALIGN)
+DO_GEN_ST(8, MO_UB)
+DO_GEN_ST(16, MO_UW)
+DO_GEN_ST(32, MO_UL)
 
 static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (25 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 16:03   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
                   ` (10 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

               atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                               Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Enforce alignment for ldrexd.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 136 +++++++++++++++----------------------------------
 1 file changed, 42 insertions(+), 94 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 1b5bf87..680635c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7676,47 +7676,27 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
     tcg_gen_or_i32(cpu_ZF, lo, hi);
 }
 
-/* Load/Store exclusive instructions are implemented by remembering
-   the value/address loaded, and seeing if these are the same
-   when the store is performed. This should be sufficient to implement
-   the architecturally mandated semantics, and avoids having to monitor
-   regular stores.
-
-   In system emulation mode only one CPU will be running at once, so
-   this sequence is effectively atomic.  In user emulation mode we
-   throw an exception and handle the atomic operation elsewhere.  */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i32 addr, int size)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGMemOp opc = size | MO_ALIGN | s->be_data;
 
     s->is_ldex = true;
 
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16ua(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32ua(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
     if (size == 3) {
         TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
+        TCGv_i64 t64 = tcg_temp_new_i64();
+
+        gen_aa32_ld_i64(s, t64, addr, get_mem_index(s), opc);
+        tcg_gen_mov_i64(cpu_exclusive_val, t64);
+        tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+        tcg_temp_free_i64(t64);
 
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
+        store_reg(s, rt2, tmp2);
         tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(cpu_exclusive_val, tmp, tmp3);
-        store_reg(s, rt2, tmp3);
     } else {
+        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), opc);
         tcg_gen_extu_i32_i64(cpu_exclusive_val, tmp);
     }
 
@@ -7729,23 +7709,15 @@ static void gen_clrex(DisasContext *s)
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
 
-#ifdef CONFIG_USER_ONLY
-static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
-                                TCGv_i32 addr, int size)
-{
-    tcg_gen_extu_i32_i64(cpu_exclusive_test, addr);
-    tcg_gen_movi_i32(cpu_exclusive_info,
-                     size | (rd << 4) | (rt << 8) | (rt2 << 12));
-    gen_exception_internal_insn(s, 4, EXCP_STREX);
-}
-#else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i32 addr, int size)
 {
-    TCGv_i32 tmp;
-    TCGv_i64 val64, extaddr;
+    TCGv_i32 t0, t1, t2;
+    TCGv_i64 extaddr;
+    TCGv taddr;
     TCGLabel *done_label;
     TCGLabel *fail_label;
+    TCGMemOp opc = size | MO_ALIGN | s->be_data;
 
     /* if (env->exclusive_addr == addr && env->exclusive_val == [addr]) {
          [addr] = {Rt};
@@ -7760,69 +7732,45 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
     tcg_temp_free_i64(extaddr);
 
-    tmp = tcg_temp_new_i32();
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
-    val64 = tcg_temp_new_i64();
+    taddr = gen_aa32_addr(s, addr, opc);
+    t0 = tcg_temp_new_i32();
+    t1 = load_reg(s, rt);
     if (size == 3) {
-        TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
-        tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(val64, tmp, tmp3);
-        tcg_temp_free_i32(tmp3);
-    } else {
-        tcg_gen_extu_i32_i64(val64, tmp);
-    }
-    tcg_temp_free_i32(tmp);
+        TCGv_i64 o64 = tcg_temp_new_i64();
+        TCGv_i64 n64 = tcg_temp_new_i64();
 
-    tcg_gen_brcond_i64(TCG_COND_NE, val64, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(val64);
+        t2 = load_reg(s, rt2);
+        tcg_gen_concat_i32_i64(n64, t1, t2);
+        tcg_temp_free_i32(t2);
+        gen_aa32_frob64(s, n64);
 
-    tmp = load_reg(s, rt);
-    switch (size) {
-    case 0:
-        gen_aa32_st8(s, tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_st16(s, tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-    tcg_temp_free_i32(tmp);
-    if (size == 3) {
-        tcg_gen_addi_i32(addr, addr, 4);
-        tmp = load_reg(s, rt2);
-        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-        tcg_temp_free_i32(tmp);
+        tcg_gen_atomic_cmpxchg_i64(o64, taddr, cpu_exclusive_val, n64,
+                                   get_mem_index(s), opc);
+        tcg_temp_free_i64(n64);
+
+        gen_aa32_frob64(s, o64);
+        tcg_gen_setcond_i64(TCG_COND_NE, o64, o64, cpu_exclusive_val);
+        tcg_gen_extrl_i64_i32(t0, o64);
+
+        tcg_temp_free_i64(o64);
+    } else {
+        t2 = tcg_temp_new_i32();
+        tcg_gen_extrl_i64_i32(t2, cpu_exclusive_val);
+        tcg_gen_atomic_cmpxchg_i32(t0, taddr, t2, t1, get_mem_index(s), opc);
+        tcg_gen_setcond_i32(TCG_COND_NE, t0, t0, t2);
+        tcg_temp_free_i32(t2);
     }
-    tcg_gen_movi_i32(cpu_R[rd], 0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free(taddr);
+    tcg_gen_mov_i32(cpu_R[rd], t0);
+    tcg_temp_free_i32(t0);
     tcg_gen_br(done_label);
+
     gen_set_label(fail_label);
     tcg_gen_movi_i32(cpu_R[rd], 1);
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
-#endif
 
 /* gen_srs:
  * @env: CPUARMState
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (26 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-14 16:05   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 29/34] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
                   ` (9 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 680635c..2b3c34f 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8741,25 +8741,26 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         }
                         tcg_temp_free_i32(addr);
                     } else {
+                        TCGv taddr;
+                        TCGMemOp opc = s->be_data;
+
                         /* SWP instruction */
                         rm = (insn) & 0xf;
 
-                        /* ??? This is not really atomic.  However we know
-                           we never have multiple CPUs running in parallel,
-                           so it is good enough.  */
-                        addr = load_reg(s, rn);
-                        tmp = load_reg(s, rm);
-                        tmp2 = tcg_temp_new_i32();
                         if (insn & (1 << 22)) {
-                            gen_aa32_ld8u(s, tmp2, addr, get_mem_index(s));
-                            gen_aa32_st8(s, tmp, addr, get_mem_index(s));
+                            opc |= MO_UB;
                         } else {
-                            gen_aa32_ld32u(s, tmp2, addr, get_mem_index(s));
-                            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
+                            opc |= MO_UL | MO_ALIGN;
                         }
-                        tcg_temp_free_i32(tmp);
+
+                        addr = load_reg(s, rn);
+                        taddr = gen_aa32_addr(s, addr, opc);
                         tcg_temp_free_i32(addr);
-                        store_reg(s, rd, tmp2);
+
+                        tmp = load_reg(s, rm);
+                        tcg_gen_atomic_xchg_i32(tmp, taddr, tmp,
+                                                get_mem_index(s), opc);
+                        store_reg(s, rd, tmp);
                     }
                 }
             } else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 29/34] target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (27 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 113 +++++++++++++++++++++++++++++++++++++++++++++
 target-arm/helper-a64.h    |   2 +
 target-arm/translate-a64.c | 106 +++++++++++++++++++-----------------------
 3 files changed, 163 insertions(+), 58 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 41e48a4..35c82b4 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -27,6 +27,10 @@
 #include "qemu/bitops.h"
 #include "internals.h"
 #include "qemu/crc32c.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "qemu/int128.h"
+#include "tcg.h"
 #include <zlib.h> /* For crc32 */
 
 /* C2.4.7 Multiply and divide */
@@ -444,3 +448,112 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
     /* Linux crc32c converts the output to one's complement.  */
     return crc32c(acc, buf, bytes) ^ 0xffffffff;
 }
+
+/* Returns 0 on success; 1 otherwise.  */
+uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    uintptr_t ra = GETPC();
+    Int128 oldv, cmpv, newv;
+    bool success;
+
+    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
+    newv = int128_make128(new_lo, new_hi);
+
+    if (parallel_cpus) {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
+        oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
+        success = int128_eq(oldv, cmpv);
+#endif
+    } else {
+        uint64_t o0, o1;
+
+#ifdef CONFIG_USER_ONLY
+        /* ??? Enforce alignment.  */
+        uint64_t *haddr = g2h(addr);
+        o0 = ldq_le_p(haddr + 0);
+        o1 = ldq_le_p(haddr + 1);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            stq_le_p(haddr + 0, int128_getlo(newv));
+            stq_le_p(haddr + 8, int128_gethi(newv));
+        }
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi0 = make_memop_idx(MO_LEQ | MO_ALIGN_16, mem_idx);
+        TCGMemOpIdx oi1 = make_memop_idx(MO_LEQ, mem_idx);
+
+        o0 = helper_le_ldq_mmu(env, addr + 0, oi0, ra);
+        o1 = helper_le_ldq_mmu(env, addr + 8, oi1, ra);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            helper_le_stq_mmu(env, addr + 0, int128_getlo(newv), oi1, ra);
+            helper_le_stq_mmu(env, addr + 8, int128_gethi(newv), oi1, ra);
+        }
+#endif
+    }
+
+    return !success;
+}
+
+uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    uintptr_t ra = GETPC();
+    Int128 oldv, cmpv, newv;
+    bool success;
+
+    cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
+    newv = int128_make128(new_lo, new_hi);
+
+    if (parallel_cpus) {
+#ifndef CONFIG_ATOMIC128
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_BEQ | MO_ALIGN_16, mem_idx);
+        oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
+        success = int128_eq(oldv, cmpv);
+#endif
+    } else {
+        uint64_t o0, o1;
+
+#ifdef CONFIG_USER_ONLY
+        /* ??? Enforce alignment.  */
+        uint64_t *haddr = g2h(addr);
+        o1 = ldq_be_p(haddr + 0);
+        o0 = ldq_be_p(haddr + 1);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            stq_be_p(haddr + 0, int128_gethi(newv));
+            stq_be_p(haddr + 8, int128_getlo(newv));
+        }
+#else
+        int mem_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi0 = make_memop_idx(MO_BEQ | MO_ALIGN_16, mem_idx);
+        TCGMemOpIdx oi1 = make_memop_idx(MO_BEQ, mem_idx);
+
+        o1 = helper_be_ldq_mmu(env, addr + 0, oi0, ra);
+        o0 = helper_be_ldq_mmu(env, addr + 8, oi1, ra);
+        oldv = int128_make128(o0, o1);
+
+        success = int128_eq(oldv, cmpv);
+        if (success) {
+            helper_be_stq_mmu(env, addr + 0, int128_gethi(newv), oi1, ra);
+            helper_be_stq_mmu(env, addr + 8, int128_getlo(newv), oi1, ra);
+        }
+#endif
+    }
+
+    return !success;
+}
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 1d3d10f..dd32000 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -46,3 +46,5 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f5e29d2..450c359 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1754,37 +1754,41 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
     }
 }
 
-/*
- * Load/Store exclusive instructions are implemented by remembering
- * the value/address loaded, and seeing if these are the same
- * when the store is performed. This is not actually the architecturally
- * mandated semantics, but it works for typical guest code sequences
- * and avoids having to monitor regular stores.
- *
- * In system emulation mode only one CPU will be running at once, so
- * this sequence is effectively atomic.  In user emulation mode we
- * throw an exception and handle the atomic operation elsewhere.
- */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i64 addr, int size, bool is_pair)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    TCGMemOp memop = s->be_data + size;
+    TCGMemOp be = s->be_data;
 
     g_assert(size <= 3);
-    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
-
     if (is_pair) {
-        TCGv_i64 addr2 = tcg_temp_new_i64();
         TCGv_i64 hitmp = tcg_temp_new_i64();
 
-        g_assert(size >= 2);
-        tcg_gen_addi_i64(addr2, addr, 1 << size);
-        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
-        tcg_temp_free_i64(addr2);
+        if (size == 3) {
+            TCGv_i64 addr2 = tcg_temp_new_i64();
+
+            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
+                                MO_64 | MO_ALIGN_16 | be);
+            tcg_gen_addi_i64(addr2, addr, 8);
+            tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
+                                MO_64 | MO_ALIGN | be);
+            tcg_temp_free_i64(addr2);
+        } else {
+            g_assert(size == 2);
+            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
+                                MO_64 | MO_ALIGN | be);
+            if (be == MO_LE) {
+                tcg_gen_extr32_i64(tmp, hitmp, tmp);
+            } else {
+                tcg_gen_extr32_i64(hitmp, tmp, tmp);
+            }
+        }
+
         tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
         tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
         tcg_temp_free_i64(hitmp);
+    } else {
+        tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
     }
 
     tcg_gen_mov_i64(cpu_exclusive_val, tmp);
@@ -1794,16 +1798,6 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     tcg_gen_mov_i64(cpu_exclusive_addr, addr);
 }
 
-#ifdef CONFIG_USER_ONLY
-static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
-                                TCGv_i64 addr, int size, int is_pair)
-{
-    tcg_gen_mov_i64(cpu_exclusive_test, addr);
-    tcg_gen_movi_i32(cpu_exclusive_info,
-                     size | is_pair << 2 | (rd << 4) | (rt << 9) | (rt2 << 14));
-    gen_exception_internal_insn(s, 4, EXCP_STREX);
-}
-#else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i64 inaddr, int size, int is_pair)
 {
@@ -1831,46 +1825,42 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
     tmp = tcg_temp_new_i64();
-    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), s->be_data + size);
-    tcg_gen_brcond_i64(TCG_COND_NE, tmp, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(tmp);
-
-    if (is_pair) {
-        TCGv_i64 addrhi = tcg_temp_new_i64();
-        TCGv_i64 tmphi = tcg_temp_new_i64();
-
-        tcg_gen_addi_i64(addrhi, addr, 1 << size);
-        tcg_gen_qemu_ld_i64(tmphi, addrhi, get_mem_index(s),
-                            s->be_data + size);
-        tcg_gen_brcond_i64(TCG_COND_NE, tmphi, cpu_exclusive_high, fail_label);
-
-        tcg_temp_free_i64(tmphi);
-        tcg_temp_free_i64(addrhi);
-    }
-
-    /* We seem to still have the exclusive monitor, so do the store */
-    tcg_gen_qemu_st_i64(cpu_reg(s, rt), addr, get_mem_index(s),
-                        s->be_data + size);
     if (is_pair) {
-        TCGv_i64 addrhi = tcg_temp_new_i64();
-
-        tcg_gen_addi_i64(addrhi, addr, 1 << size);
-        tcg_gen_qemu_st_i64(cpu_reg(s, rt2), addrhi,
-                            get_mem_index(s), s->be_data + size);
-        tcg_temp_free_i64(addrhi);
+        if (size == 2) {
+            TCGv_i64 val = tcg_temp_new_i64();
+            tcg_gen_concat32_i64(tmp, cpu_reg(s, rt), cpu_reg(s, rt2));
+            tcg_gen_concat32_i64(val, cpu_exclusive_val, cpu_exclusive_high);
+            tcg_gen_atomic_cmpxchg_i64(tmp, addr, val, tmp,
+                                       get_mem_index(s),
+                                       size | MO_ALIGN | s->be_data);
+            tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, val);
+            tcg_temp_free_i64(val);
+        } else if (s->be_data == MO_LE) {
+            gen_helper_paired_cmpxchg64_le(tmp, cpu_env, addr, cpu_reg(s, rt),
+                                           cpu_reg(s, rt2));
+        } else {
+            gen_helper_paired_cmpxchg64_be(tmp, cpu_env, addr, cpu_reg(s, rt),
+                                           cpu_reg(s, rt2));
+        }
+    } else {
+        TCGv_i64 val = cpu_reg(s, rt);
+        tcg_gen_atomic_cmpxchg_i64(tmp, addr, cpu_exclusive_val, val,
+                                   get_mem_index(s),
+                                   size | MO_ALIGN | s->be_data);
+        tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
     }
 
     tcg_temp_free_i64(addr);
 
-    tcg_gen_movi_i64(cpu_reg(s, rd), 0);
+    tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
+    tcg_temp_free_i64(tmp);
     tcg_gen_br(done_label);
+
     gen_set_label(fail_label);
     tcg_gen_movi_i64(cpu_reg(s, rd), 1);
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
-
 }
-#endif
 
 /* Update the Sixty-Four bit (SF) registersize. This logic is derived
  * from the ARMv8 specs for LDR (Shared decode for all encodings).
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (28 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 29/34] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-15  9:36   ` Alex Bennée
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
                   ` (7 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota, Richard Henderson

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twidle.net>
Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
---
 linux-user/main.c | 93 -------------------------------------------------------
 1 file changed, 93 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 8d5c948..256382a 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -651,94 +651,6 @@ do_kernel_trap(CPUARMState *env)
     return 0;
 }
 
-/* Store exclusive handling for AArch32 */
-static int do_strex(CPUARMState *env)
-{
-    uint64_t val;
-    int size;
-    int rc = 1;
-    int segv = 0;
-    uint32_t addr;
-    start_exclusive();
-    if (env->exclusive_addr != env->exclusive_test) {
-        goto fail;
-    }
-    /* We know we're always AArch32 so the address is in uint32_t range
-     * unless it was the -1 exclusive-monitor-lost value (which won't
-     * match exclusive_test above).
-     */
-    assert(extract64(env->exclusive_addr, 32, 32) == 0);
-    addr = env->exclusive_addr;
-    size = env->exclusive_info & 0xf;
-    switch (size) {
-    case 0:
-        segv = get_user_u8(val, addr);
-        break;
-    case 1:
-        segv = get_user_data_u16(val, addr, env);
-        break;
-    case 2:
-    case 3:
-        segv = get_user_data_u32(val, addr, env);
-        break;
-    default:
-        abort();
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        uint32_t valhi;
-        segv = get_user_data_u32(valhi, addr + 4, env);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-        if (arm_cpu_bswap_data(env)) {
-            val = deposit64((uint64_t)valhi, 32, 32, val);
-        } else {
-            val = deposit64(val, 32, 32, valhi);
-        }
-    }
-    if (val != env->exclusive_val) {
-        goto fail;
-    }
-
-    val = env->regs[(env->exclusive_info >> 8) & 0xf];
-    switch (size) {
-    case 0:
-        segv = put_user_u8(val, addr);
-        break;
-    case 1:
-        segv = put_user_data_u16(val, addr, env);
-        break;
-    case 2:
-    case 3:
-        segv = put_user_data_u32(val, addr, env);
-        break;
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        val = env->regs[(env->exclusive_info >> 12) & 0xf];
-        segv = put_user_data_u32(val, addr + 4, env);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-    }
-    rc = 0;
-fail:
-    env->regs[15] += 4;
-    env->regs[(env->exclusive_info >> 4) & 0xf] = rc;
-done:
-    end_exclusive();
-    return segv;
-}
-
 void cpu_loop(CPUARMState *env)
 {
     CPUState *cs = CPU(arm_env_get_cpu(env));
@@ -908,11 +820,6 @@ void cpu_loop(CPUARMState *env)
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
-        case EXCP_STREX:
-            if (!do_strex(env)) {
-                break;
-            }
-            /* fall through for segv */
         case EXCP_PREFETCH_ABORT:
         case EXCP_DATA_ABORT:
             addr = env->exception.vaddress;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (29 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
@ 2016-09-03 20:39 ` Richard Henderson
  2016-09-15  9:36   ` Alex Bennée
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
                   ` (6 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
---
 linux-user/main.c | 125 ------------------------------------------------------
 1 file changed, 125 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 256382a..64838bf 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -868,124 +868,6 @@ void cpu_loop(CPUARMState *env)
 
 #else
 
-/*
- * Handle AArch64 store-release exclusive
- *
- * rs = gets the status result of store exclusive
- * rt = is the register that is stored
- * rt2 = is the second register store (in STP)
- *
- */
-static int do_strex_a64(CPUARMState *env)
-{
-    uint64_t val;
-    int size;
-    bool is_pair;
-    int rc = 1;
-    int segv = 0;
-    uint64_t addr;
-    int rs, rt, rt2;
-
-    start_exclusive();
-    /* size | is_pair << 2 | (rs << 4) | (rt << 9) | (rt2 << 14)); */
-    size = extract32(env->exclusive_info, 0, 2);
-    is_pair = extract32(env->exclusive_info, 2, 1);
-    rs = extract32(env->exclusive_info, 4, 5);
-    rt = extract32(env->exclusive_info, 9, 5);
-    rt2 = extract32(env->exclusive_info, 14, 5);
-
-    addr = env->exclusive_addr;
-
-    if (addr != env->exclusive_test) {
-        goto finish;
-    }
-
-    switch (size) {
-    case 0:
-        segv = get_user_u8(val, addr);
-        break;
-    case 1:
-        segv = get_user_u16(val, addr);
-        break;
-    case 2:
-        segv = get_user_u32(val, addr);
-        break;
-    case 3:
-        segv = get_user_u64(val, addr);
-        break;
-    default:
-        abort();
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto error;
-    }
-    if (val != env->exclusive_val) {
-        goto finish;
-    }
-    if (is_pair) {
-        if (size == 2) {
-            segv = get_user_u32(val, addr + 4);
-        } else {
-            segv = get_user_u64(val, addr + 8);
-        }
-        if (segv) {
-            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
-            goto error;
-        }
-        if (val != env->exclusive_high) {
-            goto finish;
-        }
-    }
-    /* handle the zero register */
-    val = rt == 31 ? 0 : env->xregs[rt];
-    switch (size) {
-    case 0:
-        segv = put_user_u8(val, addr);
-        break;
-    case 1:
-        segv = put_user_u16(val, addr);
-        break;
-    case 2:
-        segv = put_user_u32(val, addr);
-        break;
-    case 3:
-        segv = put_user_u64(val, addr);
-        break;
-    }
-    if (segv) {
-        goto error;
-    }
-    if (is_pair) {
-        /* handle the zero register */
-        val = rt2 == 31 ? 0 : env->xregs[rt2];
-        if (size == 2) {
-            segv = put_user_u32(val, addr + 4);
-        } else {
-            segv = put_user_u64(val, addr + 8);
-        }
-        if (segv) {
-            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
-            goto error;
-        }
-    }
-    rc = 0;
-finish:
-    env->pc += 4;
-    /* rs == 31 encodes a write to the ZR, thus throwing away
-     * the status return. This is rather silly but valid.
-     */
-    if (rs < 31) {
-        env->xregs[rs] = rc;
-    }
-error:
-    /* instruction faulted, PC does not advance */
-    /* either way a strex releases any exclusive lock we have */
-    env->exclusive_addr = -1;
-    end_exclusive();
-    return segv;
-}
-
 /* AArch64 main loop */
 void cpu_loop(CPUARMState *env)
 {
@@ -1026,11 +908,6 @@ void cpu_loop(CPUARMState *env)
             info._sifields._sigfault._addr = env->pc;
             queue_signal(env, info.si_signo, &info);
             break;
-        case EXCP_STREX:
-            if (!do_strex_a64(env)) {
-                break;
-            }
-            /* fall through for segv */
         case EXCP_PREFETCH_ABORT:
         case EXCP_DATA_ABORT:
             info.si_signo = TARGET_SIGSEGV;
@@ -1066,8 +943,6 @@ void cpu_loop(CPUARMState *env)
         process_pending_signals(env);
         /* Exception return on AArch64 always clears the exclusive monitor,
          * so any return to running guest code implies this.
-         * A strex (successful or otherwise) also clears the monitor, so
-         * we don't need to specialcase EXCP_STREX.
          */
         env->exclusive_addr = -1;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (30 preceding siblings ...)
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
@ 2016-09-03 20:40 ` Richard Henderson
  2016-09-15  9:39   ` Alex Bennée
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX Richard Henderson
                   ` (5 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota

From: "Emilio G. Cota" <cota@braap.org>

The exception is not emitted anymore; remove it and the associated
TCG variables.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
---
 target-arm/cpu.h       | 17 ++++++-----------
 target-arm/internals.h |  4 +---
 target-arm/translate.c | 10 ----------
 target-arm/translate.h |  4 ----
 4 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 76d824d..a38cec0 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -46,13 +46,12 @@
 #define EXCP_BKPT            7
 #define EXCP_EXCEPTION_EXIT  8   /* Return from v7M exception.  */
 #define EXCP_KERNEL_TRAP     9   /* Jumped to kernel code page.  */
-#define EXCP_STREX          10
-#define EXCP_HVC            11   /* HyperVisor Call */
-#define EXCP_HYP_TRAP       12
-#define EXCP_SMC            13   /* Secure Monitor Call */
-#define EXCP_VIRQ           14
-#define EXCP_VFIQ           15
-#define EXCP_SEMIHOST       16   /* semihosting call (A64 only) */
+#define EXCP_HVC            10   /* HyperVisor Call */
+#define EXCP_HYP_TRAP       11
+#define EXCP_SMC            12   /* Secure Monitor Call */
+#define EXCP_VIRQ           13
+#define EXCP_VFIQ           14
+#define EXCP_SEMIHOST       15   /* semihosting call (A64 only) */
 
 #define ARMV7M_EXCP_RESET   1
 #define ARMV7M_EXCP_NMI     2
@@ -475,10 +474,6 @@ typedef struct CPUARMState {
     uint64_t exclusive_addr;
     uint64_t exclusive_val;
     uint64_t exclusive_high;
-#if defined(CONFIG_USER_ONLY)
-    uint64_t exclusive_test;
-    uint32_t exclusive_info;
-#endif
 
     /* iwMMXt coprocessor state.  */
     struct {
diff --git a/target-arm/internals.h b/target-arm/internals.h
index cd57401..3edccd2 100644
--- a/target-arm/internals.h
+++ b/target-arm/internals.h
@@ -46,8 +46,7 @@ static inline bool excp_is_internal(int excp)
         || excp == EXCP_HALTED
         || excp == EXCP_EXCEPTION_EXIT
         || excp == EXCP_KERNEL_TRAP
-        || excp == EXCP_SEMIHOST
-        || excp == EXCP_STREX;
+        || excp == EXCP_SEMIHOST;
 }
 
 /* Exception names for debug logging; note that not all of these
@@ -63,7 +62,6 @@ static const char * const excnames[] = {
     [EXCP_BKPT] = "Breakpoint",
     [EXCP_EXCEPTION_EXIT] = "QEMU v7M exception exit",
     [EXCP_KERNEL_TRAP] = "QEMU intercept of kernel commpage",
-    [EXCP_STREX] = "QEMU intercept of STREX",
     [EXCP_HVC] = "Hypervisor Call",
     [EXCP_HYP_TRAP] = "Hypervisor Trap",
     [EXCP_SMC] = "Secure Monitor Call",
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 2b3c34f..e8e8502 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -64,10 +64,6 @@ static TCGv_i32 cpu_R[16];
 TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
-#ifdef CONFIG_USER_ONLY
-TCGv_i64 cpu_exclusive_test;
-TCGv_i32 cpu_exclusive_info;
-#endif
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -101,12 +97,6 @@ void arm_translate_init(void)
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
     cpu_exclusive_val = tcg_global_mem_new_i64(cpu_env,
         offsetof(CPUARMState, exclusive_val), "exclusive_val");
-#ifdef CONFIG_USER_ONLY
-    cpu_exclusive_test = tcg_global_mem_new_i64(cpu_env,
-        offsetof(CPUARMState, exclusive_test), "exclusive_test");
-    cpu_exclusive_info = tcg_global_mem_new_i32(cpu_env,
-        offsetof(CPUARMState, exclusive_info), "exclusive_info");
-#endif
 
     a64_translate_init();
 }
diff --git a/target-arm/translate.h b/target-arm/translate.h
index dbd7ac8..d4e205e 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -77,10 +77,6 @@ extern TCGv_env cpu_env;
 extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
 extern TCGv_i64 cpu_exclusive_addr;
 extern TCGv_i64 cpu_exclusive_val;
-#ifdef CONFIG_USER_ONLY
-extern TCGv_i64 cpu_exclusive_test;
-extern TCGv_i32 cpu_exclusive_info;
-#endif
 
 static inline int arm_dc_feature(DisasContext *dc, int feature)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (31 preceding siblings ...)
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
@ 2016-09-03 20:40 ` Richard Henderson
  2016-09-15 10:10   ` Alex Bennée
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers Richard Henderson
                   ` (4 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:40 UTC (permalink / raw)
  To: qemu-devel

Rather than using helpers for physical accesses, use a mmu index.
The primary cleanup is with store-conditional on physical addresses.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/cpu.h        | 18 +++++-------
 target-alpha/helper.c     | 10 ++++++-
 target-alpha/helper.h     |  9 ------
 target-alpha/mem_helper.c | 73 -----------------------------------------------
 target-alpha/translate.c  | 50 ++++++++++++++++++--------------
 5 files changed, 45 insertions(+), 115 deletions(-)

diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index ac5e801..9d9489c 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -201,7 +201,7 @@ enum {
 
 /* MMU modes definitions */
 
-/* Alpha has 5 MMU modes: PALcode, kernel, executive, supervisor, and user.
+/* Alpha has 5 MMU modes: PALcode, Kernel, Executive, Supervisor, and User.
    The Unix PALcode only exposes the kernel and user modes; presumably
    executive and supervisor are used by VMS.
 
@@ -209,22 +209,18 @@ enum {
    there are PALmode instructions that can access data via physical mode
    or via an os-installed "alternate mode", which is one of the 4 above.
 
-   QEMU does not currently properly distinguish between code/data when
-   looking up addresses.  To avoid having to address this issue, our
-   emulated PALcode will cheat and use the KSEG mapping for its code+data
-   rather than physical addresses.
+   That said, we're only emulating Unix PALcode, and not attempting VMS,
+   so we don't need to implement Executive and Supervisor.  QEMU's own
+   PALcode cheats and usees the KSEG mapping for its code+data rather than
+   physical addresses.  */
 
-   Moreover, we're only emulating Unix PALcode, and not attempting VMS.
-
-   All of which allows us to drop all but kernel and user modes.
-   Elide the unused MMU modes to save space.  */
-
-#define NB_MMU_MODES 2
+#define NB_MMU_MODES 3
 
 #define MMU_MODE0_SUFFIX _kernel
 #define MMU_MODE1_SUFFIX _user
 #define MMU_KERNEL_IDX   0
 #define MMU_USER_IDX     1
+#define MMU_PHYS_IDX     2
 
 typedef struct CPUAlphaState CPUAlphaState;
 
diff --git a/target-alpha/helper.c b/target-alpha/helper.c
index 85168b7..1ed0725 100644
--- a/target-alpha/helper.c
+++ b/target-alpha/helper.c
@@ -126,6 +126,14 @@ static int get_physical_address(CPUAlphaState *env, target_ulong addr,
     int prot = 0;
     int ret = MM_K_ACV;
 
+    /* Handle physical accesses.  */
+    if (mmu_idx == MMU_PHYS_IDX) {
+        phys = addr;
+        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        ret = -1;
+        goto exit;
+    }
+
     /* Ensure that the virtual address is properly sign-extended from
        the last implemented virtual address bit.  */
     if (saddr >> TARGET_VIRT_ADDR_SPACE_BITS != saddr >> 63) {
@@ -137,7 +145,7 @@ static int get_physical_address(CPUAlphaState *env, target_ulong addr,
        determine which KSEG is actually active.  */
     if (saddr < 0 && ((saddr >> 41) & 3) == 2) {
         /* User-space cannot access KSEG addresses.  */
-        if (mmu_idx != MMU_KERNEL_IDX) {
+        if (mmu_idx < MMU_KERNEL_IDX) {
             goto exit;
         }
 
diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index c3d8a3e..004221d 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -92,15 +92,6 @@ DEF_HELPER_FLAGS_2(ieee_input_cmp, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(ieee_input_s, TCG_CALL_NO_WG, void, env, i64)
 
 #if !defined (CONFIG_USER_ONLY)
-DEF_HELPER_2(ldl_phys, i64, env, i64)
-DEF_HELPER_2(ldq_phys, i64, env, i64)
-DEF_HELPER_2(ldl_l_phys, i64, env, i64)
-DEF_HELPER_2(ldq_l_phys, i64, env, i64)
-DEF_HELPER_3(stl_phys, void, env, i64, i64)
-DEF_HELPER_3(stq_phys, void, env, i64, i64)
-DEF_HELPER_3(stl_c_phys, i64, env, i64, i64)
-DEF_HELPER_3(stq_c_phys, i64, env, i64, i64)
-
 DEF_HELPER_FLAGS_1(tbia, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_FLAGS_2(tbis, TCG_CALL_NO_RWG, void, env, i64)
 DEF_HELPER_FLAGS_1(tb_flush, TCG_CALL_NO_RWG, void, env)
diff --git a/target-alpha/mem_helper.c b/target-alpha/mem_helper.c
index 1b2be50..78a7d45 100644
--- a/target-alpha/mem_helper.c
+++ b/target-alpha/mem_helper.c
@@ -25,79 +25,6 @@
 
 /* Softmmu support */
 #ifndef CONFIG_USER_ONLY
-
-uint64_t helper_ldl_phys(CPUAlphaState *env, uint64_t p)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    return (int32_t)ldl_phys(cs->as, p);
-}
-
-uint64_t helper_ldq_phys(CPUAlphaState *env, uint64_t p)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    return ldq_phys(cs->as, p);
-}
-
-uint64_t helper_ldl_l_phys(CPUAlphaState *env, uint64_t p)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    env->lock_addr = p;
-    return env->lock_value = (int32_t)ldl_phys(cs->as, p);
-}
-
-uint64_t helper_ldq_l_phys(CPUAlphaState *env, uint64_t p)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    env->lock_addr = p;
-    return env->lock_value = ldq_phys(cs->as, p);
-}
-
-void helper_stl_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    stl_phys(cs->as, p, v);
-}
-
-void helper_stq_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    stq_phys(cs->as, p, v);
-}
-
-uint64_t helper_stl_c_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    uint64_t ret = 0;
-
-    if (p == env->lock_addr) {
-        int32_t old = ldl_phys(cs->as, p);
-        if (old == (int32_t)env->lock_value) {
-            stl_phys(cs->as, p, v);
-            ret = 1;
-        }
-    }
-    env->lock_addr = -1;
-
-    return ret;
-}
-
-uint64_t helper_stq_c_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
-{
-    CPUState *cs = CPU(alpha_env_get_cpu(env));
-    uint64_t ret = 0;
-
-    if (p == env->lock_addr) {
-        uint64_t old = ldq_phys(cs->as, p);
-        if (old == env->lock_value) {
-            stq_phys(cs->as, p, v);
-            ret = 1;
-        }
-    }
-    env->lock_addr = -1;
-
-    return ret;
-}
-
 void alpha_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
                                    MMUAccessType access_type,
                                    int mmu_idx, uintptr_t retaddr)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 0ea0e6e..2941159 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -392,7 +392,8 @@ static inline void gen_store_mem(DisasContext *ctx,
 }
 
 static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
-                                        int32_t disp16, int quad)
+                                        int32_t disp16, int mem_idx,
+                                        TCGMemOp op)
 {
     TCGv addr;
 
@@ -414,7 +415,7 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
     /* ??? This is handled via a complicated version of compare-and-swap
        in the cpu_loop.  Hopefully one day we'll have a real CAS opcode
        in TCG so that this isn't necessary.  */
-    return gen_excp(ctx, quad ? EXCP_STQ_C : EXCP_STL_C, ra);
+    return gen_excp(ctx, (op & MO_SIZE) == MO_64 ? EXCP_STQ_C : EXCP_STL_C, ra);
 #else
     /* ??? In system mode we are never multi-threaded, so CAS can be
        implemented via a non-atomic load-compare-store sequence.  */
@@ -427,11 +428,10 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
         tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
 
         val = tcg_temp_new();
-        tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, quad ? MO_LEQ : MO_LESL);
+        tcg_gen_qemu_ld_i64(val, addr, mem_idx, op);
         tcg_gen_brcond_i64(TCG_COND_NE, val, cpu_lock_value, lab_fail);
 
-        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, ctx->mem_idx,
-                            quad ? MO_LEQ : MO_LEUL);
+        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, mem_idx, op);
         tcg_gen_movi_i64(ctx->ir[ra], 1);
         tcg_gen_br(lab_done);
 
@@ -2423,19 +2423,19 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             switch ((insn >> 12) & 0xF) {
             case 0x0:
                 /* Longword physical access (hw_ldl/p) */
-                gen_helper_ldl_phys(va, cpu_env, addr);
+                tcg_gen_qemu_ld_i64(va, addr, MMU_PHYS_IDX, MO_LESL);
                 break;
             case 0x1:
                 /* Quadword physical access (hw_ldq/p) */
-                gen_helper_ldq_phys(va, cpu_env, addr);
+                tcg_gen_qemu_ld_i64(va, addr, MMU_PHYS_IDX, MO_LEQ);
                 break;
             case 0x2:
                 /* Longword physical access with lock (hw_ldl_l/p) */
-                gen_helper_ldl_l_phys(va, cpu_env, addr);
+                gen_qemu_ldl_l(va, addr, MMU_PHYS_IDX);
                 break;
             case 0x3:
                 /* Quadword physical access with lock (hw_ldq_l/p) */
-                gen_helper_ldq_l_phys(va, cpu_env, addr);
+                gen_qemu_ldq_l(va, addr, MMU_PHYS_IDX);
                 break;
             case 0x4:
                 /* Longword virtual PTE fetch (hw_ldl/v) */
@@ -2674,27 +2674,34 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 #ifndef CONFIG_USER_ONLY
         REQUIRE_TB_FLAG(TB_FLAGS_PAL_MODE);
         {
-            TCGv addr = tcg_temp_new();
-            va = load_gpr(ctx, ra);
-            vb = load_gpr(ctx, rb);
-
-            tcg_gen_addi_i64(addr, vb, disp12);
             switch ((insn >> 12) & 0xF) {
             case 0x0:
                 /* Longword physical access */
-                gen_helper_stl_phys(cpu_env, addr, va);
+                va = load_gpr(ctx, ra);
+                vb = load_gpr(ctx, rb);
+                tmp = tcg_temp_new();
+                tcg_gen_addi_i64(tmp, vb, disp12);
+                tcg_gen_qemu_st_i64(va, tmp, MMU_PHYS_IDX, MO_LESL);
+                tcg_temp_free(tmp);
                 break;
             case 0x1:
                 /* Quadword physical access */
-                gen_helper_stq_phys(cpu_env, addr, va);
+                va = load_gpr(ctx, ra);
+                vb = load_gpr(ctx, rb);
+                tmp = tcg_temp_new();
+                tcg_gen_addi_i64(tmp, vb, disp12);
+                tcg_gen_qemu_st_i64(va, tmp, MMU_PHYS_IDX, MO_LEQ);
+                tcg_temp_free(tmp);
                 break;
             case 0x2:
                 /* Longword physical access with lock */
-                gen_helper_stl_c_phys(dest_gpr(ctx, ra), cpu_env, addr, va);
+                ret = gen_store_conditional(ctx, ra, rb, disp12,
+                                            MMU_PHYS_IDX, MO_LESL);
                 break;
             case 0x3:
                 /* Quadword physical access with lock */
-                gen_helper_stq_c_phys(dest_gpr(ctx, ra), cpu_env, addr, va);
+                ret = gen_store_conditional(ctx, ra, rb, disp12,
+                                            MMU_PHYS_IDX, MO_LEQ);
                 break;
             case 0x4:
                 /* Longword virtual access */
@@ -2733,7 +2740,6 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
                 /* Invalid */
                 goto invalid_opc;
             }
-            tcg_temp_free(addr);
             break;
         }
 #else
@@ -2797,11 +2803,13 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
         break;
     case 0x2E:
         /* STL_C */
-        ret = gen_store_conditional(ctx, ra, rb, disp16, 0);
+        ret = gen_store_conditional(ctx, ra, rb, disp16,
+                                    ctx->mem_idx, MO_LESL);
         break;
     case 0x2F:
         /* STQ_C */
-        ret = gen_store_conditional(ctx, ra, rb, disp16, 1);
+        ret = gen_store_conditional(ctx, ra, rb, disp16,
+                                    ctx->mem_idx, MO_LEQ);
         break;
     case 0x30:
         /* BR */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (32 preceding siblings ...)
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX Richard Henderson
@ 2016-09-03 20:40 ` Richard Henderson
  2016-09-15 14:38   ` Alex Bennée
  2016-09-03 21:25 ` [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics no-reply
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-03 20:40 UTC (permalink / raw)
  To: qemu-devel

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem.  However, portable parallel
code is writting assuming only cmpxchg which means that in
practice this is a viable alternative.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 linux-user/main.c        |  49 ----------------------
 target-alpha/cpu.h       |   4 --
 target-alpha/helper.c    |   6 ---
 target-alpha/machine.c   |   2 -
 target-alpha/translate.c | 104 ++++++++++++++++++++---------------------------
 5 files changed, 45 insertions(+), 120 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 64838bf..6cdd0da 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2994,51 +2994,6 @@ void cpu_loop(CPUM68KState *env)
 #endif /* TARGET_M68K */
 
 #ifdef TARGET_ALPHA
-static void do_store_exclusive(CPUAlphaState *env, int reg, int quad)
-{
-    target_ulong addr, val, tmp;
-    target_siginfo_t info;
-    int ret = 0;
-
-    addr = env->lock_addr;
-    tmp = env->lock_st_addr;
-    env->lock_addr = -1;
-    env->lock_st_addr = 0;
-
-    start_exclusive();
-    mmap_lock();
-
-    if (addr == tmp) {
-        if (quad ? get_user_s64(val, addr) : get_user_s32(val, addr)) {
-            goto do_sigsegv;
-        }
-
-        if (val == env->lock_value) {
-            tmp = env->ir[reg];
-            if (quad ? put_user_u64(tmp, addr) : put_user_u32(tmp, addr)) {
-                goto do_sigsegv;
-            }
-            ret = 1;
-        }
-    }
-    env->ir[reg] = ret;
-    env->pc += 4;
-
-    mmap_unlock();
-    end_exclusive();
-    return;
-
- do_sigsegv:
-    mmap_unlock();
-    end_exclusive();
-
-    info.si_signo = TARGET_SIGSEGV;
-    info.si_errno = 0;
-    info.si_code = TARGET_SEGV_MAPERR;
-    info._sifields._sigfault._addr = addr;
-    queue_signal(env, TARGET_SIGSEGV, &info);
-}
-
 void cpu_loop(CPUAlphaState *env)
 {
     CPUState *cs = CPU(alpha_env_get_cpu(env));
@@ -3212,10 +3167,6 @@ void cpu_loop(CPUAlphaState *env)
                 queue_signal(env, info.si_signo, &info);
             }
             break;
-        case EXCP_STL_C:
-        case EXCP_STQ_C:
-            do_store_exclusive(env, env->error_code, trapnr - EXCP_STL_C);
-            break;
         case EXCP_INTERRUPT:
             /* Just indicate that signals should be handled asap.  */
             break;
diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index 9d9489c..a4068c8 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -230,7 +230,6 @@ struct CPUAlphaState {
     uint64_t pc;
     uint64_t unique;
     uint64_t lock_addr;
-    uint64_t lock_st_addr;
     uint64_t lock_value;
 
     /* The FPCR, and disassembled portions thereof.  */
@@ -346,9 +345,6 @@ enum {
     EXCP_ARITH,
     EXCP_FEN,
     EXCP_CALL_PAL,
-    /* For Usermode emulation.  */
-    EXCP_STL_C,
-    EXCP_STQ_C,
 };
 
 /* Alpha-specific interrupt pending bits.  */
diff --git a/target-alpha/helper.c b/target-alpha/helper.c
index 1ed0725..95453f5 100644
--- a/target-alpha/helper.c
+++ b/target-alpha/helper.c
@@ -306,12 +306,6 @@ void alpha_cpu_do_interrupt(CPUState *cs)
         case EXCP_CALL_PAL:
             name = "call_pal";
             break;
-        case EXCP_STL_C:
-            name = "stl_c";
-            break;
-        case EXCP_STQ_C:
-            name = "stq_c";
-            break;
         }
         qemu_log("INT %6d: %s(%#x) pc=%016" PRIx64 " sp=%016" PRIx64 "\n",
                  ++count, name, env->error_code, env->pc, env->ir[IR_SP]);
diff --git a/target-alpha/machine.c b/target-alpha/machine.c
index 710b783..b99a123 100644
--- a/target-alpha/machine.c
+++ b/target-alpha/machine.c
@@ -45,8 +45,6 @@ static VMStateField vmstate_env_fields[] = {
     VMSTATE_UINTTL(unique, CPUAlphaState),
     VMSTATE_UINTTL(lock_addr, CPUAlphaState),
     VMSTATE_UINTTL(lock_value, CPUAlphaState),
-    /* Note that lock_st_addr is not saved; it is a temporary
-       used during the execution of the st[lq]_c insns.  */
 
     VMSTATE_UINT8(ps, CPUAlphaState),
     VMSTATE_UINT8(intr_flag, CPUAlphaState),
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 2941159..ed353de 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -99,7 +99,6 @@ static TCGv cpu_std_ir[31];
 static TCGv cpu_fir[31];
 static TCGv cpu_pc;
 static TCGv cpu_lock_addr;
-static TCGv cpu_lock_st_addr;
 static TCGv cpu_lock_value;
 
 #ifndef CONFIG_USER_ONLY
@@ -116,7 +115,6 @@ void alpha_translate_init(void)
     static const GlobalVar vars[] = {
         DEF_VAR(pc),
         DEF_VAR(lock_addr),
-        DEF_VAR(lock_st_addr),
         DEF_VAR(lock_value),
     };
 
@@ -198,6 +196,23 @@ static TCGv dest_sink(DisasContext *ctx)
     return ctx->sink;
 }
 
+static void free_context_temps(DisasContext *ctx)
+{
+    if (!TCGV_IS_UNUSED_I64(ctx->sink)) {
+        tcg_gen_discard_i64(ctx->sink);
+        tcg_temp_free(ctx->sink);
+        TCGV_UNUSED_I64(ctx->sink);
+    }
+    if (!TCGV_IS_UNUSED_I64(ctx->zero)) {
+        tcg_temp_free(ctx->zero);
+        TCGV_UNUSED_I64(ctx->zero);
+    }
+    if (!TCGV_IS_UNUSED_I64(ctx->lit)) {
+        tcg_temp_free(ctx->lit);
+        TCGV_UNUSED_I64(ctx->lit);
+    }
+}
+
 static TCGv load_gpr(DisasContext *ctx, unsigned reg)
 {
     if (likely(reg < 31)) {
@@ -395,56 +410,37 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
                                         int32_t disp16, int mem_idx,
                                         TCGMemOp op)
 {
-    TCGv addr;
-
-    if (ra == 31) {
-        /* ??? Don't bother storing anything.  The user can't tell
-           the difference, since the zero register always reads zero.  */
-        return NO_EXIT;
-    }
-
-#if defined(CONFIG_USER_ONLY)
-    addr = cpu_lock_st_addr;
-#else
-    addr = tcg_temp_local_new();
-#endif
+    TCGLabel *lab_fail, *lab_done;
+    TCGv addr, val;
 
+    addr = tcg_temp_new_i64();
     tcg_gen_addi_i64(addr, load_gpr(ctx, rb), disp16);
+    free_context_temps(ctx);
 
-#if defined(CONFIG_USER_ONLY)
-    /* ??? This is handled via a complicated version of compare-and-swap
-       in the cpu_loop.  Hopefully one day we'll have a real CAS opcode
-       in TCG so that this isn't necessary.  */
-    return gen_excp(ctx, (op & MO_SIZE) == MO_64 ? EXCP_STQ_C : EXCP_STL_C, ra);
-#else
-    /* ??? In system mode we are never multi-threaded, so CAS can be
-       implemented via a non-atomic load-compare-store sequence.  */
-    {
-        TCGLabel *lab_fail, *lab_done;
-        TCGv val;
+    lab_fail = gen_new_label();
+    lab_done = gen_new_label();
+    tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
+    tcg_temp_free_i64(addr);
 
-        lab_fail = gen_new_label();
-        lab_done = gen_new_label();
-        tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
+    val = tcg_temp_new_i64();
+    tcg_gen_atomic_cmpxchg_i64(val, cpu_lock_addr, cpu_lock_value,
+                               load_gpr(ctx, ra), mem_idx, op);
+    free_context_temps(ctx);
 
-        val = tcg_temp_new();
-        tcg_gen_qemu_ld_i64(val, addr, mem_idx, op);
-        tcg_gen_brcond_i64(TCG_COND_NE, val, cpu_lock_value, lab_fail);
-
-        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, mem_idx, op);
-        tcg_gen_movi_i64(ctx->ir[ra], 1);
-        tcg_gen_br(lab_done);
+    if (ra != 31) {
+        tcg_gen_setcond_i64(TCG_COND_EQ, ctx->ir[ra], val, cpu_lock_value);
+    }
+    tcg_temp_free_i64(val);
+    tcg_gen_br(lab_done);
 
-        gen_set_label(lab_fail);
+    gen_set_label(lab_fail);
+    if (ra != 31) {
         tcg_gen_movi_i64(ctx->ir[ra], 0);
-
-        gen_set_label(lab_done);
-        tcg_gen_movi_i64(cpu_lock_addr, -1);
-
-        tcg_temp_free(addr);
-        return NO_EXIT;
     }
-#endif
+
+    gen_set_label(lab_done);
+    tcg_gen_movi_i64(cpu_lock_addr, -1);
+    return NO_EXIT;
 }
 
 static bool in_superpage(DisasContext *ctx, int64_t addr)
@@ -2914,6 +2910,10 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
     /* Similarly for flush-to-zero.  */
     ctx.tb_ftz = -1;
 
+    TCGV_UNUSED_I64(ctx.zero);
+    TCGV_UNUSED_I64(ctx.sink);
+    TCGV_UNUSED_I64(ctx.lit);
+
     num_insns = 0;
     max_insns = tb->cflags & CF_COUNT_MASK;
     if (max_insns == 0) {
@@ -2948,23 +2948,9 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
         }
         insn = cpu_ldl_code(env, ctx.pc);
 
-        TCGV_UNUSED_I64(ctx.zero);
-        TCGV_UNUSED_I64(ctx.sink);
-        TCGV_UNUSED_I64(ctx.lit);
-
         ctx.pc += 4;
         ret = translate_one(ctxp, insn);
-
-        if (!TCGV_IS_UNUSED_I64(ctx.sink)) {
-            tcg_gen_discard_i64(ctx.sink);
-            tcg_temp_free(ctx.sink);
-        }
-        if (!TCGV_IS_UNUSED_I64(ctx.zero)) {
-            tcg_temp_free(ctx.zero);
-        }
-        if (!TCGV_IS_UNUSED_I64(ctx.lit)) {
-            tcg_temp_free(ctx.lit);
-        }
+        free_context_temps(ctxp);
 
         /* If we reach a page boundary, are single stepping,
            or exhaust instruction count, stop generation.  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (33 preceding siblings ...)
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers Richard Henderson
@ 2016-09-03 21:25 ` no-reply
  2016-09-03 21:26 ` no-reply
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 97+ messages in thread
From: no-reply @ 2016-09-03 21:25 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
Type: series
Message-id: 1472935202-3342-1-git-send-email-rth@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1472935202-3342-1-git-send-email-rth@twiddle.net -> patchew/1472935202-3342-1-git-send-email-rth@twiddle.net
Switched to a new branch 'test'
7d6f99f target-alpha: Emulate LL/SC using cmpxchg helpers
95c65de target-alpha: Introduce MMU_PHYS_IDX
0250b51 target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
f5fb6dc linux-user: remove handling of aarch64's EXCP_STREX
5205e25 linux-user: remove handling of ARM's EXCP_STREX
779b958 target-arm: emulate aarch64's LL/SC using cmpxchg helpers
e78b7c8 target-arm: emulate SWP with atomic_xchg helper
0d71586 target-arm: emulate LL/SC using cmpxchg helpers
f3c834e target-arm: Rearrange aa32 load and store functions
9caae16 tests: add atomic_add-bench
01cb92d target-i386: remove helper_lock()
3b0638f target-i386: emulate XCHG using atomic helper
299454c target-i386: emulate LOCK'ed BTX ops using atomic helpers
ea9bf7b target-i386: emulate LOCK'ed XADD using atomic helper
9ed94c7 target-i386: emulate LOCK'ed NEG using cmpxchg helper
b114db2 target-i386: emulate LOCK'ed NOT using atomic helper
98ed70e target-i386: emulate LOCK'ed INC using atomic helper
6864197 target-i386: emulate LOCK'ed OP instructions using atomic helpers
60ea4a3 target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
49c8f18 tcg: Add CONFIG_ATOMIC64
924cf07 tcg: Add atomic128 helpers
6f824c0 tcg: Add atomic helpers
289ae34 cputlb: Tidy some macros
6688a03 cputlb: Move most of iotlb code out of line
1233d51 cputlb: Remove includes from softmmu_template.h
ef3611f cputlb: Move probe_write out of softmmu_template.h
c2fe267 cputlb: Replace SHIFT with DATA_SIZE
be5d2eb HACK: Always enable parallel_cpus
5abdfaa tcg: Add EXCP_ATOMIC
45280a3 int128: Add int128_make128
fefab1d int128: Use __int128 if available
b302da3 exec: Avoid direct references to Int128 parts
b2eec15 atomics: add atomic_op_fetch variants
78a61f7 atomics: add atomic_xor

=== OUTPUT BEGIN ===
Checking PATCH 1/34: atomics: add atomic_xor...
Checking PATCH 2/34: atomics: add atomic_op_fetch variants...
Checking PATCH 3/34: exec: Avoid direct references to Int128 parts...
Checking PATCH 4/34: int128: Use __int128 if available...
Checking PATCH 5/34: int128: Add int128_make128...
Checking PATCH 6/34: tcg: Add EXCP_ATOMIC...
Checking PATCH 7/34: HACK: Always enable parallel_cpus...
Checking PATCH 8/34: cputlb: Replace SHIFT with DATA_SIZE...
Checking PATCH 9/34: cputlb: Move probe_write out of softmmu_template.h...
Checking PATCH 10/34: cputlb: Remove includes from softmmu_template.h...
Checking PATCH 11/34: cputlb: Move most of iotlb code out of line...
Checking PATCH 12/34: cputlb: Tidy some macros...
Checking PATCH 13/34: tcg: Add atomic helpers...
Checking PATCH 14/34: tcg: Add atomic128 helpers...
Checking PATCH 15/34: tcg: Add CONFIG_ATOMIC64...
ERROR: Macros with complex values should be enclosed in parenthesis
#128: FILE: tcg/tcg-op.c:2027:
+# define WITH_ATOMIC64(X) X,

total: 1 errors, 0 warnings, 262 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 16/34: target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers...
Checking PATCH 17/34: target-i386: emulate LOCK'ed OP instructions using atomic helpers...
Checking PATCH 18/34: target-i386: emulate LOCK'ed INC using atomic helper...
Checking PATCH 19/34: target-i386: emulate LOCK'ed NOT using atomic helper...
Checking PATCH 20/34: target-i386: emulate LOCK'ed NEG using cmpxchg helper...
Checking PATCH 21/34: target-i386: emulate LOCK'ed XADD using atomic helper...
Checking PATCH 22/34: target-i386: emulate LOCK'ed BTX ops using atomic helpers...
Checking PATCH 23/34: target-i386: emulate XCHG using atomic helper...
Checking PATCH 24/34: target-i386: remove helper_lock()...
Checking PATCH 25/34: tests: add atomic_add-bench...
Checking PATCH 26/34: target-arm: Rearrange aa32 load and store functions...
Checking PATCH 27/34: target-arm: emulate LL/SC using cmpxchg helpers...
Checking PATCH 28/34: target-arm: emulate SWP with atomic_xchg helper...
Checking PATCH 29/34: target-arm: emulate aarch64's LL/SC using cmpxchg helpers...
Checking PATCH 30/34: linux-user: remove handling of ARM's EXCP_STREX...
Checking PATCH 31/34: linux-user: remove handling of aarch64's EXCP_STREX...
Checking PATCH 32/34: target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}...
Checking PATCH 33/34: target-alpha: Introduce MMU_PHYS_IDX...
Checking PATCH 34/34: target-alpha: Emulate LL/SC using cmpxchg helpers...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (34 preceding siblings ...)
  2016-09-03 21:25 ` [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics no-reply
@ 2016-09-03 21:26 ` no-reply
  2016-09-09 18:33 ` Alex Bennée
  2016-09-15 14:39 ` Alex Bennée
  37 siblings, 0 replies; 97+ messages in thread
From: no-reply @ 2016-09-03 21:26 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
Type: series
Message-id: 1472935202-3342-1-git-send-email-rth@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
make J=8 docker-test-quick@centos6
make J=8 docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7d6f99f target-alpha: Emulate LL/SC using cmpxchg helpers
95c65de target-alpha: Introduce MMU_PHYS_IDX
0250b51 target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
f5fb6dc linux-user: remove handling of aarch64's EXCP_STREX
5205e25 linux-user: remove handling of ARM's EXCP_STREX
779b958 target-arm: emulate aarch64's LL/SC using cmpxchg helpers
e78b7c8 target-arm: emulate SWP with atomic_xchg helper
0d71586 target-arm: emulate LL/SC using cmpxchg helpers
f3c834e target-arm: Rearrange aa32 load and store functions
9caae16 tests: add atomic_add-bench
01cb92d target-i386: remove helper_lock()
3b0638f target-i386: emulate XCHG using atomic helper
299454c target-i386: emulate LOCK'ed BTX ops using atomic helpers
ea9bf7b target-i386: emulate LOCK'ed XADD using atomic helper
9ed94c7 target-i386: emulate LOCK'ed NEG using cmpxchg helper
b114db2 target-i386: emulate LOCK'ed NOT using atomic helper
98ed70e target-i386: emulate LOCK'ed INC using atomic helper
6864197 target-i386: emulate LOCK'ed OP instructions using atomic helpers
60ea4a3 target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
49c8f18 tcg: Add CONFIG_ATOMIC64
924cf07 tcg: Add atomic128 helpers
6f824c0 tcg: Add atomic helpers
289ae34 cputlb: Tidy some macros
6688a03 cputlb: Move most of iotlb code out of line
1233d51 cputlb: Remove includes from softmmu_template.h
ef3611f cputlb: Move probe_write out of softmmu_template.h
c2fe267 cputlb: Replace SHIFT with DATA_SIZE
be5d2eb HACK: Always enable parallel_cpus
5abdfaa tcg: Add EXCP_ATOMIC
45280a3 int128: Add int128_make128
fefab1d int128: Use __int128 if available
b302da3 exec: Avoid direct references to Int128 parts
b2eec15 atomics: add atomic_op_fetch variants
78a61f7 atomics: add atomic_xor

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD centos6
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY RUNNER
  RUN test-quick in centos6
No C++ compiler available; disabling C++ specific optional code
Install prefix    /tmp/qemu-test/src/tests/docker/install
BIOS directory    /tmp/qemu-test/src/tests/docker/install/share/qemu
binary directory  /tmp/qemu-test/src/tests/docker/install/bin
library directory /tmp/qemu-test/src/tests/docker/install/lib
module directory  /tmp/qemu-test/src/tests/docker/install/lib/qemu
libexec directory /tmp/qemu-test/src/tests/docker/install/libexec
include directory /tmp/qemu-test/src/tests/docker/install/include
config directory  /tmp/qemu-test/src/tests/docker/install/etc
local state directory   /tmp/qemu-test/src/tests/docker/install/var
Manual directory  /tmp/qemu-test/src/tests/docker/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g 
QEMU_CFLAGS       -I/usr/include/pixman-1    -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
RDMA support      no
TCG interpreter   no
fdt support       yes
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
uuid support      no
libcap-ng support no
vhost-net support yes
vhost-scsi support yes
Trace backends    log
spice support     no 
rbd support       no
xfsctl support    no
smartcard support no
libusb            no
usb net redir     no
OpenGL support    no
OpenGL dmabufs    no
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine pool    yes
GlusterFS support no
Archipelago support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   no
TPM passthrough   yes
QOM debugging     yes
vhdx              no
lzo support       no
snappy support    no
bzip2 support     no
NUMA host support no
tcmalloc support  no
jemalloc support  no
avx2 optimization no
  GEN   x86_64-softmmu/config-devices.mak.tmp
  GEN   aarch64-softmmu/config-devices.mak.tmp
  GEN   config-host.h
  GEN   qemu-options.def
  GEN   qmp-commands.h
  GEN   qapi-types.h
  GEN   qapi-visit.h
  GEN   qapi-event.h
  GEN   x86_64-softmmu/config-devices.mak
  GEN   aarch64-softmmu/config-devices.mak
  GEN   qmp-introspect.h
  GEN   tests/test-qapi-visit.h
  GEN   tests/test-qapi-types.h
  GEN   tests/test-qmp-commands.h
  GEN   tests/test-qapi-event.h
  GEN   tests/test-qmp-introspect.h
  GEN   config-all-devices.mak
  GEN   trace/generated-events.h
  GEN   trace/generated-tracers.h
  GEN   trace/generated-tcg-tracers.h
  GEN   trace/generated-helpers-wrappers.h
  GEN   trace/generated-helpers.h
  CC    tests/qemu-iotests/socket_scm_helper.o
  GEN   qga/qapi-generated/qga-qapi-types.h
  GEN   qga/qapi-generated/qga-qapi-visit.h
  GEN   qga/qapi-generated/qga-qmp-commands.h
  GEN   qga/qapi-generated/qga-qapi-types.c
  GEN   qga/qapi-generated/qga-qapi-visit.c
  GEN   qga/qapi-generated/qga-qmp-marshal.c
  GEN   qmp-introspect.c
  GEN   qapi-types.c
  GEN   qapi-visit.c
  GEN   qapi-event.c
  CC    qapi/qapi-visit-core.o
  CC    qapi/qapi-dealloc-visitor.o
  CC    qapi/qmp-input-visitor.o
  CC    qapi/qmp-output-visitor.o
  CC    qapi/qmp-registry.o
  CC    qapi/qmp-dispatch.o
  CC    qapi/string-input-visitor.o
  CC    qapi/string-output-visitor.o
  CC    qapi/opts-visitor.o
  CC    qapi/qapi-clone-visitor.o
  CC    qapi/qmp-event.o
  CC    qapi/qapi-util.o
  CC    qobject/qnull.o
  CC    qobject/qint.o
  CC    qobject/qstring.o
  CC    qobject/qdict.o
  CC    qobject/qlist.o
  CC    qobject/qfloat.o
  CC    qobject/qbool.o
  CC    qobject/qjson.o
  CC    qobject/qobject.o
  CC    qobject/json-lexer.o
  CC    qobject/json-streamer.o
  CC    qobject/json-parser.o
  GEN   trace/generated-events.c
  CC    trace/control.o
  CC    trace/qmp.o
  CC    util/osdep.o
  CC    util/cutils.o
  CC    util/unicode.o
  CC    util/qemu-timer-common.o
  CC    util/compatfd.o
  CC    util/event_notifier-posix.o
  CC    util/mmap-alloc.o
  CC    util/oslib-posix.o
  CC    util/qemu-openpty.o
  CC    util/qemu-thread-posix.o
  CC    util/memfd.o
  CC    util/envlist.o
  CC    util/path.o
  CC    util/module.o
  CC    util/bitmap.o
  CC    util/bitops.o
  CC    util/hbitmap.o
  CC    util/fifo8.o
  CC    util/acl.o
  CC    util/error.o
  CC    util/qemu-error.o
  CC    util/id.o
  CC    util/iov.o
  CC    util/qemu-config.o
  CC    util/qemu-sockets.o
  CC    util/uri.o
  CC    util/notify.o
  CC    util/qemu-option.o
  CC    util/qemu-progress.o
  CC    util/hexdump.o
  CC    util/crc32c.o
  CC    util/throttle.o
  CC    util/getauxval.o
  CC    util/readline.o
  CC    util/rfifolock.o
  CC    util/rcu.o
  CC    util/qemu-coroutine.o
  CC    util/qemu-coroutine-lock.o
  CC    util/qemu-coroutine-io.o
  CC    util/qemu-coroutine-sleep.o
  CC    util/coroutine-ucontext.o
  CC    util/buffer.o
  CC    util/timed-average.o
  CC    util/base64.o
  CC    util/log.o
  CC    util/qdist.o
  CC    util/qht.o
  CC    util/range.o
  CC    crypto/pbkdf-stub.o
  CC    stubs/arch-query-cpu-def.o
  CC    stubs/bdrv-next-monitor-owned.o
  CC    stubs/blk-commit-all.o
  CC    stubs/blockdev-close-all-bdrv-states.o
  CC    stubs/clock-warp.o
  CC    stubs/cpu-get-icount.o
  CC    stubs/cpu-get-clock.o
  CC    stubs/dump.o
  CC    stubs/fdset-add-fd.o
  CC    stubs/fdset-find-fd.o
  CC    stubs/fdset-get-fd.o
  CC    stubs/fdset-remove-fd.o
/tmp/qemu-test/src/util/qht.c: In function ‘qht_reset_size’:
/tmp/qemu-test/src/util/qht.c:413: warning: ‘new’ may be used uninitialized in this function
  CC    stubs/gdbstub.o
  CC    stubs/get-fd.o
  CC    stubs/get-next-serial.o
  CC    stubs/get-vm-name.o
  CC    stubs/iothread-lock.o
  CC    stubs/machine-init-done.o
  CC    stubs/is-daemonized.o
  CC    stubs/migr-blocker.o
  CC    stubs/mon-is-qmp.o
  CC    stubs/mon-printf.o
  CC    stubs/monitor-init.o
  CC    stubs/notify-event.o
  CC    stubs/qtest.o
  CC    stubs/replay.o
  CC    stubs/replay-user.o
  CC    stubs/reset.o
  CC    stubs/runstate-check.o
  CC    stubs/set-fd-handler.o
  CC    stubs/slirp.o
  CC    stubs/sysbus.o
  CC    stubs/trace-control.o
  CC    stubs/uuid.o
  CC    stubs/vm-stop.o
  CC    stubs/vmstate.o
In file included from /tmp/qemu-test/src/include/exec/memory.h:31,
                 from /tmp/qemu-test/src/include/hw/hw.h:12,
                 from /tmp/qemu-test/src/stubs/reset.c:3:
/tmp/qemu-test/src/include/qemu/int128.h:7: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Int128’
/tmp/qemu-test/src/include/qemu/int128.h:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make64’
/tmp/qemu-test/src/include/qemu/int128.h:14: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make128’
/tmp/qemu-test/src/include/qemu/int128.h:19: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:26: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:31: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:36: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_zero’
/tmp/qemu-test/src/include/qemu/int128.h:41: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_one’
/tmp/qemu-test/src/include/qemu/int128.h:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_2_64’
/tmp/qemu-test/src/include/qemu/int128.h:51: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_exts64’
/tmp/qemu-test/src/include/qemu/int128.h:56: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_and’
/tmp/qemu-test/src/include/qemu/int128.h:61: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_rshift’
/tmp/qemu-test/src/include/qemu/int128.h:66: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_add’
/tmp/qemu-test/src/include/qemu/int128.h:71: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_neg’
/tmp/qemu-test/src/include/qemu/int128.h:76: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_sub’
/tmp/qemu-test/src/include/qemu/int128.h:81: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:86: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:91: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:96: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:101: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:106: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:111: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:116: error: expected ‘)’ before ‘a’
/tmp/qemu-test/src/include/qemu/int128.h:121: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_min’
/tmp/qemu-test/src/include/qemu/int128.h:126: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_max’
/tmp/qemu-test/src/include/qemu/int128.h:131: error: expected ‘)’ before ‘*’ token
/tmp/qemu-test/src/include/qemu/int128.h:136: error: expected ‘)’ before ‘*’ token
/tmp/qemu-test/src/include/qemu/int128.h:141: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bswap128’
In file included from /tmp/qemu-test/src/include/hw/hw.h:12,
                 from /tmp/qemu-test/src/stubs/reset.c:3:
/tmp/qemu-test/src/include/exec/memory.h:186: error: expected specifier-qualifier-list before ‘Int128’
In file included from /tmp/qemu-test/src/include/hw/hw.h:12,
                 from /tmp/qemu-test/src/stubs/reset.c:3:
/tmp/qemu-test/src/include/exec/memory.h:278: error: expected specifier-qualifier-list before ‘Int128’
make: *** [stubs/reset.o] Error 1
make: *** Waiting for unfinished jobs....
  CC    stubs/cpus.o
tests/docker/Makefile.include:104: recipe for target 'docker-run-test-quick@centos6' failed
make: *** [docker-run-test-quick@centos6] Error 1
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128 Richard Henderson
@ 2016-09-09 13:01   ` Leon Alrae
  2016-09-09 20:16     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Leon Alrae @ 2016-09-09 13:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, Sep 03, 2016 at 09:39:33PM +0100, Richard Henderson wrote:
> Allows Int128 to be used more generally, rather than having to
> begin with 64-bit inputs and accumulate.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/qemu/int128.h | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 08f1db1..67440fa 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -10,6 +10,11 @@ static inline Int128 int128_make64(uint64_t a)
>      return a;
>  }
>  
> +static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
> +{
> +    return (unsigned __int128)hi << 64 | lo;
> +}
> +

This causes build failures for me on CentOS6.5:
/user/lea/dev/qemu/include/qemu/int128.h:7: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Int128’
/user/lea/dev/qemu/include/qemu/int128.h:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make64’
/user/lea/dev/qemu/include/qemu/int128.h:14: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make128’
(...)

This is because CONFIG_INT128 is set if test for __int128_t succeeds, not
__int128. The following change on top of patches 4 and 5 in this series
fixes the problem for me:

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 261b55f..5c9890d 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -4,7 +4,7 @@
 #ifdef CONFIG_INT128
 #include "qemu/bswap.h"

-typedef __int128 Int128;
+typedef __int128_t Int128;

 static inline Int128 int128_make64(uint64_t a)
 {
@@ -13,7 +13,7 @@ static inline Int128 int128_make64(uint64_t a)

 static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
-    return (unsigned __int128)hi << 64 | lo;
+    return (__uint128_t)hi << 64 | lo;
 }

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
@ 2016-09-09 13:11   ` Leon Alrae
  2016-09-09 14:46   ` Leon Alrae
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: Leon Alrae @ 2016-09-09 13:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

> +#define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
> +static void * const table_##NAME[16] = {                                \
> +    [MO_8] = gen_helper_atomic_##NAME##b,                               \
> +    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
> +    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
> +    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
> +    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
> +    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
> +    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
> +};                                                                      \
> +void tcg_gen_atomic_##NAME##_i32                                        \
> +    (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i32);                        \
> +    }                                                                   \
> +}                                                                       \
> +void tcg_gen_atomic_##NAME##_i64                                        \
> +    (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i64);                        \
> +    }                                                                   \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_add, add, 0)
> +GEN_ATOMIC_HELPER(fetch_and, and, 0)
> +GEN_ATOMIC_HELPER(fetch_or, or, 0)
> +GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
> +
> +GEN_ATOMIC_HELPER(add_fetch, add, 1)
> +GEN_ATOMIC_HELPER(and_fetch, and, 1)
> +GEN_ATOMIC_HELPER(or_fetch, or, 1)
> +GEN_ATOMIC_HELPER(xor_fetch, xor, 1)

This causes build errors on CentOS6.5 (where __ATOMIC_RELAXED is undefined):
/user/lea/dev/qemu/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addb’ undeclared here (not in a function)
/user/lea/dev/qemu/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_le’ undeclared here (not in a function)
/user/lea/dev/qemu/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_be’ undeclared here (not in a function)
(...)


The fix I've been using locally while working on enabling MTTCG for MIPS:

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 5542416..818a5a4 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -409,22 +409,22 @@
 /* Provide shorter names for GCC atomic builtins.  */
 #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
 #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
-#define atomic_fetch_add       __sync_fetch_and_add
-#define atomic_fetch_sub       __sync_fetch_and_sub
-#define atomic_fetch_and       __sync_fetch_and_and
-#define atomic_fetch_or        __sync_fetch_and_or
-#define atomic_fetch_xor       __sync_fetch_and_xor
+#define atomic_fetch_add(ptr, n)  __sync_fetch_and_add(ptr, n)
+#define atomic_fetch_sub(ptr, n)  __sync_fetch_and_sub(ptr, n)
+#define atomic_fetch_and(ptr, n)  __sync_fetch_and_and(ptr, n)
+#define atomic_fetch_or(ptr, n)  __sync_fetch_and_or(ptr, n)
+#define atomic_fetch_xor(ptr, n)  __sync_fetch_and_xor(ptr, n)

 #define atomic_inc_fetch(ptr)  __sync_add_and_fetch(ptr, 1)
 #define atomic_dec_fetch(ptr)  __sync_add_and_fetch(ptr, -1)
-#define atomic_add_fetch       __sync_add_and_fetch
-#define atomic_sub_fetch       __sync_sub_and_fetch
-#define atomic_and_fetch       __sync_and_and_fetch
-#define atomic_or_fetch        __sync_or_and_fetch
-#define atomic_xor_fetch       __sync_xor_and_fetch
-
-#define atomic_cmpxchg         __sync_val_compare_and_swap
-#define atomic_cmpxchg__nocheck  atomic_cmpxchg
+#define atomic_add_fetch(ptr, n)  __sync_add_and_fetch(ptr, n)
+#define atomic_sub_fetch(ptr, n)  __sync_sub_and_fetch(ptr, n)
+#define atomic_and_fetch(ptr, n)  __sync_and_and_fetch(ptr, n)
+#define atomic_or_fetch(ptr, n)  __sync_or_and_fetch(ptr, n)
+#define atomic_xor_fetch(ptr, n)  __sync_xor_and_fetch(ptr, n)
+
+#define atomic_cmpxchg(ptr, old, new) __sync_val_compare_and_swap(ptr, old, new)
+#define atomic_cmpxchg__nocheck(ptr, old, new)  atomic_cmpxchg(ptr, old, new)

 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
  2016-09-09 13:11   ` Leon Alrae
@ 2016-09-09 14:46   ` Leon Alrae
  2016-09-09 16:26     ` Richard Henderson
  2016-09-12 13:47   ` Alex Bennée
  2016-09-13 17:06   ` Alex Bennée
  3 siblings, 1 reply; 97+ messages in thread
From: Leon Alrae @ 2016-09-09 14:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sat, Sep 03, 2016 at 09:39:41PM +0100, Richard Henderson wrote:
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1175,6 +1175,59 @@ uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>  # define helper_ret_ldq_cmmu  helper_le_ldq_cmmu
>  #endif
>  
> +uint32_t helper_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
> +                                    uint32_t cmpv, uint32_t newv,
> +                                    TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgw_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgw_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);

Wouldn't it be useful if tcg.h provided also aliases for _le/_be atomic
helpers (equivalent to helper_ret_X_mmu) so that in target-* code we wouldn't
need to care about the endianness (specifically I'm thinking about SC in MIPS
where I need to select between helper_atomic_cmpxchgl_le_mmu() and 
helper_atomic_cmpxchgl_be_mmu()).

Thanks,
Leon

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-09 14:46   ` Leon Alrae
@ 2016-09-09 16:26     ` Richard Henderson
  2016-09-12  7:59       ` Leon Alrae
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 16:26 UTC (permalink / raw)
  To: Leon Alrae; +Cc: qemu-devel

On 09/09/2016 07:46 AM, Leon Alrae wrote:
> Wouldn't it be useful if tcg.h provided also aliases for _le/_be atomic
> helpers (equivalent to helper_ret_X_mmu) so that in target-* code we wouldn't
> need to care about the endianness (specifically I'm thinking about SC in MIPS
> where I need to select between helper_atomic_cmpxchgl_le_mmu() and
> helper_atomic_cmpxchgl_be_mmu()).

Perhaps.  I would have hoped that you could do the SC inline now though, and 
tcg_gen_atomic_cmpxchg() will do the selection for you.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts Richard Henderson
@ 2016-09-09 17:14   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-09 17:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  exec.c                |  4 ++--
>  include/qemu/int128.h | 10 ++++++++++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8ffde75..373313d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -320,9 +320,9 @@ static inline bool section_covers_addr(const MemoryRegionSection *section,
>      /* Memory topology clips a memory region to [0, 2^64); size.hi > 0 means
>       * the section must cover the entire address space.
>       */
> -    return section->size.hi ||
> +    return int128_gethi(section->size) ||
>             range_covers_byte(section->offset_within_address_space,
> -                             section->size.lo, addr);
> +                             int128_getlo(section->size), addr);
>  }
>
>  static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr,
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index c598881..52aaf99 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -20,6 +20,16 @@ static inline uint64_t int128_get64(Int128 a)
>      return a.lo;
>  }
>
> +static inline uint64_t int128_getlo(Int128 a)
> +{
> +    return a.lo;
> +}
> +
> +static inline int64_t int128_gethi(Int128 a)
> +{
> +    return a.hi;
> +}
> +
>  static inline Int128 int128_zero(void)
>  {
>      return int128_make64(0);


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available Richard Henderson
@ 2016-09-09 17:19   ` Alex Bennée
  2016-09-09 17:38     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-09 17:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/qemu/int128.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  tests/test-int128.c   |  22 ++++----
>  2 files changed, 145 insertions(+), 12 deletions(-)
>
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 52aaf99..08f1db1 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -1,6 +1,138 @@
>  #ifndef INT128_H
>  #define INT128_H
>
> +#ifdef CONFIG_INT128
> +
> +typedef __int128 Int128;
> +
> +static inline Int128 int128_make64(uint64_t a)
> +{
> +    return a;
> +}
> +
> +static inline uint64_t int128_get64(Int128 a)
> +{
> +    uint64_t r = a;
> +    assert(r == a);
> +    return r;
> +}

So is _get64 just an alias for getlo? It looks fragile for the time
someone has a > 64 bit number stuffed in before they fetch it.

> +
> +static inline uint64_t int128_getlo(Int128 a)
> +{
> +    return a;
> +}
> +
> +static inline int64_t int128_gethi(Int128 a)
> +{
> +    return a >> 64;
> +}
> +
> +static inline Int128 int128_zero(void)
> +{
> +    return 0;
> +}
> +
> +static inline Int128 int128_one(void)
> +{
> +    return 1;
> +}
> +
> +static inline Int128 int128_2_64(void)
> +{
> +    return (Int128)1 << 64;
> +}
> +
> +static inline Int128 int128_exts64(int64_t a)
> +{
> +    return a;
> +}
> +
> +static inline Int128 int128_and(Int128 a, Int128 b)
> +{
> +    return a & b;
> +}
> +
> +static inline Int128 int128_rshift(Int128 a, int n)
> +{
> +    return a >> n;
> +}
> +
> +static inline Int128 int128_add(Int128 a, Int128 b)
> +{
> +    return a + b;
> +}
> +
> +static inline Int128 int128_neg(Int128 a)
> +{
> +    return -a;
> +}
> +
> +static inline Int128 int128_sub(Int128 a, Int128 b)
> +{
> +    return a - b;
> +}
> +
> +static inline bool int128_nonneg(Int128 a)
> +{
> +    return a >= 0;
> +}
> +
> +static inline bool int128_eq(Int128 a, Int128 b)
> +{
> +    return a == b;
> +}
> +
> +static inline bool int128_ne(Int128 a, Int128 b)
> +{
> +    return a != b;
> +}
> +
> +static inline bool int128_ge(Int128 a, Int128 b)
> +{
> +    return a >= b;
> +}
> +
> +static inline bool int128_lt(Int128 a, Int128 b)
> +{
> +    return a < b;
> +}
> +
> +static inline bool int128_le(Int128 a, Int128 b)
> +{
> +    return a <= b;
> +}
> +
> +static inline bool int128_gt(Int128 a, Int128 b)
> +{
> +    return a > b;
> +}
> +
> +static inline bool int128_nz(Int128 a)
> +{
> +    return a != 0;
> +}
> +
> +static inline Int128 int128_min(Int128 a, Int128 b)
> +{
> +    return a < b ? a : b;
> +}
> +
> +static inline Int128 int128_max(Int128 a, Int128 b)
> +{
> +    return a > b ? a : b;
> +}
> +
> +static inline void int128_addto(Int128 *a, Int128 b)
> +{
> +    *a += b;
> +}
> +
> +static inline void int128_subfrom(Int128 *a, Int128 b)
> +{
> +    *a -= b;
> +}
> +
> +#else /* !CONFIG_INT128 */
>
>  typedef struct Int128 Int128;
>
> @@ -153,4 +285,5 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
>      *a = int128_sub(*a, b);
>  }
>
> -#endif
> +#endif /* CONFIG_INT128 */
> +#endif /* INT128_H */
> diff --git a/tests/test-int128.c b/tests/test-int128.c
> index 4390123..b86a3c7 100644
> --- a/tests/test-int128.c
> +++ b/tests/test-int128.c
> @@ -41,7 +41,7 @@ static Int128 expand(uint32_t x)
>      uint64_t l, h;
>      l = expand16(x & 65535);
>      h = expand16(x >> 16);
> -    return (Int128) {l, h};
> +    return (Int128) int128_make128(l, h);
>  };
>
>  static void test_and(void)
> @@ -54,8 +54,8 @@ static void test_and(void)
>              Int128 b = expand(tests[j]);
>              Int128 r = expand(tests[i] & tests[j]);
>              Int128 s = int128_and(a, b);
> -            g_assert_cmpuint(r.lo, ==, s.lo);
> -            g_assert_cmpuint(r.hi, ==, s.hi);
> +            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
> +            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
>          }
>      }
>  }
> @@ -70,8 +70,8 @@ static void test_add(void)
>              Int128 b = expand(tests[j]);
>              Int128 r = expand(tests[i] + tests[j]);
>              Int128 s = int128_add(a, b);
> -            g_assert_cmpuint(r.lo, ==, s.lo);
> -            g_assert_cmpuint(r.hi, ==, s.hi);
> +            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
> +            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
>          }
>      }
>  }
> @@ -86,8 +86,8 @@ static void test_sub(void)
>              Int128 b = expand(tests[j]);
>              Int128 r = expand(tests[i] - tests[j]);
>              Int128 s = int128_sub(a, b);
> -            g_assert_cmpuint(r.lo, ==, s.lo);
> -            g_assert_cmpuint(r.hi, ==, s.hi);
> +            g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
> +            g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
>          }
>      }
>  }
> @@ -100,8 +100,8 @@ static void test_neg(void)
>          Int128 a = expand(tests[i]);
>          Int128 r = expand(-tests[i]);
>          Int128 s = int128_neg(a);
> -        g_assert_cmpuint(r.lo, ==, s.lo);
> -        g_assert_cmpuint(r.hi, ==, s.hi);
> +        g_assert_cmpuint(int128_getlo(r), ==, int128_getlo(s));
> +        g_assert_cmpuint(int128_gethi(r), ==, int128_gethi(s));
>      }
>  }
>
> @@ -180,8 +180,8 @@ test_rshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
>  {
>      Int128 a = expand(x);
>      Int128 r = int128_rshift(a, n);
> -    g_assert_cmpuint(r.lo, ==, l);
> -    g_assert_cmpuint(r.hi, ==, h);
> +    g_assert_cmpuint(int128_getlo(r), ==, l);
> +    g_assert_cmpuint(int128_gethi(r), ==, h);
>  }
>
>  static void test_rshift(void)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available
  2016-09-09 17:19   ` Alex Bennée
@ 2016-09-09 17:38     ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 17:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/09/2016 10:19 AM, Alex Bennée wrote:
>
> Richard Henderson <rth@twiddle.net> writes:
>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  include/qemu/int128.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>  tests/test-int128.c   |  22 ++++----
>>  2 files changed, 145 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
>> index 52aaf99..08f1db1 100644
>> --- a/include/qemu/int128.h
>> +++ b/include/qemu/int128.h
>> @@ -1,6 +1,138 @@
>>  #ifndef INT128_H
>>  #define INT128_H
>>
>> +#ifdef CONFIG_INT128
>> +
>> +typedef __int128 Int128;
>> +
>> +static inline Int128 int128_make64(uint64_t a)
>> +{
>> +    return a;
>> +}
>> +
>> +static inline uint64_t int128_get64(Int128 a)
>> +{
>> +    uint64_t r = a;
>> +    assert(r == a);
>> +    return r;
>> +}
>
> So is _get64 just an alias for getlo? It looks fragile for the time
> someone has a > 64 bit number stuffed in before they fetch it.

You'll note that this interface is not my invention.  However...

It's asserting that we *don't* have a > 64 bit number stuffed in.  Prior to me 
adding make_int128, the only way to get a > 64 bit number was via arithmetic.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (35 preceding siblings ...)
  2016-09-03 21:26 ` no-reply
@ 2016-09-09 18:33 ` Alex Bennée
  2016-09-09 19:07   ` Richard Henderson
  2016-09-15 14:39 ` Alex Bennée
  37 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-09 18:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Changes since v2:
>   * Fix build for 32-bit host without 64-bit atomics.
>     Tested with --extra-cflags='-march=-i486'.
>     This is patch 15, which might ought to get folded back into
>     patch 13 for bisection, but is ugly for review.

Hmm I wonder what's breaking the configure script on Travis:

    https://travis-ci.org/stsquad/qemu/builds/158784781

It's not very forthcoming with detail....

>
>   * Move a lot of stuff out of softmmu_template.h, and into cputlb.c.
>     This is both for code size reduction and readability.

\o/

>
>   * Fold in Alex's test-int128 fix.
>
>
> r~
>
>
> Emilio G. Cota (18):
>   atomics: add atomic_xor
>   atomics: add atomic_op_fetch variants
>   target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
>   target-i386: emulate LOCK'ed OP instructions using atomic helpers
>   target-i386: emulate LOCK'ed INC using atomic helper
>   target-i386: emulate LOCK'ed NOT using atomic helper
>   target-i386: emulate LOCK'ed NEG using cmpxchg helper
>   target-i386: emulate LOCK'ed XADD using atomic helper
>   target-i386: emulate LOCK'ed BTX ops using atomic helpers
>   target-i386: emulate XCHG using atomic helper
>   target-i386: remove helper_lock()
>   tests: add atomic_add-bench
>   target-arm: emulate LL/SC using cmpxchg helpers
>   target-arm: emulate SWP with atomic_xchg helper
>   target-arm: emulate aarch64's LL/SC using cmpxchg helpers
>   linux-user: remove handling of ARM's EXCP_STREX
>   linux-user: remove handling of aarch64's EXCP_STREX
>   target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
>
> Richard Henderson (16):
>   exec: Avoid direct references to Int128 parts
>   int128: Use __int128 if available
>   int128: Add int128_make128
>   tcg: Add EXCP_ATOMIC
>   HACK: Always enable parallel_cpus
>   cputlb: Replace SHIFT with DATA_SIZE
>   cputlb: Move probe_write out of softmmu_template.h
>   cputlb: Remove includes from softmmu_template.h
>   cputlb: Move most of iotlb code out of line
>   cputlb: Tidy some macros
>   tcg: Add atomic helpers
>   tcg: Add atomic128 helpers
>   tcg: Add CONFIG_ATOMIC64
>   target-arm: Rearrange aa32 load and store functions
>   target-alpha: Introduce MMU_PHYS_IDX
>   target-alpha: Emulate LL/SC using cmpxchg helpers
>
>  Makefile.objs              |   1 -
>  Makefile.target            |   1 +
>  atomic_template.h          | 211 +++++++++++++++++++++++++
>  configure                  |  62 +++++++-
>  cpu-exec-common.c          |   6 +
>  cpu-exec.c                 |  23 +++
>  cpus.c                     |   6 +
>  cputlb.c                   | 203 ++++++++++++++++++++++--
>  exec.c                     |   4 +-
>  include/exec/cpu-all.h     |   1 +
>  include/exec/exec-all.h    |   1 +
>  include/qemu-common.h      |   1 +
>  include/qemu/atomic.h      |  42 ++++-
>  include/qemu/int128.h      | 171 +++++++++++++++++++-
>  linux-user/main.c          | 326 +++++++-------------------------------
>  softmmu_template.h         | 104 ++----------
>  target-alpha/cpu.h         |  22 +--
>  target-alpha/helper.c      |  16 +-
>  target-alpha/helper.h      |   9 --
>  target-alpha/machine.c     |   2 -
>  target-alpha/mem_helper.c  |  73 ---------
>  target-alpha/translate.c   | 148 +++++++++--------
>  target-arm/cpu.h           |  17 +-
>  target-arm/helper-a64.c    | 113 +++++++++++++
>  target-arm/helper-a64.h    |   2 +
>  target-arm/internals.h     |   4 +-
>  target-arm/translate-a64.c | 106 ++++++-------
>  target-arm/translate.c     | 342 ++++++++++++++-------------------------
>  target-arm/translate.h     |   4 -
>  target-i386/helper.h       |   4 +-
>  target-i386/mem_helper.c   | 153 ++++++++++++------
>  target-i386/translate.c    | 386 +++++++++++++++++++++++++++++----------------
>  tcg-runtime.c              |  74 +++++++--
>  tcg/tcg-op.c               | 342 +++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h               |  44 ++++++
>  tcg/tcg-runtime.h          | 109 +++++++++++++
>  tcg/tcg.h                  |  85 ++++++++++
>  tests/.gitignore           |   1 +
>  tests/Makefile.include     |   4 +-
>  tests/atomic_add-bench.c   | 180 +++++++++++++++++++++
>  tests/test-int128.c        |  22 +--
>  translate-all.c            |   1 +
>  42 files changed, 2346 insertions(+), 1080 deletions(-)
>  create mode 100644 atomic_template.h
>  create mode 100644 tests/atomic_add-bench.c


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-09 18:33 ` Alex Bennée
@ 2016-09-09 19:07   ` Richard Henderson
  2016-09-09 19:29     ` Alex Bennée
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 19:07 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/09/2016 11:33 AM, Alex Bennée wrote:
>
> Richard Henderson <rth@twiddle.net> writes:
>
>> Changes since v2:
>>   * Fix build for 32-bit host without 64-bit atomics.
>>     Tested with --extra-cflags='-march=-i486'.
>>     This is patch 15, which might ought to get folded back into
>>     patch 13 for bisection, but is ugly for review.
>
> Hmm I wonder what's breaking the configure script on Travis:
>
>     https://travis-ci.org/stsquad/qemu/builds/158784781
>
> It's not very forthcoming with detail....

Huh.  Well, I can attempt a build with the same version of gcc, and see if 
that's enlightening.  Anything you could do to track that down would be helpful.


>>   * Move a lot of stuff out of softmmu_template.h, and into cputlb.c.
>>     This is both for code size reduction and readability.
>
> \o/

:-)


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-09 19:07   ` Richard Henderson
@ 2016-09-09 19:29     ` Alex Bennée
  2016-09-09 20:03       ` Richard Henderson
  2016-09-09 20:11       ` Richard Henderson
  0 siblings, 2 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-09 19:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]


Richard Henderson <rth@twiddle.net> writes:

> On 09/09/2016 11:33 AM, Alex Bennée wrote:
>>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> Changes since v2:
>>>   * Fix build for 32-bit host without 64-bit atomics.
>>>     Tested with --extra-cflags='-march=-i486'.
>>>     This is patch 15, which might ought to get folded back into
>>>     patch 13 for bisection, but is ugly for review.
>>
>> Hmm I wonder what's breaking the configure script on Travis:
>>
>>     https://travis-ci.org/stsquad/qemu/builds/158784781
>>
>> It's not very forthcoming with detail....
>
> Huh.  Well, I can attempt a build with the same version of gcc, and see if
> that's enlightening.  Anything you could do to track that down would
> be helpful.

If your local system has docker you can add the attached file into
tests/docker/dockerfiles and then do:

    make docker-test-quick@travis J=9 V=1


[-- Attachment #2: travis/ubuntu dockerfile --]
[-- Type: application/octet-stream, Size: 180 bytes --]

FROM ubuntu:12.04
RUN apt-get update
RUN apt-get -y build-dep qemu
RUN apt-get -y build-dep device-tree-compiler
RUN apt-get -y install python2.7 dh-autoreconf
ENV FEATURES pyyaml

[-- Attachment #3: Type: text/plain, Size: 163 bytes --]


I ran:

    make docker-test-quick@travis J=9 V=1 DEBUG=1

and then manually:

    /tmp/qemu-test/src/tests/docker/test-quick

and got the following config.log:


[-- Attachment #4: config.log from failed build --]
[-- Type: application/octet-stream, Size: 98475 bytes --]

# QEMU configure log Fri Sep  9 19:25:56 UTC 2016
# Configured with: '/tmp/qemu-test/src/configure' '--enable-werror' '--target-list=x86_64-softmmu aarch64-softmmu' '--prefix=/tmp/qemu-test/src/tests/docker/install'
#
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
config-temp/qemu-conf.c:2:2: error: #error __i386__ not defined
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
config-temp/qemu-conf.c:2:2: error: #error __ILP32__ not defined
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
c++ -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wall -Wundef -Wwrite-strings -fno-strict-aliasing -fno-common -o config-temp/qemu-conf.exe config-temp/qemu-conf.cxx config-temp/qemu-conf.o -m64 -g
c++ -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wall -Wundef -Wwrite-strings -fno-strict-aliasing -fno-common -o config-temp/qemu-conf.exe config-temp/qemu-conf.cxx config-temp/qemu-conf.o -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Werror -Wstring-plus-int -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc1: error: unrecognized command line option '-Wstring-plus-int'
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Werror -Winitializer-overrides -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc1: error: unrecognized command line option '-Winitializer-overrides'
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Werror -Wendif-labels -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Werror -Wshift-negative-value -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc1: error: unrecognized command line option '-Wshift-negative-value'
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Werror -Wmissing-include-dirs -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Werror -Wempty-body -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Werror -Wnested-externs -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Werror -Wformat-security -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Werror -Wformat-y2k -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Werror -Winit-self -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Werror -Wignored-qualifiers -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Werror -Wold-style-declaration -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Werror -Wold-style-definition -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Werror -Wtype-limits -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -Werror -fstack-protector-strong -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc1: error: unrecognized command line option '-fstack-protector-strong'
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -Werror -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -Werror -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -Werror -fno-gcse -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -Werror -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -fPIE -DPIE -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g -pie
cc -Werror -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -fPIE -DPIE -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -g -pie
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -pie -m64 -g -Wl,-z,relro -Wl,-z,now
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -pie -m64 -g -Wl,-z,relro -Wl,-z,now
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -Werror -fno-pie -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -nopie
gcc-4.6.real: error: unrecognized option '-nopie'
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
config-temp/qemu-conf.c:2:9: error: attribute(target("avx2")) is unknown
config-temp/qemu-conf.c: In function 'bar':
config-temp/qemu-conf.c:7:5: warning: implicit declaration of function '_mm256_movemask_epi8' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:7:5: warning: nested extern declaration of '_mm256_movemask_epi8' [-Wnested-externs]
config-temp/qemu-conf.c:7:5: warning: implicit declaration of function '_mm256_cmpeq_epi8' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:7:5: warning: nested extern declaration of '_mm256_cmpeq_epi8' [-Wnested-externs]
config-temp/qemu-conf.c:7:53: error: '__m256i' undeclared (first use in this function)
config-temp/qemu-conf.c:7:53: note: each undeclared identifier is reported only once for each function it appears in
config-temp/qemu-conf.c:7:62: error: expected expression before ')' token
config-temp/qemu-conf.c:8:1: warning: control reaches end of non-void function [-Wreturn-type]
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lz
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lz
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -llzo2
config-temp/qemu-conf.c:1:23: fatal error: lzo/lzo1x.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lsnappy
config-temp/qemu-conf.c:1:22: fatal error: snappy-c.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lbz2
config-temp/qemu-conf.c:1:19: fatal error: bzlib.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lxenstore -lxenctrl -lxenguest
config-temp/qemu-conf.c:1:21: fatal error: xenctrl.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lgnutls
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lgnutls
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -L/lib/x86_64-linux-gnu -lgcrypt
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -L/lib/x86_64-linux-gnu -lgcrypt
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lSDL
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lSDL
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lSDL -lX11
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lSDL -lX11
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lrdmacm -libverbs
config-temp/qemu-conf.c:1:27: fatal error: rdma/rdma_cma.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lsasl2
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lsasl2
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -ljpeg
config-temp/qemu-conf.c:2:21: fatal error: jpeglib.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lpng12
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lpng12
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
/tmp/ccD2ma17.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:5: undefined reference to `uuid_generate'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -luuid
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -luuid
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c:2:21: fatal error: xfs/xfs.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lvdeplug
config-temp/qemu-conf.c:1:24: fatal error: libvdeplug.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lcap-ng
config-temp/qemu-conf.c:1:20: fatal error: cap-ng.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lbrlapi
config-temp/qemu-conf.c:1:20: fatal error: brlapi.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lncurses -ltinfo
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lncurses -ltinfo
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lcurl
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lcurl
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c:1:33: fatal error: bluetooth/bluetooth.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -pthread -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -g -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread -lgthread-2.0 -lrt -lglib-2.0 -lz
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -pthread -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -g -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread -lgthread-2.0 -lrt -lglib-2.0 -lz
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -pthread -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -Werror -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread -lgthread-2.0 -lrt -lglib-2.0
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lcap
config-temp/qemu-conf.c:2:28: fatal error: sys/capability.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
/tmp/cccWHcRM.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:5: undefined reference to `pthread_create'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lrbd -lrados
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lrbd -lrados
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -laio
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -laio
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lfdt
config-temp/qemu-conf.c:1:20: fatal error: libfdt.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lnuma
config-temp/qemu-conf.c:1:18: fatal error: numa.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c:1:23: fatal error: sys/memfd.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:6:18: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function)
config-temp/qemu-conf.c:6:18: note: each undeclared identifier is reported only once for each function it appears in
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c:1:24: fatal error: sys/endian.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
/tmp/ccrWHaFL.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:2: undefined reference to `sin'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lm
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -lm
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
/tmp/ccUAAcTU.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:4: undefined reference to `timer_create'
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:5: undefined reference to `clock_gettime'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread -lrt
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g -pthread -lrt
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -Werror -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c:1:31: fatal error: valgrind/valgrind.h: No such file or directory
compilation terminated.
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:4:3: warning: implicit declaration of function '__atomic_load_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:4:3: warning: nested extern declaration of '__atomic_load_16' [-Wnested-externs]
config-temp/qemu-conf.c:5:3: warning: implicit declaration of function '__atomic_store_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:5:3: warning: nested extern declaration of '__atomic_store_16' [-Wnested-externs]
config-temp/qemu-conf.c:6:3: warning: implicit declaration of function '__atomic_compare_exchange_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:6:3: warning: nested extern declaration of '__atomic_compare_exchange_16' [-Wnested-externs]
/tmp/cc0imc8D.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:4: undefined reference to `__atomic_load_16'
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:5: undefined reference to `__atomic_store_16'
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:6: undefined reference to `__atomic_compare_exchange_16'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:12:8: warning: unused variable 'is_host64' [-Wunused-variable]
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:12:8: error: unused variable 'is_host64' [-Werror=unused-variable]
cc1: all warnings being treated as errors

[-- Attachment #5: Type: text/plain, Size: 2784 bytes --]


Of which:

config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:4:3: warning: implicit declaration of function '__atomic_load_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:4:3: warning: nested extern declaration of '__atomic_load_16' [-Wnested-externs]
config-temp/qemu-conf.c:5:3: warning: implicit declaration of function '__atomic_store_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:5:3: warning: nested extern declaration of '__atomic_store_16' [-Wnested-externs]
config-temp/qemu-conf.c:6:3: warning: implicit declaration of function '__atomic_compare_exchange_16' [-Wimplicit-function-declaration]
config-temp/qemu-conf.c:6:3: warning: nested extern declaration of '__atomic_compare_exchange_16' [-Wnested-externs]
/tmp/cc0imc8D.o: In function `main':
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:4: undefined reference to `__atomic_load_16'
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:5: undefined reference to `__atomic_store_16'
/tmp/qemu-test/src/tests/docker/config-temp/qemu-conf.c:6: undefined reference to `__atomic_compare_exchange_16'
collect2: ld returned 1 exit status
cc -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:12:8: warning: unused variable 'is_host64' [-Wunused-variable]
cc -Werror -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/p11-kit-1 -I/usr/include/libpng12 -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -Wl,-z,relro -Wl,-z,now -pie -m64 -g
config-temp/qemu-conf.c: In function 'main':
config-temp/qemu-conf.c:12:8: error: unused variable 'is_host64' [-Werror=unused-variable]
cc1: all warnings being treated as errors


>
>
>>>   * Move a lot of stuff out of softmmu_template.h, and into cputlb.c.
>>>     This is both for code size reduction and readability.
>>
>> \o/
>
> :-)
>
>
> r~


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-09 19:29     ` Alex Bennée
@ 2016-09-09 20:03       ` Richard Henderson
  2016-09-09 20:11       ` Richard Henderson
  1 sibling, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 20:03 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/09/2016 12:29 PM, Alex Bennée wrote:
> config-temp/qemu-conf.c: In function 'main':
> config-temp/qemu-conf.c:12:8: error: unused variable 'is_host64' [-Werror=unused-variable]
> cc1: all warnings being treated as errors

Ah hah.  All I needed was a compiler without __atomic support.
I thought I'd tested that, but perhaps not in the final form.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-09 19:29     ` Alex Bennée
  2016-09-09 20:03       ` Richard Henderson
@ 2016-09-09 20:11       ` Richard Henderson
  1 sibling, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 20:11 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/09/2016 12:29 PM, Alex Bennée wrote:
> config-temp/qemu-conf.c:12:8: error: unused variable 'is_host64' [-Werror=unused-variable]
> cc1: all warnings being treated as errors

Try this.


r~


diff --git a/configure b/configure
index 519de5d..bd59cdd 100755
--- a/configure
+++ b/configure
@@ -4479,7 +4479,7 @@ int main(void)
    __atomic_exchange_8(&x, y, 0);
    __atomic_fetch_add_8(&x, y, 0);
  #else
-  char is_host64[sizeof(void *) >= sizeof(uint64_t) ? 1 : -1];
+  typedef char is_host64[sizeof(void *) >= sizeof(uint64_t) ? 1 : -1];
    __sync_lock_test_and_set(&x, y);
    __sync_val_compare_and_swap(&x, y, 0);
    __sync_fetch_and_add(&x, y);

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128
  2016-09-09 13:01   ` Leon Alrae
@ 2016-09-09 20:16     ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-09 20:16 UTC (permalink / raw)
  To: Leon Alrae; +Cc: qemu-devel

On 09/09/2016 06:01 AM, Leon Alrae wrote:
> This causes build failures for me on CentOS6.5:
> /user/lea/dev/qemu/include/qemu/int128.h:7: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Int128’
> /user/lea/dev/qemu/include/qemu/int128.h:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make64’
> /user/lea/dev/qemu/include/qemu/int128.h:14: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘int128_make128’
> (...)
>
> This is because CONFIG_INT128 is set if test for __int128_t succeeds, not
> __int128. The following change on top of patches 4 and 5 in this series
> fixes the problem for me:
>
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 261b55f..5c9890d 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -4,7 +4,7 @@
>  #ifdef CONFIG_INT128
>  #include "qemu/bswap.h"
>
> -typedef __int128 Int128;
> +typedef __int128_t Int128;
>
>  static inline Int128 int128_make64(uint64_t a)
>  {
> @@ -13,7 +13,7 @@ static inline Int128 int128_make64(uint64_t a)
>
>  static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
>  {
> -    return (unsigned __int128)hi << 64 | lo;
> +    return (__uint128_t)hi << 64 | lo;
>  }

Thanks, I'll fold that in.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-09 16:26     ` Richard Henderson
@ 2016-09-12  7:59       ` Leon Alrae
  2016-09-12 16:13         ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Leon Alrae @ 2016-09-12  7:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Sep 09, 2016 at 09:26:29AM -0700, Richard Henderson wrote:
> On 09/09/2016 07:46 AM, Leon Alrae wrote:
> >Wouldn't it be useful if tcg.h provided also aliases for _le/_be atomic
> >helpers (equivalent to helper_ret_X_mmu) so that in target-* code we wouldn't
> >need to care about the endianness (specifically I'm thinking about SC in MIPS
> >where I need to select between helper_atomic_cmpxchgl_le_mmu() and
> >helper_atomic_cmpxchgl_be_mmu()).
> 
> Perhaps.  I would have hoped that you could do the SC inline now
> though, and tcg_gen_atomic_cmpxchg() will do the selection for you.

On every SC we need to do the virtual -> physical address translation as we
have to compare the physical address against that of the preceeding LL.
This operation seems too complex to be inlined :(

Leon

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
  2016-09-09 13:11   ` Leon Alrae
  2016-09-09 14:46   ` Leon Alrae
@ 2016-09-12 13:47   ` Alex Bennée
  2016-09-13 18:00     ` Richard Henderson
  2016-09-13 17:06   ` Alex Bennée
  3 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 13:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Add all of cmpxchg, op_fetch, fetch_op, and xchg.
> Handle both endian-ness, and sizes up to 8.
> Handle expanding non-atomically, when emulating in serial.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
<snip>
>
> +/* Probe for a read-modify-write atomic operation.  Do not allow unaligned
> + * operations, or io operations to proceed.  Return the host address.  */
> +static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
> +                               TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    size_t mmu_idx = get_mmuidx(oi);
> +    size_t index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];
> +    target_ulong tlb_addr = tlbe->addr_write;
> +    TCGMemOp mop = get_memop(oi);
> +    int a_bits = get_alignment_bits(mop);
> +    int s_bits = mop & MO_SIZE;
> +
> +    /* Adjust the given return address.  */
> +    retaddr -= GETPC_ADJ;
> +
> +    /* Enforce guest required alignment.  */
> +    if (unlikely(a_bits > 0 && (addr & ((1 << a_bits) - 1)))) {
> +        /* ??? Maybe indicate atomic op to cpu_unaligned_access */
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +
> +    /* Enforce qemu required alignment.  */
> +    if (unlikely(addr & ((1 << s_bits) - 1))) {
> +        /* We get here if guest alignment was not requested,
> +           or was not enforced by cpu_unaligned_access above.
> +           We might widen the access and emulate, but for now
> +           mark an exception and exit the cpu loop.  */
> +        goto stop_the_world;
> +    }
> +
> +    /* Check TLB entry and enforce page permissions.  */
> +    if ((addr & TARGET_PAGE_MASK)
> +        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> +        if (!VICTIM_TLB_HIT(addr_write, addr)) {
> +            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
> +        }
> +        tlb_addr = tlbe->addr_write;
> +    }
> +
> +    /* Notice an IO access, or a notdirty page.  */
> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> +        /* There's really nothing that can be done to
> +           support this apart from stop-the-world.  */
> +        goto stop_the_world;

We are also triggering on TLB_NOTDIRTY here in the case where a
conditional write is the first write to a page. I don't know if a
stop_the_world is required at this point but we will need to ensure we
clear bits as notdirty_mem_write() does.

> +    }
> +
> +    /* Let the guest notice RMW on a write-only page.  */
> +    if (unlikely(tlbe->addr_read != tlb_addr)) {
> +        tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_LOAD, mmu_idx, retaddr);
> +        /* Since we don't support reads and writes to different addresses,
> +           and we do have the proper page loaded for write, this shouldn't
> +           ever return.  But just in case, handle via stop-the-world.  */
> +        goto stop_the_world;
> +    }
> +
> +    return (void *)((uintptr_t)addr + tlbe->addend);
> +
> + stop_the_world:
> +    cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
> +}


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC Richard Henderson
@ 2016-09-12 14:16   ` Alex Bennée
  2016-09-12 20:19     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 14:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> When we cannot emulate an atomic operation within a parallel
> context, this exception allows us to stop the world and try
> again in a serial context.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  cpu-exec-common.c       |  6 +++++
>  cpu-exec.c              | 23 +++++++++++++++++++
>  cpus.c                  |  6 +++++
>  include/exec/cpu-all.h  |  1 +
>  include/exec/exec-all.h |  1 +
>  include/qemu-common.h   |  1 +
>  linux-user/main.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  tcg/tcg.h               |  1 +
>  translate-all.c         |  1 +
>  9 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 0cb4ae6..767d9c6 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -77,3 +77,9 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      }
>      siglongjmp(cpu->jmp_env, 1);
>  }
> +
> +void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
> +{
> +    cpu->exception_index = EXCP_ATOMIC;
> +    cpu_loop_exit_restore(cpu, pc);
> +}
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 5d9710a..8d77516 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -225,6 +225,29 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>  }
>  #endif
>
> +void cpu_exec_step(CPUState *cpu)
> +{
> +    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +    bool old_tb_flushed;
> +
> +    old_tb_flushed = cpu->tb_flushed;
> +    cpu->tb_flushed = false;

Advanced warning, these disappear soon when the async safe work (plus
safe tb flush) patches get merged.

> +
> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb = tb_gen_code(cpu, pc, cs_base, flags,
> +                     1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
> +    tb->orig_tb = NULL;
> +    cpu->tb_flushed |= old_tb_flushed;
> +    /* execute the generated code */
> +    trace_exec_tb_nocache(tb, pc);
> +    cpu_tb_exec(cpu, tb);
> +    tb_phys_invalidate(tb, -1);
> +    tb_free(tb);
> +}
> +
>  struct tb_desc {
>      target_ulong pc;
>      target_ulong cs_base;
> diff --git a/cpus.c b/cpus.c
> index 84c3520..a01bbbd 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1575,6 +1575,12 @@ static void tcg_exec_all(void)
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>                  break;
> +            } else if (r == EXCP_ATOMIC) {
> +                /* ??? When we begin running cpus in parallel, we should
> +                   stop all cpus, clear parallel_cpus, and interpret a
> +                   single insn with cpu_exec_step.  In the meantime,
> +                   we should never get here.  */
> +                abort();

Possibly more correct would be:

         g_assert(parallel_cpus == false);
         abort();

As mentioned in other patches I was seeing this triggered by
TLB_NOTDIRTY writes. However I think the fix is fairly simple, see
bellow:

>              }
>          } else if (cpu->stop || cpu->stopped) {
>              if (cpu->unplug) {
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index b6a7059..1f7e9d8 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -31,6 +31,7 @@
>  #define EXCP_DEBUG      0x10002 /* cpu stopped after a breakpoint or singlestep */
>  #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
>  #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
> +#define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */
>
>  /* some important defines:
>   *
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index d008296..08424ae 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -71,6 +71,7 @@ static inline void cpu_list_lock(void)
>  void cpu_exec_init(CPUState *cpu, Error **errp);
>  void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
>  void QEMU_NORETURN cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc);
> +void QEMU_NORETURN cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc);
>
>  #if !defined(CONFIG_USER_ONLY)
>  void cpu_reloading_memory_map(void);
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 9e8b0bd..f814317 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -80,6 +80,7 @@ void tcg_exec_init(unsigned long tb_size);
>  bool tcg_enabled(void);
>
>  void cpu_exec_init_all(void);
> +void cpu_exec_step(CPUState *cpu);
>
>  /**
>   * Sends a (part of) iovec down a socket, yielding when the socket is full, or
> diff --git a/linux-user/main.c b/linux-user/main.c
> index f2f4d2f..8d5c948 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -182,13 +182,25 @@ static inline void start_exclusive(void)
>  }
>
>  /* Finish an exclusive operation.  */
> -static inline void __attribute__((unused)) end_exclusive(void)
> +static inline void end_exclusive(void)
>  {
>      pending_cpus = 0;
>      pthread_cond_broadcast(&exclusive_resume);
>      pthread_mutex_unlock(&exclusive_lock);
>  }
>
> +static void step_atomic(CPUState *cpu)
> +{
> +    start_exclusive();
> +
> +    /* Since we got here, we know that parallel_cpus must be true.  */
> +    parallel_cpus = false;
> +    cpu_exec_step(cpu);
> +    parallel_cpus = true;
> +
> +    end_exclusive();
> +}
> +

Paolo's safe work patches bring the start/end_exclusive functions into
cpu-exec-common so I think after that has been merged this function
can also be moved and called directly from the MTTCG loop on an
EXCP_ATOMIC exit.

>  /* Wait for exclusive ops to finish, and begin cpu execution.  */
>  static inline void cpu_exec_start(CPUState *cpu)
>  {
> @@ -440,6 +452,9 @@ void cpu_loop(CPUX86State *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              pc = env->segs[R_CS].base + env->eip;
>              EXCP_DUMP(env, "qemu: 0x%08lx: unhandled CPU exception 0x%x - aborting\n",
> @@ -932,6 +947,9 @@ void cpu_loop(CPUARMState *env)
>          case EXCP_YIELD:
>              /* nothing to do here for user-mode, just resume guest code */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>          error:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
> @@ -1131,6 +1149,9 @@ void cpu_loop(CPUARMState *env)
>          case EXCP_YIELD:
>              /* nothing to do here for user-mode, just resume guest code */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
>              abort();
> @@ -1220,6 +1241,9 @@ void cpu_loop(CPUUniCore32State *env)
>                  }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              goto error;
>          }
> @@ -1492,6 +1516,9 @@ void cpu_loop (CPUSPARCState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -2040,6 +2067,9 @@ void cpu_loop(CPUPPCState *env)
>          case EXCP_INTERRUPT:
>              /* just indicate that signals should be handled asap */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              cpu_abort(cs, "Unknown exception 0x%d. Aborting\n", trapnr);
>              break;
> @@ -2713,6 +2743,9 @@ done_syscall:
>                  }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>  error:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
> @@ -2799,6 +2832,9 @@ void cpu_loop(CPUOpenRISCState *env)
>          case EXCP_NR:
>              qemu_log_mask(CPU_LOG_INT, "\nNR\n");
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
>                       trapnr);
> @@ -2874,6 +2910,9 @@ void cpu_loop(CPUSH4State *env)
>              queue_signal(env, info.si_signo, &info);
>  	    break;
>
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -2939,6 +2978,9 @@ void cpu_loop(CPUCRISState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3053,6 +3095,9 @@ void cpu_loop(CPUMBState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3154,6 +3199,9 @@ void cpu_loop(CPUM68KState *env)
>                    }
>              }
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n", trapnr);
>              abort();
> @@ -3389,6 +3437,9 @@ void cpu_loop(CPUAlphaState *env)
>          case EXCP_INTERRUPT:
>              /* Just indicate that signals should be handled asap.  */
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              printf ("Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3516,6 +3567,9 @@ void cpu_loop(CPUS390XState *env)
>              queue_signal(env, info.si_signo, &info);
>              break;
>
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
>              cpu_dump_state(cs, stderr, fprintf, 0);
> @@ -3768,6 +3822,9 @@ void cpu_loop(CPUTLGState *env)
>          case TILEGX_EXCP_REG_UDN_ACCESS:
>              gen_sigill_reg(env);
>              break;
> +        case EXCP_ATOMIC:
> +            step_atomic(cs);
> +            break;
>          default:
>              fprintf(stderr, "trapnr is %d[0x%x].\n", trapnr, trapnr);
>              g_assert_not_reached();
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 1bcabca..00498fc 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -700,6 +700,7 @@ struct TCGContext {
>  };
>
>  extern TCGContext tcg_ctx;
> +extern bool parallel_cpus;
>
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
>  {
> diff --git a/translate-all.c b/translate-all.c
> index 0dd6466..f97fc1e 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -119,6 +119,7 @@ static void *l1_map[V_L1_SIZE];
>
>  /* code generation context */
>  TCGContext tcg_ctx;
> +bool parallel_cpus;

So lets add some words to this to distinguish between this and the mttcg
enabled flags and its relation to -smp. Something like:

  parallel_cpus indicates to the front-ends if code needs to be
  generated taking into account multiple threads of execution. It will
  be true for linux-user after the first thread clone and if system mode
  if MTTCG is enabled. On the transition from false->true any code
  generated while false needs to be invalidated. It may be temporally
  set to false when generating non-cached code in an exclusive context.

>
>  /* translation block context */
>  #ifdef CONFIG_USER_ONLY


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus Richard Henderson
@ 2016-09-12 14:20   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 14:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This is really just a placeholder for an actual
> command-line switch for mttcg.
> ---
>  translate-all.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index f97fc1e..d6d879c 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -119,7 +119,7 @@ static void *l1_map[V_L1_SIZE];
>
>  /* code generation context */
>  TCGContext tcg_ctx;
> -bool parallel_cpus;
> +bool parallel_cpus = 1;

s/1/true/ but...

As mentioned on the previous version we can also skip parallel_cpus for
single threaded linux-user code. This will be easier once the async safe
work is merged so we can flip the flag and trigger a safe tb_flush()
once the linux-user guest does its first clone().

>
>  /* translation block context */
>  #ifdef CONFIG_USER_ONLY


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE Richard Henderson
@ 2016-09-12 14:22   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 14:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cputlb.c           | 16 ++++++++--------
>  softmmu_template.h |  7 ++-----
>  2 files changed, 10 insertions(+), 13 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index d068ee5..cf68211 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -529,16 +529,16 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
>
>  #define MMUSUFFIX _mmu
>
> -#define SHIFT 0
> +#define DATA_SIZE 1
>  #include "softmmu_template.h"
>
> -#define SHIFT 1
> +#define DATA_SIZE 2
>  #include "softmmu_template.h"
>
> -#define SHIFT 2
> +#define DATA_SIZE 4
>  #include "softmmu_template.h"
>
> -#define SHIFT 3
> +#define DATA_SIZE 8
>  #include "softmmu_template.h"
>  #undef MMUSUFFIX
>
> @@ -549,14 +549,14 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
>  #define GETRA() ((uintptr_t)0)
>  #define SOFTMMU_CODE_ACCESS
>
> -#define SHIFT 0
> +#define DATA_SIZE 1
>  #include "softmmu_template.h"
>
> -#define SHIFT 1
> +#define DATA_SIZE 2
>  #include "softmmu_template.h"
>
> -#define SHIFT 2
> +#define DATA_SIZE 4
>  #include "softmmu_template.h"
>
> -#define SHIFT 3
> +#define DATA_SIZE 8
>  #include "softmmu_template.h"
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 284ab2c..3c56df1 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -25,8 +25,6 @@
>  #include "exec/address-spaces.h"
>  #include "exec/memory.h"
>
> -#define DATA_SIZE (1 << SHIFT)
> -
>  #if DATA_SIZE == 8
>  #define SUFFIX q
>  #define LSUFFIX q
> @@ -134,7 +132,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>      }
>
>      cpu->mem_io_vaddr = addr;
> -    memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
> +    memory_region_dispatch_read(mr, physaddr, &val, DATA_SIZE,
>                                  iotlbentry->attrs);
>      return val;
>  }
> @@ -321,7 +319,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>
>      cpu->mem_io_vaddr = addr;
>      cpu->mem_io_pc = retaddr;
> -    memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
> +    memory_region_dispatch_write(mr, physaddr, val, DATA_SIZE,
>                                   iotlbentry->attrs);
>  }
>
> @@ -512,7 +510,6 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>
>  #undef READ_ACCESS_TYPE
> -#undef SHIFT
>  #undef DATA_TYPE
>  #undef SUFFIX
>  #undef LSUFFIX


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h Richard Henderson
@ 2016-09-12 14:35   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 14:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cputlb.c           | 21 +++++++++++++++++++++
>  softmmu_template.h | 23 -----------------------
>  2 files changed, 21 insertions(+), 23 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index cf68211..8a021ab 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -527,6 +527,27 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
>    victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
>                   (ADDR) & TARGET_PAGE_MASK)
>
> +/* Probe for whether the specified guest write access is permitted.
> + * If it is not permitted then an exception will be taken in the same
> + * way as if this were a real write access (and we will not return).
> + * Otherwise the function will return, and there will be a valid
> + * entry in the TLB for this access.
> + */
> +void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
> +                 uintptr_t retaddr)
> +{
> +    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +
> +    if ((addr & TARGET_PAGE_MASK)
> +        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> +        /* TLB entry is for a different page */
> +        if (!VICTIM_TLB_HIT(addr_write, addr)) {
> +            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
> +        }
> +    }
> +}
> +
>  #define MMUSUFFIX _mmu
>
>  #define DATA_SIZE 1
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 3c56df1..6805028 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -484,29 +484,6 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
>  }
>  #endif /* DATA_SIZE > 1 */
> -
> -#if DATA_SIZE == 1
> -/* Probe for whether the specified guest write access is permitted.
> - * If it is not permitted then an exception will be taken in the same
> - * way as if this were a real write access (and we will not return).
> - * Otherwise the function will return, and there will be a valid
> - * entry in the TLB for this access.
> - */
> -void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
> -                 uintptr_t retaddr)
> -{
> -    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> -    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> -
> -    if ((addr & TARGET_PAGE_MASK)
> -        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> -        /* TLB entry is for a different page */
> -        if (!VICTIM_TLB_HIT(addr_write, addr)) {
> -            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
> -        }
> -    }
> -}
> -#endif
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>
>  #undef READ_ACCESS_TYPE


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h Richard Henderson
@ 2016-09-12 14:38   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 14:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> We already include exec/address-spaces.h and exec/memory.h in
> cputlb.c; the include of qemu/timer.h appears to be a fossil.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  softmmu_template.h | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 6805028..a75d299 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -21,10 +21,6 @@
>   * You should have received a copy of the GNU Lesser General Public
>   * License along with this library; if not, see <http://www.gnu.org/licenses/>.
>   */
> -#include "qemu/timer.h"
> -#include "exec/address-spaces.h"
> -#include "exec/memory.h"
> -
>  #if DATA_SIZE == 8
>  #define SUFFIX q
>  #define LSUFFIX q


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line Richard Henderson
@ 2016-09-12 15:26   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 15:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Saves 2k code size off of a cold path.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cputlb.c           | 37 +++++++++++++++++++++++++++++++++++++
>  softmmu_template.h | 52 ++++++++++------------------------------------------
>  2 files changed, 47 insertions(+), 42 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index 8a021ab..eba78a9 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -498,6 +498,43 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>      return qemu_ram_addr_from_host_nofail(p);
>  }
>
> +static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> +                         target_ulong addr, uintptr_t retaddr, int size)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    hwaddr physaddr = iotlbentry->addr;
> +    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> +    uint64_t val;
> +
> +    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
> +    cpu->mem_io_pc = retaddr;
> +    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> +        cpu_io_recompile(cpu, retaddr);
> +    }
> +
> +    cpu->mem_io_vaddr = addr;
> +    memory_region_dispatch_read(mr, physaddr, &val, size, iotlbentry->attrs);
> +    return val;
> +}
> +
> +static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> +                      uint64_t val, target_ulong addr,
> +                      uintptr_t retaddr, int size)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    hwaddr physaddr = iotlbentry->addr;
> +    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> +
> +    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
> +    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> +        cpu_io_recompile(cpu, retaddr);
> +    }
> +
> +    cpu->mem_io_vaddr = addr;
> +    cpu->mem_io_pc = retaddr;
> +    memory_region_dispatch_write(mr, physaddr, val, size, iotlbentry->attrs);
> +}
> +
>  /* Return true if ADDR is present in the victim tlb, and has been copied
>     back to the main tlb.  */
>  static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
> diff --git a/softmmu_template.h b/softmmu_template.h
> index a75d299..c513813 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -112,25 +112,12 @@
>
>  #ifndef SOFTMMU_CODE_ACCESS
>  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
> -                                              CPUIOTLBEntry *iotlbentry,
> +                                              size_t mmu_idx, size_t index,
>                                                target_ulong addr,
>                                                uintptr_t retaddr)
>  {
> -    uint64_t val;
> -    CPUState *cpu = ENV_GET_CPU(env);
> -    hwaddr physaddr = iotlbentry->addr;
> -    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> -
> -    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
> -    cpu->mem_io_pc = retaddr;
> -    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> -        cpu_io_recompile(cpu, retaddr);
> -    }
> -
> -    cpu->mem_io_vaddr = addr;
> -    memory_region_dispatch_read(mr, physaddr, &val, DATA_SIZE,
> -                                iotlbentry->attrs);
> -    return val;
> +    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +    return io_readx(env, iotlbentry, addr, retaddr, DATA_SIZE);
>  }
>  #endif
>
> @@ -164,15 +151,13 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
>
>          /* ??? Note that the io helpers always read data in the target
>             byte ordering.  We should push the LE/BE request down into io.  */
> -        res = glue(io_read, SUFFIX)(env, iotlbentry, addr, retaddr);
> +        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
>          res = TGT_LE(res);
>          return res;
>      }
> @@ -238,15 +223,13 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
>
>          /* ??? Note that the io helpers always read data in the target
>             byte ordering.  We should push the LE/BE request down into io.  */
> -        res = glue(io_read, SUFFIX)(env, iotlbentry, addr, retaddr);
> +        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
>          res = TGT_BE(res);
>          return res;
>      }
> @@ -299,24 +282,13 @@ WORD_TYPE helper_be_lds_name(CPUArchState *env, target_ulong addr,
>  #endif
>
>  static inline void glue(io_write, SUFFIX)(CPUArchState *env,
> -                                          CPUIOTLBEntry *iotlbentry,
> +                                          size_t mmu_idx, size_t index,
>                                            DATA_TYPE val,
>                                            target_ulong addr,
>                                            uintptr_t retaddr)
>  {
> -    CPUState *cpu = ENV_GET_CPU(env);
> -    hwaddr physaddr = iotlbentry->addr;
> -    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> -
> -    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
> -    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> -        cpu_io_recompile(cpu, retaddr);
> -    }
> -
> -    cpu->mem_io_vaddr = addr;
> -    cpu->mem_io_pc = retaddr;
> -    memory_region_dispatch_write(mr, physaddr, val, DATA_SIZE,
> -                                 iotlbentry->attrs);
> +    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +    return io_writex(env, iotlbentry, val, addr, retaddr, DATA_SIZE);
>  }
>
>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> @@ -347,16 +319,14 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
>
>          /* ??? Note that the io helpers always read data in the target
>             byte ordering.  We should push the LE/BE request down into io.  */
>          val = TGT_LE(val);
> -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
>          return;
>      }
>
> @@ -430,16 +400,14 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
>
>          /* ??? Note that the io helpers always read data in the target
>             byte ordering.  We should push the LE/BE request down into io.  */
>          val = TGT_BE(val);
> -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
>          return;
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros Richard Henderson
@ 2016-09-12 15:28   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-12 15:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> TGT_LE and TGT_BE are not size dependent and do not need to be
> redefined.  The others are no longer used at all.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cputlb.c           |  8 ++++++++
>  softmmu_template.h | 22 ----------------------
>  2 files changed, 8 insertions(+), 22 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index eba78a9..d710cc1 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -585,6 +585,14 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>      }
>  }
>
> +#ifdef TARGET_WORDS_BIGENDIAN
> +# define TGT_BE(X)  (X)
> +# define TGT_LE(X)  BSWAP(X)
> +#else
> +# define TGT_BE(X)  BSWAP(X)
> +# define TGT_LE(X)  (X)
> +#endif
> +
>  #define MMUSUFFIX _mmu
>
>  #define DATA_SIZE 1
> diff --git a/softmmu_template.h b/softmmu_template.h
> index c513813..efcfcd8 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -78,14 +78,6 @@
>  # define BSWAP(X)  (X)
>  #endif
>
> -#ifdef TARGET_WORDS_BIGENDIAN
> -# define TGT_BE(X)  (X)
> -# define TGT_LE(X)  BSWAP(X)
> -#else
> -# define TGT_BE(X)  BSWAP(X)
> -# define TGT_LE(X)  (X)
> -#endif
> -
>  #if DATA_SIZE == 1
>  # define helper_le_ld_name  glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
>  # define helper_be_ld_name  helper_le_ld_name
> @@ -102,14 +94,6 @@
>  # define helper_be_st_name  glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
>  #endif
>
> -#ifdef TARGET_WORDS_BIGENDIAN
> -# define helper_te_ld_name  helper_be_ld_name
> -# define helper_te_st_name  helper_be_st_name
> -#else
> -# define helper_te_ld_name  helper_le_ld_name
> -# define helper_te_st_name  helper_le_st_name
> -#endif
> -
>  #ifndef SOFTMMU_CODE_ACCESS
>  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>                                                size_t mmu_idx, size_t index,
> @@ -461,15 +445,9 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>  #undef USUFFIX
>  #undef SSUFFIX
>  #undef BSWAP
> -#undef TGT_BE
> -#undef TGT_LE
> -#undef CPU_BE
> -#undef CPU_LE
>  #undef helper_le_ld_name
>  #undef helper_be_ld_name
>  #undef helper_le_lds_name
>  #undef helper_be_lds_name
>  #undef helper_le_st_name
>  #undef helper_be_st_name
> -#undef helper_te_ld_name
> -#undef helper_te_st_name


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-12  7:59       ` Leon Alrae
@ 2016-09-12 16:13         ` Richard Henderson
  2016-09-13 12:32           ` Leon Alrae
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-12 16:13 UTC (permalink / raw)
  To: Leon Alrae; +Cc: qemu-devel

On 09/12/2016 12:59 AM, Leon Alrae wrote:
> On Fri, Sep 09, 2016 at 09:26:29AM -0700, Richard Henderson wrote:
>> On 09/09/2016 07:46 AM, Leon Alrae wrote:
>>> Wouldn't it be useful if tcg.h provided also aliases for _le/_be atomic
>>> helpers (equivalent to helper_ret_X_mmu) so that in target-* code we wouldn't
>>> need to care about the endianness (specifically I'm thinking about SC in MIPS
>>> where I need to select between helper_atomic_cmpxchgl_le_mmu() and
>>> helper_atomic_cmpxchgl_be_mmu()).
>>
>> Perhaps.  I would have hoped that you could do the SC inline now
>> though, and tcg_gen_atomic_cmpxchg() will do the selection for you.
>
> On every SC we need to do the virtual -> physical address translation as we
> have to compare the physical address against that of the preceeding LL.
> This operation seems too complex to be inlined :(

What happens if you do virtual address comparisons, like everyone else?  It 
might not be strictly correct, but in practice I bet its no worse than using 
cmpxchg in the first place.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC
  2016-09-12 14:16   ` Alex Bennée
@ 2016-09-12 20:19     ` Richard Henderson
  2016-09-13  6:42       ` Alex Bennée
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-12 20:19 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/12/2016 07:16 AM, Alex Bennée wrote:
>> +void cpu_exec_step(CPUState *cpu)
>> +{
>> +    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>> +    TranslationBlock *tb;
>> +    target_ulong cs_base, pc;
>> +    uint32_t flags;
>> +    bool old_tb_flushed;
>> +
>> +    old_tb_flushed = cpu->tb_flushed;
>> +    cpu->tb_flushed = false;
>
> Advanced warning, these disappear soon when the async safe work (plus
> safe tb flush) patches get merged.

Fair enough.

Having thought about this more, I think it may be better to handle this without 
flushing the tb.  To have parallel_cpus included in the TB flags or somesuch 
and keep that TB around.

My thinking is that there are certain things, like alignment, that could result 
in repeated single-stepping.  So better to keep the TB around than keep having 
to regenerate it.

>> +                /* ??? When we begin running cpus in parallel, we should
>> +                   stop all cpus, clear parallel_cpus, and interpret a
>> +                   single insn with cpu_exec_step.  In the meantime,
>> +                   we should never get here.  */
>> +                abort();
>
> Possibly more correct would be:
>
>          g_assert(parallel_cpus == false);
>          abort();

No, since it is here that we would *set* parallel_cpus to false.  Or did you 
mean assert parallel_cpus true?  Not that that helps for the moment...

>> +static void step_atomic(CPUState *cpu)
>> +{
>> +    start_exclusive();
>> +
>> +    /* Since we got here, we know that parallel_cpus must be true.  */
>> +    parallel_cpus = false;
>> +    cpu_exec_step(cpu);
>> +    parallel_cpus = true;
>> +
>> +    end_exclusive();
>> +}
>> +
>
> Paolo's safe work patches bring the start/end_exclusive functions into
> cpu-exec-common so I think after that has been merged this function
> can also be moved and called directly from the MTTCG loop on an
> EXCP_ATOMIC exit.

Excellent.  Perhaps I should rebase this upon that right away.  Have you got a 
pointer to a tree handy?

>> +bool parallel_cpus;
>
> So lets add some words to this to distinguish between this and the mttcg
> enabled flags and its relation to -smp. Something like:
>
>   parallel_cpus indicates to the front-ends if code needs to be
>   generated taking into account multiple threads of execution. It will
>   be true for linux-user after the first thread clone and if system mode
>   if MTTCG is enabled. On the transition from false->true any code
>   generated while false needs to be invalidated. It may be temporally
>   set to false when generating non-cached code in an exclusive context.

Sure.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC
  2016-09-12 20:19     ` Richard Henderson
@ 2016-09-13  6:42       ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-13  6:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> On 09/12/2016 07:16 AM, Alex Bennée wrote:
>>> +void cpu_exec_step(CPUState *cpu)
>>> +{
>>> +    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>> +    TranslationBlock *tb;
>>> +    target_ulong cs_base, pc;
>>> +    uint32_t flags;
>>> +    bool old_tb_flushed;
>>> +
>>> +    old_tb_flushed = cpu->tb_flushed;
>>> +    cpu->tb_flushed = false;
>>
>> Advanced warning, these disappear soon when the async safe work (plus
>> safe tb flush) patches get merged.
>
> Fair enough.
>
> Having thought about this more, I think it may be better to handle this without
> flushing the tb.  To have parallel_cpus included in the TB flags or somesuch
> and keep that TB around.
>
> My thinking is that there are certain things, like alignment, that could result
> in repeated single-stepping.  So better to keep the TB around than keep having
> to regenerate it.
>
>>> +                /* ??? When we begin running cpus in parallel, we should
>>> +                   stop all cpus, clear parallel_cpus, and interpret a
>>> +                   single insn with cpu_exec_step.  In the meantime,
>>> +                   we should never get here.  */
>>> +                abort();
>>
>> Possibly more correct would be:
>>
>>          g_assert(parallel_cpus == false);
>>          abort();
>
> No, since it is here that we would *set* parallel_cpus to false.  Or did you
> mean assert parallel_cpus true?  Not that that helps for the moment...

For SoftMMU this particular loop should never hit because paralell_cpus
should be false, hence we never generate any code that might
EXCP_ATOMIC. It only fails at the moment because of the "hack" for
testing which makes parallel_cpus true.

The MTTCG adds a new thread function for MTTCG mode which
will have to handle EXCP_ATOMIC as discussed.

>
>>> +static void step_atomic(CPUState *cpu)
>>> +{
>>> +    start_exclusive();
>>> +
>>> +    /* Since we got here, we know that parallel_cpus must be true.  */
>>> +    parallel_cpus = false;
>>> +    cpu_exec_step(cpu);
>>> +    parallel_cpus = true;
>>> +
>>> +    end_exclusive();
>>> +}
>>> +
>>
>> Paolo's safe work patches bring the start/end_exclusive functions into
>> cpu-exec-common so I think after that has been merged this function
>> can also be moved and called directly from the MTTCG loop on an
>> EXCP_ATOMIC exit.
>
> Excellent.  Perhaps I should rebase this upon that right away.  Have you got a
> pointer to a tree handy?

Paolo posted a new version recently but I haven't built a tree with it
yet. I was hoping some of the hot-path and maybe barrier stuff would get
merged while I finish reviewing this ;-)

See:

    Subject: [PATCH v7 00/16] cpu-exec: Safe work in quiescent state
    Date: Mon, 12 Sep 2016 13:12:25 +0200
    Message-Id: <1473678761-8885-1-git-send-email-pbonzini@redhat.com>

>
>>> +bool parallel_cpus;
>>
>> So lets add some words to this to distinguish between this and the mttcg
>> enabled flags and its relation to -smp. Something like:
>>
>>   parallel_cpus indicates to the front-ends if code needs to be
>>   generated taking into account multiple threads of execution. It will
>>   be true for linux-user after the first thread clone and if system mode
>>   if MTTCG is enabled. On the transition from false->true any code
>>   generated while false needs to be invalidated. It may be temporally
>>   set to false when generating non-cached code in an exclusive context.
>
> Sure.
>
>
> r~


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers Richard Henderson
@ 2016-09-13 11:18   ` Alex Bennée
  2016-09-13 14:18     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-13 11:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Force the use of cmpxchg16b on x86_64.
>
> Wikipedia suggests that only very old AMD64 (circa 2004) did not have
> this instruction.  Further, it's required by Windows 8 so no new cpus
> will ever omit it.
>
> If we truely care about these, then we could check this at startup time
> and then avoid executing paths that use it.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  atomic_template.h     | 40 +++++++++++++++++++++++++++++++++++++++-
>  configure             | 29 ++++++++++++++++++++++++++++-
>  cputlb.c              |  5 +++++
>  include/qemu/int128.h |  6 ++++++
>  tcg-runtime.c         | 20 +++++++++++++++++++-
>  tcg/tcg.h             | 24 +++++++++++++++++++++++-
>  6 files changed, 120 insertions(+), 4 deletions(-)
>
<snip>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index aa55d12..0c97cdf 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -118,8 +118,8 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  /* Macro to call the above, with local variables from the use context.  */
>  #define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, GETPC())
>
> -#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
>  #define EXTRA_ARGS
> +#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))

Did I miss a subtly here? Should this change be squashed into the
original atomic helpers patch?


>  #define DATA_SIZE 1
>  #include "atomic_template.h"
> @@ -133,4 +133,22 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  #define DATA_SIZE 8
>  #include "atomic_template.h"
>
> +/* The following is only callable from other helpers, and matches up
> +   with the softmmu version.  */
> +
> +#ifdef CONFIG_ATOMIC128
> +
> +#undef EXTRA_ARGS
> +#undef ATOMIC_NAME
> +#undef ATOMIC_MMU_LOOKUP
> +
> +#define EXTRA_ARGS     , TCGMemOpIdx oi, uintptr_t retaddr
> +#define ATOMIC_NAME(X) \
> +    HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
> +#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, retaddr)
> +
> +#define DATA_SIZE 16
> +#include "atomic_template.h"
> +#endif /* CONFIG_ATOMIC128 */
> +
>  #endif /* !CONFIG_SOFTMMU */
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index c91b8c6..5a94cec 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1227,7 +1227,29 @@ GEN_ATOMIC_HELPER_ALL(xchg)
>
>  #undef GEN_ATOMIC_HELPER_ALL
>  #undef GEN_ATOMIC_HELPER
> -
>  #endif /* CONFIG_SOFTMMU */
>
> +#ifdef CONFIG_ATOMIC128
> +#include "qemu/int128.h"
> +
> +/* These aren't really a "proper" helpers because TCG cannot manage Int128.
> +   However, use the same format as the others, for use by the backends. */
> +Int128 helper_atomic_cmpxchgo_le_mmu(CPUArchState *env, target_ulong addr,
> +                                     Int128 cmpv, Int128 newv,
> +                                     TCGMemOpIdx oi, uintptr_t retaddr);
> +Int128 helper_atomic_cmpxchgo_be_mmu(CPUArchState *env, target_ulong addr,
> +                                     Int128 cmpv, Int128 newv,
> +                                     TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +Int128 helper_atomic_ldo_le_mmu(CPUArchState *env, target_ulong addr,
> +                                TCGMemOpIdx oi, uintptr_t retaddr);
> +Int128 helper_atomic_ldo_be_mmu(CPUArchState *env, target_ulong addr,
> +                                TCGMemOpIdx oi, uintptr_t retaddr);
> +void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
> +                              TCGMemOpIdx oi, uintptr_t retaddr);
> +void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
> +                              TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#endif /* CONFIG_ATOMIC128 */
> +
>  #endif /* TCG_H */

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-12 16:13         ` Richard Henderson
@ 2016-09-13 12:32           ` Leon Alrae
  0 siblings, 0 replies; 97+ messages in thread
From: Leon Alrae @ 2016-09-13 12:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 12, 2016 at 09:13:10AM -0700, Richard Henderson wrote:
> On 09/12/2016 12:59 AM, Leon Alrae wrote:
> >On Fri, Sep 09, 2016 at 09:26:29AM -0700, Richard Henderson wrote:
> >>On 09/09/2016 07:46 AM, Leon Alrae wrote:
> >>>Wouldn't it be useful if tcg.h provided also aliases for _le/_be atomic
> >>>helpers (equivalent to helper_ret_X_mmu) so that in target-* code we wouldn't
> >>>need to care about the endianness (specifically I'm thinking about SC in MIPS
> >>>where I need to select between helper_atomic_cmpxchgl_le_mmu() and
> >>>helper_atomic_cmpxchgl_be_mmu()).
> >>
> >>Perhaps.  I would have hoped that you could do the SC inline now
> >>though, and tcg_gen_atomic_cmpxchg() will do the selection for you.
> >
> >On every SC we need to do the virtual -> physical address translation as we
> >have to compare the physical address against that of the preceeding LL.
> >This operation seems too complex to be inlined :(
> 
> What happens if you do virtual address comparisons, like everyone
> else?  It might not be strictly correct, but in practice I bet its
> no worse than using cmpxchg in the first place.

Yeah, probably true. Moreover, MIPS Instruction Set Manual explicitly says
that an RMW sequence must use the same address in the LL and SC (virtual
address, physical address, cacheability and coherency attributes must be
identical). Otherwise the result of the SC is not predictable.

I'll try to come up with simpler SC implementation using virtual
addresses and tcg_gen_atomic_cmpxchg().

Thanks,
Leon

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers
  2016-09-13 11:18   ` Alex Bennée
@ 2016-09-13 14:18     ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-13 14:18 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/13/2016 04:18 AM, Alex Bennée wrote:
>> > -#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
>> >  #define EXTRA_ARGS
>> > +#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
> Did I miss a subtly here? Should this change be squashed into the
> original atomic helpers patch?
> 
> 

Dunno what happened here.  Probably an error fixing up a rebase conflict.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
                     ` (2 preceding siblings ...)
  2016-09-12 13:47   ` Alex Bennée
@ 2016-09-13 17:06   ` Alex Bennée
  2016-09-13 17:26     ` Richard Henderson
  3 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-13 17:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Add all of cmpxchg, op_fetch, fetch_op, and xchg.
> Handle both endian-ness, and sizes up to 8.
> Handle expanding non-atomically, when emulating in serial.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  Makefile.objs         |   1 -
>  Makefile.target       |   1 +
>  atomic_template.h     | 173 ++++++++++++++++++++++++++
>  cputlb.c              | 112 ++++++++++++++++-
>  include/qemu/atomic.h |  21 +++-
>  tcg-runtime.c         |  49 ++++++--
>  tcg/tcg-op.c          | 328
> ++++++++++++++++++++++++++++++++++++++++++++++++++

I get a failure when building on centos6, x86:

  CC    aarch64-softmmu/tcg/tcg-op.o
/home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addb’ undeclared here (not in a function)
/home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_le’ undeclared here (not in a function)
/home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_be’ undeclared here (not in a function)
/home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_le’ undeclared here (not in a function)
/home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_be’ undeclared here (not in a function)
...


>  tcg/tcg-op.h          |  44 +++++++
>  tcg/tcg-runtime.h     |  75 ++++++++++++
>  tcg/tcg.h             |  53 ++++++++
>  10 files changed, 836 insertions(+), 21 deletions(-)
>  create mode 100644 atomic_template.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 6d5ddcf..df8796e 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -88,7 +88,6 @@ endif
>
>  #######################################################################
>  # Target-independent parts used in system and user emulation
> -common-obj-y += tcg-runtime.o
>  common-obj-y += hw/
>  common-obj-y += qom/
>  common-obj-y += disas/
> diff --git a/Makefile.target b/Makefile.target
> index a440bcb..1d16d10 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -94,6 +94,7 @@ obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
>  obj-y += fpu/softfloat.o
>  obj-y += target-$(TARGET_BASE_ARCH)/
>  obj-y += disas.o
> +obj-y += tcg-runtime.o
>  obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
>  obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
>
> diff --git a/atomic_template.h b/atomic_template.h
> new file mode 100644
> index 0000000..d2c8a08
> --- /dev/null
> +++ b/atomic_template.h
> @@ -0,0 +1,173 @@
> +/*
> + * Atomic helper templates
> + * Included from tcg-runtime.c and cputlb.c.
> + *
> + * Copyright (c) 2016 Red Hat, Inc
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#if DATA_SIZE == 8
> +# define SUFFIX     q
> +# define DATA_TYPE  uint64_t
> +# define BSWAP      bswap64
> +#elif DATA_SIZE == 4
> +# define SUFFIX     l
> +# define DATA_TYPE  uint32_t
> +# define BSWAP      bswap32
> +#elif DATA_SIZE == 2
> +# define SUFFIX     w
> +# define DATA_TYPE  uint16_t
> +# define BSWAP      bswap16
> +#elif DATA_SIZE == 1
> +# define SUFFIX     b
> +# define DATA_TYPE  uint8_t
> +# define BSWAP
> +#else
> +# error unsupported data size
> +#endif
> +
> +#if DATA_SIZE >= 4
> +# define ABI_TYPE  DATA_TYPE
> +#else
> +# define ABI_TYPE  uint32_t
> +#endif
> +
> +#if DATA_SIZE == 1
> +# define END
> +#elif defined(HOST_WORDS_BIGENDIAN)
> +# define END  _be
> +#else
> +# define END  _le
> +#endif
> +
> +ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
> +                              ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    return atomic_cmpxchg__nocheck(haddr, cmpv, newv);
> +}
> +
> +ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
> +                           ABI_TYPE val EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    return atomic_xchg__nocheck(haddr, val);
> +}
> +
> +#define GEN_ATOMIC_HELPER(X)                                        \
> +ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,       \
> +                 ABI_TYPE val EXTRA_ARGS)                           \
> +{                                                                   \
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;                           \
> +    return atomic_##X(haddr, val);                                  \
> +}                                                                   \
> +
> +GEN_ATOMIC_HELPER(fetch_add)
> +GEN_ATOMIC_HELPER(fetch_and)
> +GEN_ATOMIC_HELPER(fetch_or)
> +GEN_ATOMIC_HELPER(fetch_xor)
> +GEN_ATOMIC_HELPER(add_fetch)
> +GEN_ATOMIC_HELPER(and_fetch)
> +GEN_ATOMIC_HELPER(or_fetch)
> +GEN_ATOMIC_HELPER(xor_fetch)
> +
> +#undef GEN_ATOMIC_HELPER
> +#undef END
> +
> +#if DATA_SIZE > 1
> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +# define END  _le
> +#else
> +# define END  _be
> +#endif
> +
> +ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
> +                              ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    return BSWAP(atomic_cmpxchg__nocheck(haddr, BSWAP(cmpv), BSWAP(newv)));
> +}
> +
> +ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
> +                           ABI_TYPE val EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    return BSWAP(atomic_xchg__nocheck(haddr, BSWAP(val)));
> +}
> +
> +#define GEN_ATOMIC_HELPER(X)                                        \
> +ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,       \
> +                 ABI_TYPE val EXTRA_ARGS)                           \
> +{                                                                   \
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;                           \
> +    return BSWAP(atomic_##X(haddr, BSWAP(val)));                    \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_and)
> +GEN_ATOMIC_HELPER(fetch_or)
> +GEN_ATOMIC_HELPER(fetch_xor)
> +GEN_ATOMIC_HELPER(and_fetch)
> +GEN_ATOMIC_HELPER(or_fetch)
> +GEN_ATOMIC_HELPER(xor_fetch)
> +
> +#undef GEN_ATOMIC_HELPER
> +
> +/* Note that for addition, we need to use a separate cmpxchg loop instead
> +   of bswaps for the reverse-host-endian helpers.  */
> +ABI_TYPE ATOMIC_NAME(fetch_add)(CPUArchState *env, target_ulong addr,
> +                         ABI_TYPE val EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    DATA_TYPE ldo, ldn, ret, sto;
> +
> +    ldo = *haddr;
> +    while (1) {
> +        ret = BSWAP(ldo);
> +        sto = BSWAP(ret + val);
> +        ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
> +        if (ldn == ldo) {
> +            return ret;
> +        }
> +        ldo = ldn;
> +    }
> +}
> +
> +ABI_TYPE ATOMIC_NAME(add_fetch)(CPUArchState *env, target_ulong addr,
> +                         ABI_TYPE val EXTRA_ARGS)
> +{
> +    DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
> +    DATA_TYPE ldo, ldn, ret, sto;
> +
> +    ldo = *haddr;
> +    while (1) {
> +        ret = BSWAP(ldo) + val;
> +        sto = BSWAP(ret);
> +        ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
> +        if (ldn == ldo) {
> +            return ret;
> +        }
> +        ldo = ldn;
> +    }
> +}
> +
> +#undef END
> +#endif /* DATA_SIZE > 1 */
> +
> +#undef BSWAP
> +#undef ABI_TYPE
> +#undef DATA_TYPE
> +#undef SUFFIX
> +#undef DATA_SIZE
> diff --git a/cputlb.c b/cputlb.c
> index d710cc1..a54afb4 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -23,15 +23,15 @@
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
>  #include "exec/cpu_ldst.h"
> -
>  #include "exec/cputlb.h"
> -
>  #include "exec/memory-internal.h"
>  #include "exec/ram_addr.h"
>  #include "exec/exec-all.h"
>  #include "tcg/tcg.h"
>  #include "qemu/error-report.h"
>  #include "exec/log.h"
> +#include "exec/helper-proto.h"
> +#include "qemu/atomic.h"
>
>  /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
>  /* #define DEBUG_TLB */
> @@ -585,6 +585,69 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>      }
>  }
>
> +/* Probe for a read-modify-write atomic operation.  Do not allow unaligned
> + * operations, or io operations to proceed.  Return the host address.  */
> +static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
> +                               TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    size_t mmu_idx = get_mmuidx(oi);
> +    size_t index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];
> +    target_ulong tlb_addr = tlbe->addr_write;
> +    TCGMemOp mop = get_memop(oi);
> +    int a_bits = get_alignment_bits(mop);
> +    int s_bits = mop & MO_SIZE;
> +
> +    /* Adjust the given return address.  */
> +    retaddr -= GETPC_ADJ;
> +
> +    /* Enforce guest required alignment.  */
> +    if (unlikely(a_bits > 0 && (addr & ((1 << a_bits) - 1)))) {
> +        /* ??? Maybe indicate atomic op to cpu_unaligned_access */
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +
> +    /* Enforce qemu required alignment.  */
> +    if (unlikely(addr & ((1 << s_bits) - 1))) {
> +        /* We get here if guest alignment was not requested,
> +           or was not enforced by cpu_unaligned_access above.
> +           We might widen the access and emulate, but for now
> +           mark an exception and exit the cpu loop.  */
> +        goto stop_the_world;
> +    }
> +
> +    /* Check TLB entry and enforce page permissions.  */
> +    if ((addr & TARGET_PAGE_MASK)
> +        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> +        if (!VICTIM_TLB_HIT(addr_write, addr)) {
> +            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
> +        }
> +        tlb_addr = tlbe->addr_write;
> +    }
> +
> +    /* Notice an IO access, or a notdirty page.  */
> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> +        /* There's really nothing that can be done to
> +           support this apart from stop-the-world.  */
> +        goto stop_the_world;
> +    }
> +
> +    /* Let the guest notice RMW on a write-only page.  */
> +    if (unlikely(tlbe->addr_read != tlb_addr)) {
> +        tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_LOAD, mmu_idx, retaddr);
> +        /* Since we don't support reads and writes to different addresses,
> +           and we do have the proper page loaded for write, this shouldn't
> +           ever return.  But just in case, handle via stop-the-world.  */
> +        goto stop_the_world;
> +    }
> +
> +    return (void *)((uintptr_t)addr + tlbe->addend);
> +
> + stop_the_world:
> +    cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
> +}
> +
>  #ifdef TARGET_WORDS_BIGENDIAN
>  # define TGT_BE(X)  (X)
>  # define TGT_LE(X)  BSWAP(X)
> @@ -606,8 +669,51 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>
>  #define DATA_SIZE 8
>  #include "softmmu_template.h"
> -#undef MMUSUFFIX
>
> +/* First set of helpers allows passing in of OI and RETADDR.  This makes
> +   them callable from other helpers.  */
> +
> +#define EXTRA_ARGS     , TCGMemOpIdx oi, uintptr_t retaddr
> +#define ATOMIC_NAME(X) \
> +    HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
> +#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, retaddr)
> +
> +#define DATA_SIZE 1
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 2
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 4
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 8
> +#include "atomic_template.h"
> +
> +/* Second set of helpers are directly callable from TCG as helpers.  */
> +
> +#undef EXTRA_ARGS
> +#undef ATOMIC_NAME
> +#undef ATOMIC_MMU_LOOKUP
> +#define EXTRA_ARGS         , TCGMemOpIdx oi
> +#define ATOMIC_NAME(X)     HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
> +#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, GETRA())
> +
> +#define DATA_SIZE 1
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 2
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 4
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 8
> +#include "atomic_template.h"
> +
> +/* Code access functions.  */
> +
> +#undef MMUSUFFIX
>  #define MMUSUFFIX _cmmu
>  #undef GETPC_ADJ
>  #define GETPC_ADJ 0
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 6e5cdd5..5542416 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -177,22 +177,29 @@
>
>  /* All the remaining operations are fully sequentially consistent */
>
> -#define atomic_xchg(ptr, i)    ({                           \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
> +#define atomic_xchg__nocheck(ptr, i)    ({                  \
>      typeof_strip_qual(*ptr) _new = (i), _old;               \
>      __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
>      _old;                                                   \
>  })
>
> +#define atomic_xchg(ptr, i)    ({                           \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
> +    atomic_xchg__nocheck(ptr, i);                           \
> +})
> +
>  /* Returns the eventual value, failed or not */
> -#define atomic_cmpxchg(ptr, old, new)                                   \
> -    ({                                                                  \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
> +#define atomic_cmpxchg__nocheck(ptr, old, new)    ({                    \
>      typeof_strip_qual(*ptr) _old = (old), _new = (new);                 \
>      __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
>                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
>      _old;                                                               \
> -    })
> +})
> +
> +#define atomic_cmpxchg(ptr, old, new)    ({                             \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
> +    atomic_cmpxchg__nocheck(ptr, old, new);                             \
> +})
>
>  /* Provide shorter names for GCC atomic builtins, return old value */
>  #define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
> @@ -397,6 +404,7 @@
>  #define atomic_xchg(ptr, i)    (smp_mb(), __sync_lock_test_and_set(ptr, i))
>  #endif
>  #endif
> +#define atomic_xchg__nocheck  atomic_xchg
>
>  /* Provide shorter names for GCC atomic builtins.  */
>  #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
> @@ -416,6 +424,7 @@
>  #define atomic_xor_fetch       __sync_xor_and_fetch
>
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
> +#define atomic_cmpxchg__nocheck  atomic_cmpxchg
>
>  /* And even shorter names that return void.  */
>  #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index ea2ad64..aa55d12 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -23,17 +23,10 @@
>   */
>  #include "qemu/osdep.h"
>  #include "qemu/host-utils.h"
> -
> -/* This file is compiled once, and thus we can't include the standard
> -   "exec/helper-proto.h", which has includes that are target specific.  */
> -
> -#include "exec/helper-head.h"
> -
> -#define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2) \
> -  dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2));
> -
> -#include "tcg-runtime.h"
> -
> +#include "cpu.h"
> +#include "exec/helper-proto.h"
> +#include "exec/cpu_ldst.h"
> +#include "exec/exec-all.h"
>
>  /* 32-bit helpers */
>
> @@ -107,3 +100,37 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
>      muls64(&l, &h, arg1, arg2);
>      return h;
>  }
> +
> +#ifndef CONFIG_SOFTMMU
> +/* The softmmu versions of these helpers are in cputlb.c.  */
> +
> +/* Do not allow unaligned operations to proceed.  Return the host address.  */
> +static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
> +                               int size, uintptr_t retaddr)
> +{
> +    /* Enforce qemu required alignment.  */
> +    if (unlikely(addr & (size - 1))) {
> +        cpu_loop_exit_atomic(ENV_GET_CPU(env), retaddr);
> +    }
> +    return g2h(addr);
> +}
> +
> +/* Macro to call the above, with local variables from the use context.  */
> +#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, DATA_SIZE, GETPC())
> +
> +#define ATOMIC_NAME(X)   HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
> +#define EXTRA_ARGS
> +
> +#define DATA_SIZE 1
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 2
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 4
> +#include "atomic_template.h"
> +
> +#define DATA_SIZE 8
> +#include "atomic_template.h"
> +
> +#endif /* !CONFIG_SOFTMMU */
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 0243c99..e146ad4 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1958,3 +1958,331 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
>                                 addr, trace_mem_get_info(memop, 1));
>      gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
>  }
> +
> +static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, TCGMemOp opc)
> +{
> +    switch (opc & MO_SSIZE) {
> +    case MO_SB:
> +        tcg_gen_ext8s_i32(ret, val);
> +        break;
> +    case MO_UB:
> +        tcg_gen_ext8u_i32(ret, val);
> +        break;
> +    case MO_SW:
> +        tcg_gen_ext16s_i32(ret, val);
> +        break;
> +    case MO_UW:
> +        tcg_gen_ext16u_i32(ret, val);
> +        break;
> +    default:
> +        tcg_gen_mov_i32(ret, val);
> +        break;
> +    }
> +}
> +
> +static void tcg_gen_ext_i64(TCGv_i64 ret, TCGv_i64 val, TCGMemOp opc)
> +{
> +    switch (opc & MO_SSIZE) {
> +    case MO_SB:
> +        tcg_gen_ext8s_i64(ret, val);
> +        break;
> +    case MO_UB:
> +        tcg_gen_ext8u_i64(ret, val);
> +        break;
> +    case MO_SW:
> +        tcg_gen_ext16s_i64(ret, val);
> +        break;
> +    case MO_UW:
> +        tcg_gen_ext16u_i64(ret, val);
> +        break;
> +    case MO_SL:
> +        tcg_gen_ext32s_i64(ret, val);
> +        break;
> +    case MO_UL:
> +        tcg_gen_ext32u_i64(ret, val);
> +        break;
> +    default:
> +        tcg_gen_mov_i64(ret, val);
> +        break;
> +    }
> +}
> +
> +#ifdef CONFIG_SOFTMMU
> +typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
> +                                  TCGv_i32, TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
> +                                  TCGv_i64, TCGv_i64, TCGv_i32);
> +typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv,
> +                                  TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
> +                                  TCGv_i64, TCGv_i32);
> +#else
> +typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32, TCGv_i32);
> +typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64, TCGv_i64);
> +typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32);
> +typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
> +#endif
> +
> +static void * const table_cmpxchg[16] = {
> +    [MO_8] = gen_helper_atomic_cmpxchgb,
> +    [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
> +    [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
> +    [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
> +    [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
> +    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
> +    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
> +};
> +
> +void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
> +                                TCGv_i32 newv, TCGArg idx, TCGMemOp memop)
> +{
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    if (!parallel_cpus) {
> +        TCGv_i32 t1 = tcg_temp_new_i32();
> +        TCGv_i32 t2 = tcg_temp_new_i32();
> +
> +        tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
> +
> +        tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
> +        tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
> +        tcg_gen_qemu_st_i32(t2, addr, idx, memop);
> +        tcg_temp_free_i32(t2);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i32(retv, t1, memop);
> +        } else {
> +            tcg_gen_mov_i32(retv, t1);
> +        }
> +        tcg_temp_free_i32(t1);
> +    } else {
> +        gen_atomic_cx_i32 gen;
> +
> +        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_SOFTMMU
> +        {
> +            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
> +            tcg_temp_free_i32(oi);
> +        }
> +#else
> +        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
> +#endif
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i32(retv, retv, memop);
> +        }
> +    }
> +}
> +
> +void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
> +                                TCGv_i64 newv, TCGArg idx, TCGMemOp memop)
> +{
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    if (!parallel_cpus) {
> +        TCGv_i64 t1 = tcg_temp_new_i64();
> +        TCGv_i64 t2 = tcg_temp_new_i64();
> +
> +        tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
> +
> +        tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
> +        tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
> +        tcg_gen_qemu_st_i64(t2, addr, idx, memop);
> +        tcg_temp_free_i64(t2);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(retv, t1, memop);
> +        } else {
> +            tcg_gen_mov_i64(retv, t1);
> +        }
> +        tcg_temp_free_i64(t1);
> +    } else if ((memop & MO_SIZE) == MO_64) {
> +        gen_atomic_cx_i64 gen;
> +
> +        gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_SOFTMMU
> +        {
> +            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
> +            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
> +            tcg_temp_free_i32(oi);
> +        }
> +#else
> +        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
> +#endif
> +    } else {
> +        TCGv_i32 c32 = tcg_temp_new_i32();
> +        TCGv_i32 n32 = tcg_temp_new_i32();
> +        TCGv_i32 r32 = tcg_temp_new_i32();
> +
> +        tcg_gen_extrl_i64_i32(c32, cmpv);
> +        tcg_gen_extrl_i64_i32(n32, newv);
> +        tcg_gen_atomic_cmpxchg_i32(r32, addr, c32, n32, idx, memop & ~MO_SIGN);
> +        tcg_temp_free_i32(c32);
> +        tcg_temp_free_i32(n32);
> +
> +        tcg_gen_extu_i32_i64(retv, r32);
> +        tcg_temp_free_i32(r32);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(retv, retv, memop);
> +        }
> +    }
> +}
> +
> +static void do_nonatomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
> +                                TCGArg idx, TCGMemOp memop, bool new_val,
> +                                void (*gen)(TCGv_i32, TCGv_i32, TCGv_i32))
> +{
> +    TCGv_i32 t1 = tcg_temp_new_i32();
> +    TCGv_i32 t2 = tcg_temp_new_i32();
> +
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
> +    gen(t2, t1, val);
> +    tcg_gen_qemu_st_i32(t2, addr, idx, memop);
> +
> +    tcg_gen_ext_i32(ret, (new_val ? t2 : t1), memop);
> +    tcg_temp_free_i32(t1);
> +    tcg_temp_free_i32(t2);
> +}
> +
> +static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
> +                             TCGArg idx, TCGMemOp memop, void * const table[])
> +{
> +    gen_atomic_op_i32 gen;
> +
> +    memop = tcg_canonicalize_memop(memop, 0, 0);
> +
> +    gen = table[memop & (MO_SIZE | MO_BSWAP)];
> +    tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_SOFTMMU
> +    {
> +        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +        gen(ret, tcg_ctx.tcg_env, addr, val, oi);
> +        tcg_temp_free_i32(oi);
> +    }
> +#else
> +    gen(ret, tcg_ctx.tcg_env, addr, val);
> +#endif
> +
> +    if (memop & MO_SIGN) {
> +        tcg_gen_ext_i32(ret, ret, memop);
> +    }
> +}
> +
> +static void do_nonatomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
> +                                TCGArg idx, TCGMemOp memop, bool new_val,
> +                                void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
> +{
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
> +    gen(t2, t1, val);
> +    tcg_gen_qemu_st_i64(t2, addr, idx, memop);
> +
> +    tcg_gen_ext_i64(ret, (new_val ? t2 : t1), memop);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +}
> +
> +static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
> +                             TCGArg idx, TCGMemOp memop, void * const table[])
> +{
> +    memop = tcg_canonicalize_memop(memop, 1, 0);
> +
> +    if ((memop & MO_SIZE) == MO_64) {
> +        gen_atomic_op_i64 gen;
> +
> +        gen = table[memop & (MO_SIZE | MO_BSWAP)];
> +        tcg_debug_assert(gen != NULL);
> +
> +#ifdef CONFIG_SOFTMMU
> +        {
> +            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> +            gen(ret, tcg_ctx.tcg_env, addr, val, oi);
> +            tcg_temp_free_i32(oi);
> +        }
> +#else
> +        gen(ret, tcg_ctx.tcg_env, addr, val);
> +#endif
> +    } else {
> +        TCGv_i32 v32 = tcg_temp_new_i32();
> +        TCGv_i32 r32 = tcg_temp_new_i32();
> +
> +        tcg_gen_extrl_i64_i32(v32, val);
> +        do_atomic_op_i32(r32, addr, v32, idx, memop & ~MO_SIGN, table);
> +        tcg_temp_free_i32(v32);
> +
> +        tcg_gen_extu_i32_i64(ret, r32);
> +        tcg_temp_free_i32(r32);
> +
> +        if (memop & MO_SIGN) {
> +            tcg_gen_ext_i64(ret, ret, memop);
> +        }
> +    }
> +}
> +
> +#define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
> +static void * const table_##NAME[16] = {                                \
> +    [MO_8] = gen_helper_atomic_##NAME##b,                               \
> +    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
> +    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
> +    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
> +    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
> +    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
> +    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
> +};                                                                      \
> +void tcg_gen_atomic_##NAME##_i32                                        \
> +    (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i32);                        \
> +    }                                                                   \
> +}                                                                       \
> +void tcg_gen_atomic_##NAME##_i64                                        \
> +    (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
> +{                                                                       \
> +    if (parallel_cpus) {                                                \
> +        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
> +    } else {                                                            \
> +        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
> +                            tcg_gen_##OP##_i64);                        \
> +    }                                                                   \
> +}
> +
> +GEN_ATOMIC_HELPER(fetch_add, add, 0)
> +GEN_ATOMIC_HELPER(fetch_and, and, 0)
> +GEN_ATOMIC_HELPER(fetch_or, or, 0)
> +GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
> +
> +GEN_ATOMIC_HELPER(add_fetch, add, 1)
> +GEN_ATOMIC_HELPER(and_fetch, and, 1)
> +GEN_ATOMIC_HELPER(or_fetch, or, 1)
> +GEN_ATOMIC_HELPER(xor_fetch, xor, 1)
> +
> +static void tcg_gen_mov2_i32(TCGv_i32 r, TCGv_i32 a, TCGv_i32 b)
> +{
> +    tcg_gen_mov_i32(r, b);
> +}
> +
> +static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
> +{
> +    tcg_gen_mov_i64(r, b);
> +}
> +
> +GEN_ATOMIC_HELPER(xchg, mov2, 0)
> +
> +#undef GEN_ATOMIC_HELPER
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index f217e80..2a845ae 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -852,6 +852,30 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>      tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
>  }
>
> +void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
> +                                TCGArg, TCGMemOp);
> +void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
> +                                TCGArg, TCGMemOp);
> +
> +void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_add_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_add_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_and_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_and_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_or_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_or_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_xor_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_fetch_xor_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_add_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_add_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_and_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_and_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_or_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
> +void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
> +
>  #if TARGET_LONG_BITS == 64
>  #define tcg_gen_movi_tl tcg_gen_movi_i64
>  #define tcg_gen_mov_tl tcg_gen_mov_i64
> @@ -930,6 +954,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_sub2_tl tcg_gen_sub2_i64
>  #define tcg_gen_mulu2_tl tcg_gen_mulu2_i64
>  #define tcg_gen_muls2_tl tcg_gen_muls2_i64
> +#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i64
> +#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i64
> +#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i64
> +#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i64
> +#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i64
> +#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i64
> +#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i64
> +#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64
> +#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64
> +#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64
>  #else
>  #define tcg_gen_movi_tl tcg_gen_movi_i32
>  #define tcg_gen_mov_tl tcg_gen_mov_i32
> @@ -1007,6 +1041,16 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_sub2_tl tcg_gen_sub2_i32
>  #define tcg_gen_mulu2_tl tcg_gen_mulu2_i32
>  #define tcg_gen_muls2_tl tcg_gen_muls2_i32
> +#define tcg_gen_atomic_cmpxchg_tl tcg_gen_atomic_cmpxchg_i32
> +#define tcg_gen_atomic_xchg_tl tcg_gen_atomic_xchg_i32
> +#define tcg_gen_atomic_fetch_add_tl tcg_gen_atomic_fetch_add_i32
> +#define tcg_gen_atomic_fetch_and_tl tcg_gen_atomic_fetch_and_i32
> +#define tcg_gen_atomic_fetch_or_tl tcg_gen_atomic_fetch_or_i32
> +#define tcg_gen_atomic_fetch_xor_tl tcg_gen_atomic_fetch_xor_i32
> +#define tcg_gen_atomic_add_fetch_tl tcg_gen_atomic_add_fetch_i32
> +#define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32
> +#define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32
> +#define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32
>  #endif
>
>  #if UINTPTR_MAX == UINT32_MAX
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 23a0c37..22367aa 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -14,3 +14,78 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>
>  DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +
> +#ifdef CONFIG_SOFTMMU
> +
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
> +                   i64, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
> +                   i64, env, tl, i64, i64, i32)
> +
> +#define GEN_ATOMIC_HELPERS(NAME)                                  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_le),           \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),           \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64, i32)
> +
> +#else
> +
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
> +
> +#define GEN_ATOMIC_HELPERS(NAME)                             \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_le),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_be),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_le),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_be),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_le),      \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_be),      \
> +                       TCG_CALL_NO_WG, i64, env, tl, i64)
> +
> +#endif /* CONFIG_SOFTMMU */
> +
> +GEN_ATOMIC_HELPERS(fetch_add)
> +GEN_ATOMIC_HELPERS(fetch_and)
> +GEN_ATOMIC_HELPERS(fetch_or)
> +GEN_ATOMIC_HELPERS(fetch_xor)
> +
> +GEN_ATOMIC_HELPERS(add_fetch)
> +GEN_ATOMIC_HELPERS(and_fetch)
> +GEN_ATOMIC_HELPERS(or_fetch)
> +GEN_ATOMIC_HELPERS(xor_fetch)
> +
> +GEN_ATOMIC_HELPERS(xchg)
> +
> +#undef GEN_ATOMIC_HELPERS
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 00498fc..c91b8c6 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1175,6 +1175,59 @@ uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>  # define helper_ret_ldq_cmmu  helper_le_ldq_cmmu
>  #endif
>
> +uint32_t helper_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
> +                                    uint32_t cmpv, uint32_t newv,
> +                                    TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgw_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_le_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgw_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint32_t helper_atomic_cmpxchgl_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint32_t cmpv, uint32_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_atomic_cmpxchgq_be_mmu(CPUArchState *env, target_ulong addr,
> +                                       uint64_t cmpv, uint64_t newv,
> +                                       TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#define GEN_ATOMIC_HELPER(NAME, TYPE, SUFFIX)         \
> +TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
> +    (CPUArchState *env, target_ulong addr, TYPE val,  \
> +     TCGMemOpIdx oi, uintptr_t retaddr);
> +
> +#define GEN_ATOMIC_HELPER_ALL(NAME)          \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, b)      \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
> +
> +GEN_ATOMIC_HELPER_ALL(fetch_add)
> +GEN_ATOMIC_HELPER_ALL(fetch_sub)
> +GEN_ATOMIC_HELPER_ALL(fetch_and)
> +GEN_ATOMIC_HELPER_ALL(fetch_or)
> +GEN_ATOMIC_HELPER_ALL(fetch_xor)
> +
> +GEN_ATOMIC_HELPER_ALL(add_fetch)
> +GEN_ATOMIC_HELPER_ALL(sub_fetch)
> +GEN_ATOMIC_HELPER_ALL(and_fetch)
> +GEN_ATOMIC_HELPER_ALL(or_fetch)
> +GEN_ATOMIC_HELPER_ALL(xor_fetch)
> +
> +GEN_ATOMIC_HELPER_ALL(xchg)
> +
> +#undef GEN_ATOMIC_HELPER_ALL
> +#undef GEN_ATOMIC_HELPER
> +
>  #endif /* CONFIG_SOFTMMU */
>
>  #endif /* TCG_H */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-13 17:06   ` Alex Bennée
@ 2016-09-13 17:26     ` Richard Henderson
  2016-09-13 18:45       ` Alex Bennée
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-13 17:26 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/13/2016 10:06 AM, Alex Bennée wrote:
> I get a failure when building on centos6, x86:
> 
>   CC    aarch64-softmmu/tcg/tcg-op.o
> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addb’ undeclared here (not in a function)
> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_le’ undeclared here (not in a function)
> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_be’ undeclared here (not in a function)
> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_le’ undeclared here (not in a function)
> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_be’ undeclared here (not in a function)
> ...

Yes, that's what Leon Alrae reported last week.  He posted a couple of fixes.
I have folded those fixes into my atomic-3 branch but have not yet re-posted
the series.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-12 13:47   ` Alex Bennée
@ 2016-09-13 18:00     ` Richard Henderson
  2017-03-24 10:14       ` Nikunj A Dadhania
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-13 18:00 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/12/2016 06:47 AM, Alex Bennée wrote:
>> > +    /* Notice an IO access, or a notdirty page.  */
>> > +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>> > +        /* There's really nothing that can be done to
>> > +           support this apart from stop-the-world.  */
>> > +        goto stop_the_world;
> We are also triggering on TLB_NOTDIRTY here in the case where a
> conditional write is the first write to a page. I don't know if a
> stop_the_world is required at this point but we will need to ensure we
> clear bits as notdirty_mem_write() does.
> 

You're quite right that we could probably special-case TLB_NOTDIRTY here such
that (1) we needn't leave the cpu loop, and (2) needn't utilize the actual
"write" part of notdirty_mem_write; just set the bits then fall through to the
actual atomic instruction below.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-13 17:26     ` Richard Henderson
@ 2016-09-13 18:45       ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-13 18:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> On 09/13/2016 10:06 AM, Alex Bennée wrote:
>> I get a failure when building on centos6, x86:
>>
>>   CC    aarch64-softmmu/tcg/tcg-op.o
>> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addb’ undeclared here (not in a function)
>> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_le’ undeclared here (not in a function)
>> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addw_be’ undeclared here (not in a function)
>> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_le’ undeclared here (not in a function)
>> /home/alex/lsrc/qemu.git/tcg/tcg-op.c:2280: error: ‘gen_helper_atomic_fetch_addl_be’ undeclared here (not in a function)
>> ...
>
> Yes, that's what Leon Alrae reported last week.  He posted a couple of fixes.
> I have folded those fixes into my atomic-3 branch but have not yet re-posted
> the series.

Ahh it seemed familiar but I couldn't find the mail. No problem then,
carry on ;-)

>
>
> r~


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64 Richard Henderson
@ 2016-09-14 10:12   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 10:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
>
> Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
> guests, we still need some way to handle the 32-bit guest using a
> 64-bit atomic operation.  Do so by dropping back to single-step.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  configure         | 33 +++++++++++++++++++++++++++++++++
>  cputlb.c          |  4 ++++
>  tcg-runtime.c     |  7 +++++++
>  tcg/tcg-op.c      | 22 ++++++++++++++++++----
>  tcg/tcg-runtime.h | 46 ++++++++++++++++++++++++++++++++++++++++------
>  tcg/tcg.h         | 15 ++++++++++++---
>  6 files changed, 114 insertions(+), 13 deletions(-)
>
> diff --git a/configure b/configure
> index e500652..519de5d 100755
> --- a/configure
> +++ b/configure
> @@ -4462,6 +4462,35 @@ EOF
>    fi
>  fi
>
> +#########################################
> +# See if 64-bit atomic operations are supported.
> +# Note that without __atomic builtins, we can only
> +# assume atomic loads/stores max at pointer size.
> +
> +cat > $TMPC << EOF
> +#include <stdint.h>
> +int main(void)
> +{
> +  uint64_t x = 0, y = 0;
> +#ifdef __ATOMIC_RELAXED
> +  y = __atomic_load_8(&x, 0);
> +  __atomic_store_8(&x, y, 0);
> +  __atomic_compare_exchange_8(&x, &y, x, 0, 0, 0);
> +  __atomic_exchange_8(&x, y, 0);
> +  __atomic_fetch_add_8(&x, y, 0);
> +#else
> +  char is_host64[sizeof(void *) >= sizeof(uint64_t) ? 1 : -1];

I missed this piece of magic when I first read the patch :-/

> +  __sync_lock_test_and_set(&x, y);
> +  __sync_val_compare_and_swap(&x, y, 0);
> +  __sync_fetch_and_add(&x, y);
> +#endif
> +  return 0;
> +}
> +EOF
> +if compile_prog "" "" ; then
> +  atomic64=yes
> +fi
> +
>  ########################################
>  # check if getauxval is available.
>
> @@ -5415,6 +5444,10 @@ if test "$atomic128" = "yes" ; then
>    echo "CONFIG_ATOMIC128=y" >> $config_host_mak
>  fi
>
> +if test "$atomic64" = "yes" ; then
> +  echo "CONFIG_ATOMIC64=y" >> $config_host_mak
> +fi
> +
>  if test "$getauxval" = "yes" ; then
>    echo "CONFIG_GETAUXVAL=y" >> $config_host_mak
>  fi
> diff --git a/cputlb.c b/cputlb.c
> index 773cc85..8cdbf6c 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -687,8 +687,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  #define DATA_SIZE 4
>  #include "atomic_template.h"
>
> +#ifdef CONFIG_ATOMIC64
>  #define DATA_SIZE 8
>  #include "atomic_template.h"
> +#endif
>
>  #ifdef CONFIG_ATOMIC128
>  #define DATA_SIZE 16
> @@ -713,8 +715,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  #define DATA_SIZE 4
>  #include "atomic_template.h"
>
> +#ifdef CONFIG_ATOMIC64
>  #define DATA_SIZE 8
>  #include "atomic_template.h"
> +#endif
>
>  /* Code access functions.  */
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 0c97cdf..d7704d4 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -101,6 +101,11 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
>      return h;
>  }
>
> +void HELPER(exit_atomic)(CPUArchState *env)
> +{
> +    cpu_loop_exit_atomic(ENV_GET_CPU(env), GETRA());
> +}
> +
>  #ifndef CONFIG_SOFTMMU
>  /* The softmmu versions of these helpers are in cputlb.c.  */
>
> @@ -130,8 +135,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  #define DATA_SIZE 4
>  #include "atomic_template.h"
>
> +#ifdef CONFIG_ATOMIC64
>  #define DATA_SIZE 8
>  #include "atomic_template.h"
> +#endif
>
>  /* The following is only callable from other helpers, and matches up
>     with the softmmu version.  */
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index e146ad4..1c09025 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -2023,14 +2023,20 @@ typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv, TCGv_i32);
>  typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
>  #endif
>
> +#ifdef CONFIG_ATOMIC64
> +# define WITH_ATOMIC64(X) X,
> +#else
> +# define WITH_ATOMIC64(X)
> +#endif
> +
>  static void * const table_cmpxchg[16] = {
>      [MO_8] = gen_helper_atomic_cmpxchgb,
>      [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
>      [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
>      [MO_32 | MO_LE] = gen_helper_atomic_cmpxchgl_le,
>      [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
> -    [MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le,
> -    [MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be,
> +    WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le)
> +    WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be)
>  };
>
>  void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
> @@ -2100,6 +2106,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
>          }
>          tcg_temp_free_i64(t1);
>      } else if ((memop & MO_SIZE) == MO_64) {
> +#ifdef CONFIG_ATOMIC64
>          gen_atomic_cx_i64 gen;
>
>          gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
> @@ -2114,6 +2121,9 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
>  #else
>          gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
>  #endif
> +#else
> +        gen_helper_exit_atomic(tcg_ctx.tcg_env);
> +#endif /* CONFIG_ATOMIC64 */
>      } else {
>          TCGv_i32 c32 = tcg_temp_new_i32();
>          TCGv_i32 n32 = tcg_temp_new_i32();
> @@ -2201,6 +2211,7 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
>      memop = tcg_canonicalize_memop(memop, 1, 0);
>
>      if ((memop & MO_SIZE) == MO_64) {
> +#ifdef CONFIG_ATOMIC64
>          gen_atomic_op_i64 gen;
>
>          gen = table[memop & (MO_SIZE | MO_BSWAP)];
> @@ -2215,6 +2226,9 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
>  #else
>          gen(ret, tcg_ctx.tcg_env, addr, val);
>  #endif
> +#else
> +        gen_helper_exit_atomic(tcg_ctx.tcg_env);
> +#endif /* CONFIG_ATOMIC64 */
>      } else {
>          TCGv_i32 v32 = tcg_temp_new_i32();
>          TCGv_i32 r32 = tcg_temp_new_i32();
> @@ -2239,8 +2253,8 @@ static void * const table_##NAME[16] = {                                \
>      [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
>      [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
>      [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
> -    [MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le,                   \
> -    [MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be,                   \
> +    WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le)     \
> +    WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be)     \
>  };                                                                      \
>  void tcg_gen_atomic_##NAME##_i32                                        \
>      (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 22367aa..1deb86a 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -15,23 +15,28 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>
> +DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
> +
>  #ifdef CONFIG_SOFTMMU
>
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
>                     i32, env, tl, i32, i32, i32)
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
>                     i32, env, tl, i32, i32, i32)
> -DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
> -                   i32, env, tl, i32, i32, i32)
> -DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
> -                   i64, env, tl, i64, i64, i32)
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgw_le, TCG_CALL_NO_WG,
>                     i32, env, tl, i32, i32, i32)
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgl_be, TCG_CALL_NO_WG,
> +                   i32, env, tl, i32, i32, i32)
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgl_le, TCG_CALL_NO_WG,
>                     i32, env, tl, i32, i32, i32)
> +#ifdef CONFIG_ATOMIC64
> +DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
> +                   i64, env, tl, i64, i64, i32)
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
>                     i64, env, tl, i64, i64, i32)
> +#endif
>
> +#ifdef CONFIG_ATOMIC64
>  #define GEN_ATOMIC_HELPERS(NAME)                                  \
>      DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
>                         TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> @@ -47,17 +52,33 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
>                         TCG_CALL_NO_WG, i64, env, tl, i64, i32)    \
>      DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), q_be),           \
>                         TCG_CALL_NO_WG, i64, env, tl, i64, i32)
> +#else
> +#define GEN_ATOMIC_HELPERS(NAME)                                  \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), b),              \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_le),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), w_be),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_le),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)    \
> +    DEF_HELPER_FLAGS_4(glue(glue(atomic_, NAME), l_be),           \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +#endif /* CONFIG_ATOMIC64 */
>
>  #else
>
>  DEF_HELPER_FLAGS_4(atomic_cmpxchgb, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
>  DEF_HELPER_FLAGS_4(atomic_cmpxchgw_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> -DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> -DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
>  DEF_HELPER_FLAGS_4(atomic_cmpxchgw_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgl_be, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
>  DEF_HELPER_FLAGS_4(atomic_cmpxchgl_le, TCG_CALL_NO_WG, i32, env, tl, i32, i32)
> +#ifdef CONFIG_ATOMIC64
> +DEF_HELPER_FLAGS_4(atomic_cmpxchgq_be, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
>  DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
> +#endif
>
> +#ifdef CONFIG_ATOMIC64
>  #define GEN_ATOMIC_HELPERS(NAME)                             \
>      DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
>                         TCG_CALL_NO_WG, i32, env, tl, i32)    \
> @@ -73,6 +94,19 @@ DEF_HELPER_FLAGS_4(atomic_cmpxchgq_le, TCG_CALL_NO_WG, i64, env, tl, i64, i64)
>                         TCG_CALL_NO_WG, i64, env, tl, i64)    \
>      DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), q_be),      \
>                         TCG_CALL_NO_WG, i64, env, tl, i64)
> +#else
> +#define GEN_ATOMIC_HELPERS(NAME)                             \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), b),         \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_le),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), w_be),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_le),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)    \
> +    DEF_HELPER_FLAGS_3(glue(glue(atomic_, NAME), l_be),      \
> +                       TCG_CALL_NO_WG, i32, env, tl, i32)
> +#endif /* CONFIG_ATOMIC64 */
>
>  #endif /* CONFIG_SOFTMMU */
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 5a94cec..8bcf32a 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -1202,14 +1202,23 @@ TYPE helper_atomic_ ## NAME ## SUFFIX ## _mmu         \
>      (CPUArchState *env, target_ulong addr, TYPE val,  \
>       TCGMemOpIdx oi, uintptr_t retaddr);
>
> +#ifdef CONFIG_ATOMIC64
>  #define GEN_ATOMIC_HELPER_ALL(NAME)          \
> -    GEN_ATOMIC_HELPER(NAME, uint32_t, b)      \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, b)     \
>      GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
> -    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
> -    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
>      GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
>      GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint64_t, q_le)  \
>      GEN_ATOMIC_HELPER(NAME, uint64_t, q_be)
> +#else
> +#define GEN_ATOMIC_HELPER_ALL(NAME)          \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, b)     \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, w_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, w_be)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_le)  \
> +    GEN_ATOMIC_HELPER(NAME, uint32_t, l_be)
> +#endif
>
>  GEN_ATOMIC_HELPER_ALL(fetch_add)
>  GEN_ATOMIC_HELPER_ALL(fetch_sub)

I find the nesting here makes it a bit harder to follow all the various
permutations of CONFIG_SOFTMMU/CONFIG_ATOMIC64 but I can't see a nicer
way of doing it with plain old C macros so *shrug*

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock()
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock() Richard Henderson
@ 2016-09-14 11:14   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 11:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> It's been superseded by the atomic helpers.
>
> The use of the atomic helpers provides a significant performance and scalability
> improvement. Below is the result of running the atomic_add-test microbenchmark with:
>  $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
> , where $n is the number of threads and $r is the allowed range for the additions.
>
> The scenarios measured are:
> - atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
> - cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
> - master: before this patchset
>
> Results sorted in ascending range, i.e. descending degree of contention.
> Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
> Opteron 6376 cores.
>
>                 atomic_add-bench: 5000000 ops/thread, [0,1] range
>
>   25 ++---------+----------+---------+----------+----------+----------+---++
>      + atomic +-E--+       +         +          +          +          +    |
>      |cmpxchg +-H--+                                                       |
>   20 +Emaster +-N--+                                                      ++
>      ||                                                                    |
>      |++                                                                   |
>      ||                                                                    |
>   15 +++                                                                  ++
>      |N|                                                                   |
>      |+|                                                                   |
>   10 ++|                                                                  ++
>      |+|+                                                                  |
>      | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
>      |+E+E+- +++     +E+------+E+--                                        |
>    5 ++|+                                                                 ++
>      |+N+H+---                                 +++                         |
>      ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
>    0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
>      0          10         20        30         40         50         60
>                                 Number of threads
>
>                 atomic_add-bench: 5000000 ops/thread, [0,2] range
>
>   25 ++---------+----------+---------+----------+----------+----------+---++
>      ++atomic +-E--+       +         +          +          +          +    |
>      |cmpxchg +-H--+                                                       |
>   20 ++master +-N--+                                                      ++
>      |E|                                                                   |
>      |++                                                                   |
>      ||E                                                                   |
>   15 ++|                                                                  ++
>      |N||                                                                  |
>      |+||                                   ---+E+------+E+-----+E+------+E|
>   10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
>      ||H+E+--+E+--                                                         |
>      |+++++                                                                |
>      | ||                                                                  |
>    5 ++|+H+--                                  +++                        ++
>      |+N+    -                              ---+H+------+H+------          |
>      +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
>    0 ++---------+----------+---------+----------+----------+----------+---++
>      0          10         20        30         40         50         60
>                                 Number of threads
>
>                 atomic_add-bench: 5000000 ops/thread, [0,8] range
>
>   40 ++---------+----------+---------+----------+----------+----------+---++
>      ++atomic +-E--+       +         +          +          +          +    |
>   35 +cmpxchg +-H--+                                                      ++
>      | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
>   30 ++|                   ---+E+--   +++                                 ++
>      | |            -+E+---                                                |
>   25 ++E        ---- +++                                                  ++
>      |+++++ -+E+                                                           |
>   20 +E+ E-- +++                                                          ++
>      |H|+++                                                                |
>      |+|                                       +H+-------                  |
>   15 ++H+                                   ---+++      +H+------         ++
>      |N++H+--                         +++---                    +H+------++|
>   10 ++ +++  -       +++           ---+H+                       +++      +H+
>      | |     +H+-----+H+------+H+--                                        |
>    5 ++|                      +++                                         ++
>      ++N+N+--+N++          +         +          +          +          +    |
>    0 ++---------+----------+---------+----------+----------+----------+---++
>      0          10         20        30         40         50         60
>                                 Number of threads
>
>                atomic_add-bench: 5000000 ops/thread, [0,128] range
>
>   160 ++---------+---------+----------+---------+----------+----------+---++
>       + atomic +-E--+      +          +         +          +          +    |
>   140 +cmpxchg +-H--+                          +++      +++               ++
>       | master +-N--+                           E--------E------+E+------++|
>   120 ++                                      --|        |      +++       E+
>       |                                     -- +++      +++              ++|
>   100 ++                                   -                              ++
>       |                                +++-                     +++      ++|
>    80 ++                              -+E+    -+H+------+H+------H--------++
>       |                           ----    ----                  +++       H|
>       |            ---+E+-----+E+-  ---+H+                               ++|
>    60 ++     +E+---   +++  ---+H+---                                      ++
>       |    --+++   ---+H+--                                                |
>    40 ++ +E+-+H+---                                                       ++
>       |  +H+                                                               |
>    20 +EE+                                                                ++
>       +N+        +         +          +         +          +          +    |
>     0 ++N-N---N--+---------+----------+---------+----------+----------+---++
>       0          10        20         30        40         50         60
>                                 Number of threads
>
>               atomic_add-bench: 5000000 ops/thread, [0,1024] range
>
>   350 ++---------+---------+----------+---------+----------+----------+---++
>       + atomic +-E--+      +          +         +          +          +    |
>   300 +cmpxchg +-H--+                                                    +++
>       | master +-N--+                                           +++       ||
>       |                                                 +++      |    ----E|
>   250 ++                                                 |   ----E----    ++
>       |                                              ----E---    |    ---+H|
>   200 ++                                      -+E+---   +++  ---+H+---    ++
>       |                                   ----         -+H+--              |
>       |                                +E+     +++ ---- +++                |
>   150 ++                            ---+++  ---+H+-                       ++
>       |                          ---  -+H+--                               |
>   100 ++                   ---+E+ ---- +++                                ++
>       |      +++   ---+E+-----+H+-                                         |
>       |     -+E+------+H+--                                                |
>    50 ++ +E+                                                              ++
>       +EE+       +         +          +         +          +          +    |
>     0 ++N-N---N--+---------+----------+---------+----------+----------+---++
>       0          10        20         30        40         50         60
>                                 Number of threads
>
>   hi-res: http://imgur.com/a/fMRmq
>
> For master I stopped measuring master after 8 threads, because there is little
> point in measuring the well-known performance collapse of a contended lock.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-i386/helper.h     |  2 --
>  target-i386/mem_helper.c | 33 ---------------------------------
>  target-i386/translate.c  | 15 ---------------
>  3 files changed, 50 deletions(-)
>
> diff --git a/target-i386/helper.h b/target-i386/helper.h
> index 729d4b6..4e859eb 100644
> --- a/target-i386/helper.h
> +++ b/target-i386/helper.h
> @@ -1,8 +1,6 @@
>  DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
>  DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
>
> -DEF_HELPER_0(lock, void)
> -DEF_HELPER_0(unlock, void)
>  DEF_HELPER_3(write_eflags, void, env, tl, i32)
>  DEF_HELPER_1(read_eflags, tl, env)
>  DEF_HELPER_2(divb_AL, void, env, tl)
> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> index 8b72fda..f8d2a1e 100644
> --- a/target-i386/mem_helper.c
> +++ b/target-i386/mem_helper.c
> @@ -25,39 +25,6 @@
>  #include "qemu/int128.h"
>  #include "tcg.h"
>
> -/* broken thread support */
> -
> -#if defined(CONFIG_USER_ONLY)
> -QemuMutex global_cpu_lock;
> -
> -void helper_lock(void)
> -{
> -    qemu_mutex_lock(&global_cpu_lock);
> -}
> -
> -void helper_unlock(void)
> -{
> -    qemu_mutex_unlock(&global_cpu_lock);
> -}
> -
> -void helper_lock_init(void)
> -{
> -    qemu_mutex_init(&global_cpu_lock);
> -}
> -#else
> -void helper_lock(void)
> -{
> -}
> -
> -void helper_unlock(void)
> -{
> -}
> -
> -void helper_lock_init(void)
> -{
> -}
> -#endif
> -
>  void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
>  {
>      uintptr_t ra = GETPC();
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index b8c5dde..755a18f 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -4537,10 +4537,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>      s->aflag = aflag;
>      s->dflag = dflag;
>
> -    /* lock generation */
> -    if (prefixes & PREFIX_LOCK)
> -        gen_helper_lock();
> -
>      /* now check op code */
>   reswitch:
>      switch(b) {
> @@ -8203,20 +8199,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>      default:
>          goto unknown_op;
>      }
> -    /* lock generation */
> -    if (s->prefix & PREFIX_LOCK)
> -        gen_helper_unlock();
>      return s->pc;
>   illegal_op:
> -    if (s->prefix & PREFIX_LOCK)
> -        gen_helper_unlock();
> -    /* XXX: ensure that no lock was generated */
>      gen_illegal_opcode(s);
>      return s->pc;
>   unknown_op:
> -    if (s->prefix & PREFIX_LOCK)
> -        gen_helper_unlock();
> -    /* XXX: ensure that no lock was generated */
>      gen_unknown_opcode(env, s);
>      return s->pc;
>  }
> @@ -8308,8 +8295,6 @@ void tcg_x86_init(void)
>                                       offsetof(CPUX86State, bnd_regs[i].ub),
>                                       bnd_regu_names[i]);
>      }
> -
> -    helper_lock_init();
>  }
>
>  /* generate intermediate code for basic block 'tb'.  */

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

I also looked over the rest of the target-i386 stuff and it seems fine
to me but I'm not really an x86 expert. You can treat that as a r-b if
you want if no one with more x86-fu want to look at it.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench Richard Henderson
@ 2016-09-14 13:53   ` Alex Bennée
  2016-09-15  2:23     ` Emilio G. Cota
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 13:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> With this microbenchmark we can measure the overhead of emulating atomic
> instructions with a configurable degree of contention.
>
> The benchmark spawns $n threads, each performing $o atomic ops (additions)
> in a loop. Each atomic operation is performed on a different cache line
> (assuming lines are 64b long) that is randomly selected from a range [0, $r).
>
> [ Note: each $foo corresponds to a -foo flag ]
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
> ---
>  tests/.gitignore         |   1 +
>  tests/Makefile.include   |   4 +-
>  tests/atomic_add-bench.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 184 insertions(+), 1 deletion(-)
>  create mode 100644 tests/atomic_add-bench.c
>
> diff --git a/tests/.gitignore b/tests/.gitignore
> index dbb5263..ec3137a 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -1,3 +1,4 @@
> +atomic_add-bench
>  check-qdict
>  check-qfloat
>  check-qint
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 14be491..e1957ed 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -421,7 +421,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
>  	tests/test-opts-visitor.o tests/test-qmp-event.o \
>  	tests/rcutorture.o tests/test-rcu-list.o \
>  	tests/test-qdist.o \
> -	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o
> +	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
> +	tests/atomic_add-bench.o
>
>  $(test-obj-y): QEMU_INCLUDES += -Itests
>  QEMU_CFLAGS += -I$(SRC_PATH)/tests
> @@ -465,6 +466,7 @@ tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
>  tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
>  tests/test-qht-par$(EXESUF): tests/test-qht-par.o tests/qht-bench$(EXESUF) $(test-util-obj-y)
>  tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
> +tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o
>  $(test-util-obj-y)

This probably more properly lives in tests/tcg/generic or some such but
that needs the tcg/tests being rehabilitated into the build system so at
least here it gets built.

>
>  tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
>  	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
> new file mode 100644
> index 0000000..5bbecf6
> --- /dev/null
> +++ b/tests/atomic_add-bench.c

I wonder if this would be worth making atomic-bench and adding the other
atomic operations into the benchmark? I know given the current helper
overhead its unlikely to show much difference between the ops but if we
move to backend support for the tcg atomics it would be a useful tool to
have.

> @@ -0,0 +1,180 @@
> +#include "qemu/osdep.h"
> +#include "qemu/thread.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/processor.h"
> +
> +struct thread_info {
> +    uint64_t r;
> +} QEMU_ALIGNED(64);
> +
> +struct count {
> +    unsigned long val;
> +} QEMU_ALIGNED(64);
> +
> +static QemuThread *threads;
> +static struct thread_info *th_info;
> +static unsigned int n_threads = 1;
> +static unsigned int n_ready_threads;
> +static struct count *counts;
> +static unsigned long n_ops = 10000;
> +static double duration;
> +static unsigned int range = 1;
> +static bool test_start;
> +
> +static const char commands_string[] =
> +    " -n = number of threads\n"
> +    " -o = number of ops per thread\n"
> +    " -r = range (will be rounded up to pow2)";
> +
> +static void usage_complete(char *argv[])
> +{
> +    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
> +    fprintf(stderr, "options:\n%s\n", commands_string);
> +}
> +
> +/*
> + * From: https://en.wikipedia.org/wiki/Xorshift
> + * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
> + * guaranteed to be >= INT_MAX).
> + */
> +static uint64_t xorshift64star(uint64_t x)
> +{
> +    x ^= x >> 12; /* a */
> +    x ^= x << 25; /* b */
> +    x ^= x >> 27; /* c */
> +    return x * UINT64_C(2685821657736338717);
> +}
> +
> +static void *thread_func(void *arg)
> +{
> +    struct thread_info *info = arg;
> +    unsigned long i;
> +
> +    atomic_inc(&n_ready_threads);
> +    while (!atomic_mb_read(&test_start)) {
> +        cpu_relax();
> +    }
> +
> +    for (i = 0; i < n_ops; i++) {
> +        unsigned int index;
> +
> +        info->r = xorshift64star(info->r);
> +        index = info->r & (range - 1);
> +        atomic_inc(&counts[index].val);
> +    }
> +    return NULL;
> +}
> +
> +static inline
> +uint64_t ts_subtract(const struct timespec *a, const struct timespec *b)
> +{
> +    uint64_t ns;
> +
> +    ns = (b->tv_sec - a->tv_sec) * 1000000000ULL;
> +    ns += (b->tv_nsec - a->tv_nsec);
> +    return ns;
> +}
> +
> +static void run_test(void)
> +{
> +    unsigned int i;
> +    struct timespec ts_start, ts_end;
> +
> +    while (atomic_read(&n_ready_threads) != n_threads) {
> +        cpu_relax();
> +    }
> +    atomic_mb_set(&test_start, true);
> +
> +    clock_gettime(CLOCK_MONOTONIC, &ts_start);
> +    for (i = 0; i < n_threads; i++) {
> +        qemu_thread_join(&threads[i]);
> +    }
> +    clock_gettime(CLOCK_MONOTONIC, &ts_end);
> +    duration = ts_subtract(&ts_start, &ts_end) / 1e9;
> +}
> +
> +static void create_threads(void)
> +{
> +    unsigned int i;
> +
> +    threads = g_new(QemuThread, n_threads);
> +    th_info = g_new(struct thread_info, n_threads);
> +    counts = qemu_memalign(64, sizeof(*counts) * range);

This fails on my setup as AFAICT qemu_memalign doesn't give you zeroed
memory. I added a memset after to zero it out.

> +
> +    for (i = 0; i < n_threads; i++) {
> +        struct thread_info *info = &th_info[i];
> +
> +        info->r = (i + 1) ^ time(NULL);
> +        qemu_thread_create(&threads[i], NULL, thread_func, info,
> +                           QEMU_THREAD_JOINABLE);
> +    }
> +}
> +
> +static void pr_params(void)
> +{
> +    printf("Parameters:\n");
> +    printf(" # of threads:      %u\n", n_threads);
> +    printf(" n_ops:             %lu\n", n_ops);
> +    printf(" ops' range:        %u\n", range);
> +}
> +
> +static void pr_stats(void)
> +{
> +    unsigned long long val = 0;
> +    unsigned int i;
> +    double tx;
> +
> +    for (i = 0; i < range; i++) {
> +        val += counts[i].val;
> +    }
> +    assert(val == n_threads * n_ops);

Again while I was testing this failed due to the above. It would proably
also be worth reporting the fail condition for the test so my current
hacky patch looks like:

modified   tests/atomic_add-bench.c
@@ -100,6 +100,7 @@ static void create_threads(void)
     threads = g_new(QemuThread, n_threads);
     th_info = g_new(struct thread_info, n_threads);
     counts = qemu_memalign(64, sizeof(*counts) * range);
+    memset(counts, 0, sizeof(*counts) * range);

     for (i = 0; i < n_threads; i++) {
         struct thread_info *info = &th_info[i];
@@ -118,22 +119,29 @@ static void pr_params(void)
     printf(" ops' range:        %u\n", range);
 }

-static void pr_stats(void)
+static int pr_stats(void)
 {
-    unsigned long long val = 0;
+    unsigned long long target_val, val = 0;
     unsigned int i;
     double tx;

     for (i = 0; i < range; i++) {
         val += counts[i].val;
     }
-    assert(val == n_threads * n_ops);
+
+    target_val = (n_threads * n_ops);
+    if (val != target_val) {
+        printf("Bad total: %llu vs %llu\n", val, target_val);
+        return -1;
+    };
     tx = val / duration / 1e6;

     printf("Results:\n");
     printf("Duration:            %.2f s\n", duration);
     printf(" Throughput:         %.2f Mops/s\n", tx);
     printf(" Throughput/thread:  %.2f Mops/s/thread\n", tx / n_threads);
+
+    return 0;
 }

 static void parse_args(int argc, char *argv[])
@@ -175,6 +183,5 @@ int main(int argc, char *argv[])
     pr_params();
     create_threads();
     run_test();
-    pr_stats();
-    return 0;
+    return pr_stats();
 }

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions Richard Henderson
@ 2016-09-14 15:58   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 15:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
> a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
> gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate.c | 171 +++++++++++++++++++------------------------------
>  1 file changed, 66 insertions(+), 105 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index bd5d5cb..1b5bf87 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -926,145 +926,106 @@ static inline void store_reg_from_load(DisasContext *s, int reg, TCGv_i32 var)
>   * These functions work like tcg_gen_qemu_{ld,st}* except
>   * that the address argument is TCGv_i32 rather than TCGv.
>   */
<snip>

Phew diff made that quite hard to follow so I did a side-by-side ediff
instead:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
@ 2016-09-14 16:03   ` Alex Bennée
  2016-09-14 16:38     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 16:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Emulating LL/SC with cmpxchg is not correct, since it can
> suffer from the ABA problem. Portable parallel code, however,
> is written assuming only cmpxchg--and not LL/SC--is available.
> This means that in practice emulating LL/SC with cmpxchg is
> a viable alternative.
>
> The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
> This works in both user and system mode. In usermode, it avoids
> pausing all other CPUs to perform the LL/SC pair. The subsequent
> performance and scalability improvement is significant, as the
> plots below show. They plot the throughput of atomic_add-bench
> compiled for ARM and executed on a 64-core x86 machine.
>
> Hi-res plots: http://imgur.com/a/aNQpB
>
>                atomic_add-bench: 1000000 ops/thread, [0,1] range
>
>   9 ++---------+----------+----------+----------+----------+----------+---++
>     +cmpxchg +-E--+       +          +          +          +          +    |
>   8 +Emaster +-H--+                                                       ++
>     | |                                                                    |
>   7 ++E                                                                   ++
>     | |                                                                    |
>   6 ++++                                                                  ++
>     |  |                                                                   |
>   5 ++ |                                                                  ++
>   4 ++ |                                                                  ++
>     |  |                                                                   |
>   3 ++ |                                                                  ++
>     |   |                                                                  |
>   2 ++  |                                                                 ++
>     |H++E+---                                  +++  ---+E+------+E+------+E|
>   1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
>     ++H+       +    +++   +  +++     ++++       +          +          +    |
>   0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
>     0          10         20         30         40         50         60
>                                Number of threads
>
>                 atomic_add-bench: 1000000 ops/thread, [0,2] range
>
>   16 ++---------+----------+---------+----------+----------+----------+---++
>      +cmpxchg +-E--+       +         +          +          +          +    |
>   14 ++master +-H--+                                                      ++
>      | |                                                                   |
>   12 ++|                                                                  ++
>      | E                                                                   |
>   10 ++|                                                                  ++
>      | |                                                                   |
>    8 ++++                                                                 ++
>      |E+|                                                                  |
>      |  |                                                                  |
>    6 ++ |                                                                 ++
>      |   |                                                                 |
>    4 ++  |                                                                ++
>      |  +E+---       +++      +++              +++           ---+E+------+E|
>    2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
>      + |        +    +++   +         ++++       +          +          +    |
>    0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
>      0          10         20        30         40         50         60
>                                 Number of threads
>
>                atomic_add-bench: 1000000 ops/thread, [0,128] range
>
>   70 ++---------+----------+---------+----------+----------+----------+---++
>      +cmpxchg +-E--+       +         +          +       ++++          +    |
>   60 ++master +-H--+                                 ----E------+E+-------++
>      |                                        -+E+---   +++     +++      +E|
>      |                                +++ ---- +++                       ++|
>   50 ++                       +++  ---+E+-                                ++
>      |                        -E---                                        |
>   40 ++                    ---+++                                         ++
>      |               +++---                                                |
>      |              -+E+                                                   |
>   30 ++      +++----                                                      ++
>      |       +E+                                                           |
>   20 ++ +++--                                                             ++
>      |  +E+                                                                |
>      |+E+                                                                  |
>   10 +E+                                                                  ++
>      +          +          +         +          +          +          +    |
>    0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
>      0          10         20        30         40         50         60
>                                 Number of threads
>
>               atomic_add-bench: 1000000 ops/thread, [0,1024] range
>
>   120 ++---------+---------+----------+---------+----------+----------+---++
>       +cmpxchg +-E--+      +          +         +          +          +    |
>       | master +-H--+                                                    ++|
>   100 ++                                                              ----E+
>       |                                                 +++  ---+E+---   ++|
>       |                                                --E---   +++        |
>    80 ++                                           ---- +++               ++
>       |                                     ---+E+-                        |
>    60 ++                              -+E+--                              ++
>       |                       +++ ---- +++                                 |
>       |                      -+E+-                                         |
>    40 ++              +++----                                             ++
>       |      +++   ---+E+                                                  |
>       |     -+E+---                                                        |
>    20 ++ +E+                                                              ++
>       |+E+++                                                               |
>       +E+        +         +          +         +          +          +    |
>     0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
>       0          10        20         30        40         50         60
>                                 Number of threads
>
> [rth: Enforce alignment for ldrexd.]
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate.c | 136 +++++++++++++++----------------------------------
>  1 file changed, 42 insertions(+), 94 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 1b5bf87..680635c 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7676,47 +7676,27 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
>      tcg_gen_or_i32(cpu_ZF, lo, hi);
>  }
>
> -/* Load/Store exclusive instructions are implemented by remembering
> -   the value/address loaded, and seeing if these are the same
> -   when the store is performed. This should be sufficient to implement
> -   the architecturally mandated semantics, and avoids having to monitor
> -   regular stores.
> -
> -   In system emulation mode only one CPU will be running at once, so
> -   this sequence is effectively atomic.  In user emulation mode we
> -   throw an exception and handle the atomic operation elsewhere.  */

At least half of this comment is still relevant although it could be
tweaked to mention that we use an atomic cmpxchg for the store that will
fail if exlusive_val doesn't match the current state.

>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>                                 TCGv_i32 addr, int size)
>  {
>      TCGv_i32 tmp = tcg_temp_new_i32();
> +    TCGMemOp opc = size | MO_ALIGN | s->be_data;
>
>      s->is_ldex = true;
>
> -    switch (size) {
> -    case 0:
> -        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 1:
> -        gen_aa32_ld16ua(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 2:
> -    case 3:
> -        gen_aa32_ld32ua(s, tmp, addr, get_mem_index(s));
> -        break;
> -    default:
> -        abort();
> -    }
> -
>      if (size == 3) {
>          TCGv_i32 tmp2 = tcg_temp_new_i32();
> -        TCGv_i32 tmp3 = tcg_temp_new_i32();
> +        TCGv_i64 t64 = tcg_temp_new_i64();
> +
> +        gen_aa32_ld_i64(s, t64, addr, get_mem_index(s), opc);
> +        tcg_gen_mov_i64(cpu_exclusive_val, t64);
> +        tcg_gen_extr_i64_i32(tmp, tmp2, t64);
> +        tcg_temp_free_i64(t64);
>
> -        tcg_gen_addi_i32(tmp2, addr, 4);
> -        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
> +        store_reg(s, rt2, tmp2);
>          tcg_temp_free_i32(tmp2);
> -        tcg_gen_concat_i32_i64(cpu_exclusive_val, tmp, tmp3);
> -        store_reg(s, rt2, tmp3);
>      } else {
> +        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), opc);
>          tcg_gen_extu_i32_i64(cpu_exclusive_val, tmp);
>      }
>
> @@ -7729,23 +7709,15 @@ static void gen_clrex(DisasContext *s)
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
>  }
>
> -#ifdef CONFIG_USER_ONLY
> -static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
> -                                TCGv_i32 addr, int size)
> -{
> -    tcg_gen_extu_i32_i64(cpu_exclusive_test, addr);
> -    tcg_gen_movi_i32(cpu_exclusive_info,
> -                     size | (rd << 4) | (rt << 8) | (rt2 << 12));
> -    gen_exception_internal_insn(s, 4, EXCP_STREX);
> -}
> -#else
>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>                                  TCGv_i32 addr, int size)
>  {
> -    TCGv_i32 tmp;
> -    TCGv_i64 val64, extaddr;
> +    TCGv_i32 t0, t1, t2;
> +    TCGv_i64 extaddr;
> +    TCGv taddr;
>      TCGLabel *done_label;
>      TCGLabel *fail_label;
> +    TCGMemOp opc = size | MO_ALIGN | s->be_data;
>
>      /* if (env->exclusive_addr == addr && env->exclusive_val == [addr]) {
>           [addr] = {Rt};
> @@ -7760,69 +7732,45 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>      tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
>      tcg_temp_free_i64(extaddr);
>
> -    tmp = tcg_temp_new_i32();
> -    switch (size) {
> -    case 0:
> -        gen_aa32_ld8u(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 1:
> -        gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 2:
> -    case 3:
> -        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
> -        break;
> -    default:
> -        abort();
> -    }
> -
> -    val64 = tcg_temp_new_i64();
> +    taddr = gen_aa32_addr(s, addr, opc);
> +    t0 = tcg_temp_new_i32();
> +    t1 = load_reg(s, rt);
>      if (size == 3) {
> -        TCGv_i32 tmp2 = tcg_temp_new_i32();
> -        TCGv_i32 tmp3 = tcg_temp_new_i32();
> -        tcg_gen_addi_i32(tmp2, addr, 4);
> -        gen_aa32_ld32u(s, tmp3, tmp2, get_mem_index(s));
> -        tcg_temp_free_i32(tmp2);
> -        tcg_gen_concat_i32_i64(val64, tmp, tmp3);
> -        tcg_temp_free_i32(tmp3);
> -    } else {
> -        tcg_gen_extu_i32_i64(val64, tmp);
> -    }
> -    tcg_temp_free_i32(tmp);
> +        TCGv_i64 o64 = tcg_temp_new_i64();
> +        TCGv_i64 n64 = tcg_temp_new_i64();
>
> -    tcg_gen_brcond_i64(TCG_COND_NE, val64, cpu_exclusive_val, fail_label);
> -    tcg_temp_free_i64(val64);
> +        t2 = load_reg(s, rt2);
> +        tcg_gen_concat_i32_i64(n64, t1, t2);
> +        tcg_temp_free_i32(t2);
> +        gen_aa32_frob64(s, n64);
>
> -    tmp = load_reg(s, rt);
> -    switch (size) {
> -    case 0:
> -        gen_aa32_st8(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 1:
> -        gen_aa32_st16(s, tmp, addr, get_mem_index(s));
> -        break;
> -    case 2:
> -    case 3:
> -        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
> -        break;
> -    default:
> -        abort();
> -    }
> -    tcg_temp_free_i32(tmp);
> -    if (size == 3) {
> -        tcg_gen_addi_i32(addr, addr, 4);
> -        tmp = load_reg(s, rt2);
> -        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
> -        tcg_temp_free_i32(tmp);
> +        tcg_gen_atomic_cmpxchg_i64(o64, taddr, cpu_exclusive_val, n64,
> +                                   get_mem_index(s), opc);
> +        tcg_temp_free_i64(n64);
> +
> +        gen_aa32_frob64(s, o64);
> +        tcg_gen_setcond_i64(TCG_COND_NE, o64, o64, cpu_exclusive_val);
> +        tcg_gen_extrl_i64_i32(t0, o64);
> +
> +        tcg_temp_free_i64(o64);
> +    } else {
> +        t2 = tcg_temp_new_i32();
> +        tcg_gen_extrl_i64_i32(t2, cpu_exclusive_val);
> +        tcg_gen_atomic_cmpxchg_i32(t0, taddr, t2, t1, get_mem_index(s), opc);
> +        tcg_gen_setcond_i32(TCG_COND_NE, t0, t0, t2);
> +        tcg_temp_free_i32(t2);
>      }
> -    tcg_gen_movi_i32(cpu_R[rd], 0);
> +    tcg_temp_free_i32(t1);
> +    tcg_temp_free(taddr);
> +    tcg_gen_mov_i32(cpu_R[rd], t0);
> +    tcg_temp_free_i32(t0);
>      tcg_gen_br(done_label);
> +
>      gen_set_label(fail_label);
>      tcg_gen_movi_i32(cpu_R[rd], 1);
>      gen_set_label(done_label);
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
>  }
> -#endif
>
>  /* gen_srs:
>   * @env: CPUARMState

Apart from the comments the code gen looks good to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
@ 2016-09-14 16:05   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-14 16:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 680635c..2b3c34f 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -8741,25 +8741,26 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>                          }
>                          tcg_temp_free_i32(addr);
>                      } else {
> +                        TCGv taddr;
> +                        TCGMemOp opc = s->be_data;
> +
>                          /* SWP instruction */
>                          rm = (insn) & 0xf;
>
> -                        /* ??? This is not really atomic.  However we know
> -                           we never have multiple CPUs running in parallel,
> -                           so it is good enough.  */
> -                        addr = load_reg(s, rn);
> -                        tmp = load_reg(s, rm);
> -                        tmp2 = tcg_temp_new_i32();
>                          if (insn & (1 << 22)) {
> -                            gen_aa32_ld8u(s, tmp2, addr, get_mem_index(s));
> -                            gen_aa32_st8(s, tmp, addr, get_mem_index(s));
> +                            opc |= MO_UB;
>                          } else {
> -                            gen_aa32_ld32u(s, tmp2, addr, get_mem_index(s));
> -                            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
> +                            opc |= MO_UL | MO_ALIGN;
>                          }
> -                        tcg_temp_free_i32(tmp);
> +
> +                        addr = load_reg(s, rn);
> +                        taddr = gen_aa32_addr(s, addr, opc);
>                          tcg_temp_free_i32(addr);
> -                        store_reg(s, rd, tmp2);
> +
> +                        tmp = load_reg(s, rm);
> +                        tcg_gen_atomic_xchg_i32(tmp, taddr, tmp,
> +                                                get_mem_index(s), opc);
> +                        store_reg(s, rd, tmp);
>                      }
>                  }
>              } else {

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-09-14 16:03   ` Alex Bennée
@ 2016-09-14 16:38     ` Richard Henderson
  2016-10-20 17:51       ` Pranith Kumar
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-14 16:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Emilio G. Cota, qemu-devel

On 09/14/2016 09:03 AM, Alex Bennée wrote:
>> > -/* Load/Store exclusive instructions are implemented by remembering
>> > -   the value/address loaded, and seeing if these are the same
>> > -   when the store is performed. This should be sufficient to implement
>> > -   the architecturally mandated semantics, and avoids having to monitor
>> > -   regular stores.
>> > -
>> > -   In system emulation mode only one CPU will be running at once, so
>> > -   this sequence is effectively atomic.  In user emulation mode we
>> > -   throw an exception and handle the atomic operation elsewhere.  */
> At least half of this comment is still relevant although it could be
> tweaked to mention that we use an atomic cmpxchg for the store that will
> fail if exlusive_val doesn't match the current state.
> 

Added back

/* Load/Store exclusive instructions are implemented by remembering
   the value/address loaded, and seeing if these are the same
   when the store is performed.  This should be sufficient to implement
   the architecturally mandated semantics, and avoids having to monitor
   regular stores.  The compare vs the remembered value is done during
   the cmpxchg operation, but we must compare the addresses manually.  */


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench
  2016-09-14 13:53   ` Alex Bennée
@ 2016-09-15  2:23     ` Emilio G. Cota
  0 siblings, 0 replies; 97+ messages in thread
From: Emilio G. Cota @ 2016-09-15  2:23 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

On Wed, Sep 14, 2016 at 14:53:14 +0100, Alex Bennée wrote:
> Richard Henderson <rth@twiddle.net> writes:
> > From: "Emilio G. Cota" <cota@braap.org>
> >  QEMU_CFLAGS += -I$(SRC_PATH)/tests
> > @@ -465,6 +466,7 @@ tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
> >  tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
> >  tests/test-qht-par$(EXESUF): tests/test-qht-par.o tests/qht-bench$(EXESUF) $(test-util-obj-y)
> >  tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
> > +tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o
> >  $(test-util-obj-y)
> 
> This probably more properly lives in tests/tcg/generic or some such but
> that needs the tcg/tests being rehabilitated into the build system so at
> least here it gets built.

I didn't know where to put it; tests/ was easy enough :-)

> >  tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
> >  	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
> > diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
> > new file mode 100644
> > index 0000000..5bbecf6
> > --- /dev/null
> > +++ b/tests/atomic_add-bench.c
> 
> I wonder if this would be worth making atomic-bench and adding the other
> atomic operations into the benchmark? I know given the current helper
> overhead its unlikely to show much difference between the ops but if we
> move to backend support for the tcg atomics it would be a useful tool to
> have.

I'd rather add more ops later if necessary, but if you insist I can do it.

(snip)
> > +static void create_threads(void)
> > +{
> > +    unsigned int i;
> > +
> > +    threads = g_new(QemuThread, n_threads);
> > +    th_info = g_new(struct thread_info, n_threads);
> > +    counts = qemu_memalign(64, sizeof(*counts) * range);
> 
> This fails on my setup as AFAICT qemu_memalign doesn't give you zeroed
> memory. I added a memset after to zero it out.

Yes I fixed this more than a month ago, among other things in this program,
e.g., running for -d seconds instead of -n operations (much easier way to
fairly measure throughput).

Obviously forgot to tell anyone about it :/ sorry for making you waste time.

I'm appending the appropriate delta -- just checked it applies cleanly over
rth's atomic-3 branch on github.

Thanks,

		Emilio

>From f4a1a6fe2ffcf9572353f0b85a21ed27cd1765e1 Mon Sep 17 00:00:00 2001
From: "Emilio G. Cota" <cota@braap.org>
Date: Tue, 9 Aug 2016 23:14:13 -0400
Subject: [PATCH] tests: fix atomic_add_bench

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/atomic_add-bench.c | 51 ++++++++++++++++--------------------------------
 1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index 06300ba..dc97441 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -17,14 +17,14 @@ static struct thread_info *th_info;
 static unsigned int n_threads = 1;
 static unsigned int n_ready_threads;
 static struct count *counts;
-static unsigned long n_ops = 10000;
-static double duration;
-static unsigned int range = 1;
+static unsigned int duration = 1;
+static unsigned int range = 1024;
 static bool test_start;
+static bool test_stop;
 
 static const char commands_string[] =
     " -n = number of threads\n"
-    " -o = number of ops per thread\n"
+    " -d = duration in seconds\n"
     " -r = range (will be rounded up to pow2)";
 
 static void usage_complete(char *argv[])
@@ -49,14 +49,13 @@ static uint64_t xorshift64star(uint64_t x)
 static void *thread_func(void *arg)
 {
     struct thread_info *info = arg;
-    unsigned long i;
 
     atomic_inc(&n_ready_threads);
     while (!atomic_mb_read(&test_start)) {
         cpu_relax();
     }
 
-    for (i = 0; i < n_ops; i++) {
+    while (!atomic_read(&test_stop)) {
         unsigned int index;
 
         info->r = xorshift64star(info->r);
@@ -66,32 +65,23 @@ static void *thread_func(void *arg)
     return NULL;
 }
 
-static inline
-uint64_t ts_subtract(const struct timespec *a, const struct timespec *b)
-{
-    uint64_t ns;
-
-    ns = (b->tv_sec - a->tv_sec) * 1000000000ULL;
-    ns += (b->tv_nsec - a->tv_nsec);
-    return ns;
-}
-
 static void run_test(void)
 {
+    unsigned int remaining;
     unsigned int i;
-    struct timespec ts_start, ts_end;
 
     while (atomic_read(&n_ready_threads) != n_threads) {
         cpu_relax();
     }
     atomic_mb_set(&test_start, true);
+    do {
+        remaining = sleep(duration);
+    } while (remaining);
+    atomic_mb_set(&test_stop, true);
 
-    clock_gettime(CLOCK_MONOTONIC, &ts_start);
     for (i = 0; i < n_threads; i++) {
         qemu_thread_join(&threads[i]);
     }
-    clock_gettime(CLOCK_MONOTONIC, &ts_end);
-    duration = ts_subtract(&ts_start, &ts_end) / 1e9;
 }
 
 static void create_threads(void)
@@ -101,6 +91,7 @@ static void create_threads(void)
     threads = g_new(QemuThread, n_threads);
     th_info = g_new(struct thread_info, n_threads);
     counts = qemu_memalign(64, sizeof(*counts) * range);
+    memset(counts, 0, sizeof(*counts) * range);
 
     for (i = 0; i < n_threads; i++) {
         struct thread_info *info = &th_info[i];
@@ -115,7 +106,7 @@ static void pr_params(void)
 {
     printf("Parameters:\n");
     printf(" # of threads:      %u\n", n_threads);
-    printf(" n_ops:             %lu\n", n_ops);
+    printf(" duration:          %u\n", duration);
     printf(" ops' range:        %u\n", range);
 }
 
@@ -128,22 +119,20 @@ static void pr_stats(void)
     for (i = 0; i < range; i++) {
         val += counts[i].val;
     }
-    assert(val == n_threads * n_ops);
     tx = val / duration / 1e6;
 
     printf("Results:\n");
-    printf("Duration:            %.2f s\n", duration);
+    printf("Duration:            %u s\n", duration);
     printf(" Throughput:         %.2f Mops/s\n", tx);
     printf(" Throughput/thread:  %.2f Mops/s/thread\n", tx / n_threads);
 }
 
 static void parse_args(int argc, char *argv[])
 {
-    unsigned long long n_ops_ull;
     int c;
 
     for (;;) {
-        c = getopt(argc, argv, "hn:o:r:");
+        c = getopt(argc, argv, "hd:n:r:");
         if (c < 0) {
             break;
         }
@@ -151,18 +140,12 @@ static void parse_args(int argc, char *argv[])
         case 'h':
             usage_complete(argv);
             exit(0);
+        case 'd':
+            duration = atoi(optarg);
+            break;
         case 'n':
             n_threads = atoi(optarg);
             break;
-        case 'o':
-            n_ops_ull = atoll(optarg);
-            if (n_ops_ull > ULONG_MAX) {
-                fprintf(stderr,
-                        "fatal: -o cannot be greater than %lu\n", ULONG_MAX);
-                exit(1);
-            }
-            n_ops = n_ops_ull;
-            break;
         case 'r':
             range = pow2ceil(atoi(optarg));
             break;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
@ 2016-09-15  9:36   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-15  9:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> The exception is not emitted anymore.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twidle.net>
> Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  linux-user/main.c | 93 -------------------------------------------------------
>  1 file changed, 93 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 8d5c948..256382a 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -651,94 +651,6 @@ do_kernel_trap(CPUARMState *env)
>      return 0;
>  }
>
> -/* Store exclusive handling for AArch32 */
> -static int do_strex(CPUARMState *env)
> -{
> -    uint64_t val;
> -    int size;
> -    int rc = 1;
> -    int segv = 0;
> -    uint32_t addr;
> -    start_exclusive();
> -    if (env->exclusive_addr != env->exclusive_test) {
> -        goto fail;
> -    }
> -    /* We know we're always AArch32 so the address is in uint32_t range
> -     * unless it was the -1 exclusive-monitor-lost value (which won't
> -     * match exclusive_test above).
> -     */
> -    assert(extract64(env->exclusive_addr, 32, 32) == 0);
> -    addr = env->exclusive_addr;
> -    size = env->exclusive_info & 0xf;
> -    switch (size) {
> -    case 0:
> -        segv = get_user_u8(val, addr);
> -        break;
> -    case 1:
> -        segv = get_user_data_u16(val, addr, env);
> -        break;
> -    case 2:
> -    case 3:
> -        segv = get_user_data_u32(val, addr, env);
> -        break;
> -    default:
> -        abort();
> -    }
> -    if (segv) {
> -        env->exception.vaddress = addr;
> -        goto done;
> -    }
> -    if (size == 3) {
> -        uint32_t valhi;
> -        segv = get_user_data_u32(valhi, addr + 4, env);
> -        if (segv) {
> -            env->exception.vaddress = addr + 4;
> -            goto done;
> -        }
> -        if (arm_cpu_bswap_data(env)) {
> -            val = deposit64((uint64_t)valhi, 32, 32, val);
> -        } else {
> -            val = deposit64(val, 32, 32, valhi);
> -        }
> -    }
> -    if (val != env->exclusive_val) {
> -        goto fail;
> -    }
> -
> -    val = env->regs[(env->exclusive_info >> 8) & 0xf];
> -    switch (size) {
> -    case 0:
> -        segv = put_user_u8(val, addr);
> -        break;
> -    case 1:
> -        segv = put_user_data_u16(val, addr, env);
> -        break;
> -    case 2:
> -    case 3:
> -        segv = put_user_data_u32(val, addr, env);
> -        break;
> -    }
> -    if (segv) {
> -        env->exception.vaddress = addr;
> -        goto done;
> -    }
> -    if (size == 3) {
> -        val = env->regs[(env->exclusive_info >> 12) & 0xf];
> -        segv = put_user_data_u32(val, addr + 4, env);
> -        if (segv) {
> -            env->exception.vaddress = addr + 4;
> -            goto done;
> -        }
> -    }
> -    rc = 0;
> -fail:
> -    env->regs[15] += 4;
> -    env->regs[(env->exclusive_info >> 4) & 0xf] = rc;
> -done:
> -    end_exclusive();
> -    return segv;
> -}
> -
>  void cpu_loop(CPUARMState *env)
>  {
>      CPUState *cs = CPU(arm_env_get_cpu(env));
> @@ -908,11 +820,6 @@ void cpu_loop(CPUARMState *env)
>          case EXCP_INTERRUPT:
>              /* just indicate that signals should be handled asap */
>              break;
> -        case EXCP_STREX:
> -            if (!do_strex(env)) {
> -                break;
> -            }
> -            /* fall through for segv */
>          case EXCP_PREFETCH_ABORT:
>          case EXCP_DATA_ABORT:
>              addr = env->exception.vaddress;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX
  2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
@ 2016-09-15  9:36   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-15  9:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> The exception is not emitted anymore.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  linux-user/main.c | 125 ------------------------------------------------------
>  1 file changed, 125 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 256382a..64838bf 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -868,124 +868,6 @@ void cpu_loop(CPUARMState *env)
>
>  #else
>
> -/*
> - * Handle AArch64 store-release exclusive
> - *
> - * rs = gets the status result of store exclusive
> - * rt = is the register that is stored
> - * rt2 = is the second register store (in STP)
> - *
> - */
> -static int do_strex_a64(CPUARMState *env)
> -{
> -    uint64_t val;
> -    int size;
> -    bool is_pair;
> -    int rc = 1;
> -    int segv = 0;
> -    uint64_t addr;
> -    int rs, rt, rt2;
> -
> -    start_exclusive();
> -    /* size | is_pair << 2 | (rs << 4) | (rt << 9) | (rt2 << 14)); */
> -    size = extract32(env->exclusive_info, 0, 2);
> -    is_pair = extract32(env->exclusive_info, 2, 1);
> -    rs = extract32(env->exclusive_info, 4, 5);
> -    rt = extract32(env->exclusive_info, 9, 5);
> -    rt2 = extract32(env->exclusive_info, 14, 5);
> -
> -    addr = env->exclusive_addr;
> -
> -    if (addr != env->exclusive_test) {
> -        goto finish;
> -    }
> -
> -    switch (size) {
> -    case 0:
> -        segv = get_user_u8(val, addr);
> -        break;
> -    case 1:
> -        segv = get_user_u16(val, addr);
> -        break;
> -    case 2:
> -        segv = get_user_u32(val, addr);
> -        break;
> -    case 3:
> -        segv = get_user_u64(val, addr);
> -        break;
> -    default:
> -        abort();
> -    }
> -    if (segv) {
> -        env->exception.vaddress = addr;
> -        goto error;
> -    }
> -    if (val != env->exclusive_val) {
> -        goto finish;
> -    }
> -    if (is_pair) {
> -        if (size == 2) {
> -            segv = get_user_u32(val, addr + 4);
> -        } else {
> -            segv = get_user_u64(val, addr + 8);
> -        }
> -        if (segv) {
> -            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
> -            goto error;
> -        }
> -        if (val != env->exclusive_high) {
> -            goto finish;
> -        }
> -    }
> -    /* handle the zero register */
> -    val = rt == 31 ? 0 : env->xregs[rt];
> -    switch (size) {
> -    case 0:
> -        segv = put_user_u8(val, addr);
> -        break;
> -    case 1:
> -        segv = put_user_u16(val, addr);
> -        break;
> -    case 2:
> -        segv = put_user_u32(val, addr);
> -        break;
> -    case 3:
> -        segv = put_user_u64(val, addr);
> -        break;
> -    }
> -    if (segv) {
> -        goto error;
> -    }
> -    if (is_pair) {
> -        /* handle the zero register */
> -        val = rt2 == 31 ? 0 : env->xregs[rt2];
> -        if (size == 2) {
> -            segv = put_user_u32(val, addr + 4);
> -        } else {
> -            segv = put_user_u64(val, addr + 8);
> -        }
> -        if (segv) {
> -            env->exception.vaddress = addr + (size == 2 ? 4 : 8);
> -            goto error;
> -        }
> -    }
> -    rc = 0;
> -finish:
> -    env->pc += 4;
> -    /* rs == 31 encodes a write to the ZR, thus throwing away
> -     * the status return. This is rather silly but valid.
> -     */
> -    if (rs < 31) {
> -        env->xregs[rs] = rc;
> -    }
> -error:
> -    /* instruction faulted, PC does not advance */
> -    /* either way a strex releases any exclusive lock we have */
> -    env->exclusive_addr = -1;
> -    end_exclusive();
> -    return segv;
> -}
> -
>  /* AArch64 main loop */
>  void cpu_loop(CPUARMState *env)
>  {
> @@ -1026,11 +908,6 @@ void cpu_loop(CPUARMState *env)
>              info._sifields._sigfault._addr = env->pc;
>              queue_signal(env, info.si_signo, &info);
>              break;
> -        case EXCP_STREX:
> -            if (!do_strex_a64(env)) {
> -                break;
> -            }
> -            /* fall through for segv */
>          case EXCP_PREFETCH_ABORT:
>          case EXCP_DATA_ABORT:
>              info.si_signo = TARGET_SIGSEGV;
> @@ -1066,8 +943,6 @@ void cpu_loop(CPUARMState *env)
>          process_pending_signals(env);
>          /* Exception return on AArch64 always clears the exclusive monitor,
>           * so any return to running guest code implies this.
> -         * A strex (successful or otherwise) also clears the monitor, so
> -         * we don't need to specialcase EXCP_STREX.
>           */
>          env->exclusive_addr = -1;
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
@ 2016-09-15  9:39   ` Alex Bennée
  0 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-15  9:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> The exception is not emitted anymore; remove it and the associated
> TCG variables.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
> ---
>  target-arm/cpu.h       | 17 ++++++-----------
>  target-arm/internals.h |  4 +---
>  target-arm/translate.c | 10 ----------
>  target-arm/translate.h |  4 ----
>  4 files changed, 7 insertions(+), 28 deletions(-)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index 76d824d..a38cec0 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -46,13 +46,12 @@
>  #define EXCP_BKPT            7
>  #define EXCP_EXCEPTION_EXIT  8   /* Return from v7M exception.  */
>  #define EXCP_KERNEL_TRAP     9   /* Jumped to kernel code page.  */
> -#define EXCP_STREX          10
> -#define EXCP_HVC            11   /* HyperVisor Call */
> -#define EXCP_HYP_TRAP       12
> -#define EXCP_SMC            13   /* Secure Monitor Call */
> -#define EXCP_VIRQ           14
> -#define EXCP_VFIQ           15
> -#define EXCP_SEMIHOST       16   /* semihosting call (A64 only) */
> +#define EXCP_HVC            10   /* HyperVisor Call */
> +#define EXCP_HYP_TRAP       11
> +#define EXCP_SMC            12   /* Secure Monitor Call */
> +#define EXCP_VIRQ           13
> +#define EXCP_VFIQ           14
> +#define EXCP_SEMIHOST       15   /* semihosting call (A64 only) */
>
>  #define ARMV7M_EXCP_RESET   1
>  #define ARMV7M_EXCP_NMI     2
> @@ -475,10 +474,6 @@ typedef struct CPUARMState {
>      uint64_t exclusive_addr;
>      uint64_t exclusive_val;
>      uint64_t exclusive_high;
> -#if defined(CONFIG_USER_ONLY)
> -    uint64_t exclusive_test;
> -    uint32_t exclusive_info;
> -#endif
>
>      /* iwMMXt coprocessor state.  */
>      struct {
> diff --git a/target-arm/internals.h b/target-arm/internals.h
> index cd57401..3edccd2 100644
> --- a/target-arm/internals.h
> +++ b/target-arm/internals.h
> @@ -46,8 +46,7 @@ static inline bool excp_is_internal(int excp)
>          || excp == EXCP_HALTED
>          || excp == EXCP_EXCEPTION_EXIT
>          || excp == EXCP_KERNEL_TRAP
> -        || excp == EXCP_SEMIHOST
> -        || excp == EXCP_STREX;
> +        || excp == EXCP_SEMIHOST;
>  }
>
>  /* Exception names for debug logging; note that not all of these
> @@ -63,7 +62,6 @@ static const char * const excnames[] = {
>      [EXCP_BKPT] = "Breakpoint",
>      [EXCP_EXCEPTION_EXIT] = "QEMU v7M exception exit",
>      [EXCP_KERNEL_TRAP] = "QEMU intercept of kernel commpage",
> -    [EXCP_STREX] = "QEMU intercept of STREX",
>      [EXCP_HVC] = "Hypervisor Call",
>      [EXCP_HYP_TRAP] = "Hypervisor Trap",
>      [EXCP_SMC] = "Secure Monitor Call",
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 2b3c34f..e8e8502 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -64,10 +64,6 @@ static TCGv_i32 cpu_R[16];
>  TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
>  TCGv_i64 cpu_exclusive_addr;
>  TCGv_i64 cpu_exclusive_val;
> -#ifdef CONFIG_USER_ONLY
> -TCGv_i64 cpu_exclusive_test;
> -TCGv_i32 cpu_exclusive_info;
> -#endif
>
>  /* FIXME:  These should be removed.  */
>  static TCGv_i32 cpu_F0s, cpu_F1s;
> @@ -101,12 +97,6 @@ void arm_translate_init(void)
>          offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
>      cpu_exclusive_val = tcg_global_mem_new_i64(cpu_env,
>          offsetof(CPUARMState, exclusive_val), "exclusive_val");
> -#ifdef CONFIG_USER_ONLY
> -    cpu_exclusive_test = tcg_global_mem_new_i64(cpu_env,
> -        offsetof(CPUARMState, exclusive_test), "exclusive_test");
> -    cpu_exclusive_info = tcg_global_mem_new_i32(cpu_env,
> -        offsetof(CPUARMState, exclusive_info), "exclusive_info");
> -#endif
>
>      a64_translate_init();
>  }
> diff --git a/target-arm/translate.h b/target-arm/translate.h
> index dbd7ac8..d4e205e 100644
> --- a/target-arm/translate.h
> +++ b/target-arm/translate.h
> @@ -77,10 +77,6 @@ extern TCGv_env cpu_env;
>  extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
>  extern TCGv_i64 cpu_exclusive_addr;
>  extern TCGv_i64 cpu_exclusive_val;
> -#ifdef CONFIG_USER_ONLY
> -extern TCGv_i64 cpu_exclusive_test;
> -extern TCGv_i32 cpu_exclusive_info;
> -#endif
>
>  static inline int arm_dc_feature(DisasContext *dc, int feature)
>  {


Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX Richard Henderson
@ 2016-09-15 10:10   ` Alex Bennée
  2016-09-15 16:38     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-15 10:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Rather than using helpers for physical accesses, use a mmu index.
> The primary cleanup is with store-conditional on physical addresses.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-alpha/cpu.h        | 18 +++++-------
>  target-alpha/helper.c     | 10 ++++++-
>  target-alpha/helper.h     |  9 ------
>  target-alpha/mem_helper.c | 73 -----------------------------------------------
>  target-alpha/translate.c  | 50 ++++++++++++++++++--------------
>  5 files changed, 45 insertions(+), 115 deletions(-)
>
> diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
> index ac5e801..9d9489c 100644
> --- a/target-alpha/cpu.h
> +++ b/target-alpha/cpu.h
> @@ -201,7 +201,7 @@ enum {
>
>  /* MMU modes definitions */
>
> -/* Alpha has 5 MMU modes: PALcode, kernel, executive, supervisor, and user.
> +/* Alpha has 5 MMU modes: PALcode, Kernel, Executive, Supervisor, and User.
>     The Unix PALcode only exposes the kernel and user modes; presumably
>     executive and supervisor are used by VMS.
>
> @@ -209,22 +209,18 @@ enum {
>     there are PALmode instructions that can access data via physical mode
>     or via an os-installed "alternate mode", which is one of the 4 above.
>
> -   QEMU does not currently properly distinguish between code/data when
> -   looking up addresses.  To avoid having to address this issue, our
> -   emulated PALcode will cheat and use the KSEG mapping for its code+data
> -   rather than physical addresses.
> +   That said, we're only emulating Unix PALcode, and not attempting VMS,
> +   so we don't need to implement Executive and Supervisor.  QEMU's own
> +   PALcode cheats and usees the KSEG mapping for its code+data rather than
> +   physical addresses.  */
>
> -   Moreover, we're only emulating Unix PALcode, and not attempting VMS.
> -
> -   All of which allows us to drop all but kernel and user modes.
> -   Elide the unused MMU modes to save space.  */
> -
> -#define NB_MMU_MODES 2
> +#define NB_MMU_MODES 3
>
>  #define MMU_MODE0_SUFFIX _kernel
>  #define MMU_MODE1_SUFFIX _user
>  #define MMU_KERNEL_IDX   0
>  #define MMU_USER_IDX     1
> +#define MMU_PHYS_IDX     2
>
>  typedef struct CPUAlphaState CPUAlphaState;
>
> diff --git a/target-alpha/helper.c b/target-alpha/helper.c
> index 85168b7..1ed0725 100644
> --- a/target-alpha/helper.c
> +++ b/target-alpha/helper.c
> @@ -126,6 +126,14 @@ static int get_physical_address(CPUAlphaState *env, target_ulong addr,
>      int prot = 0;
>      int ret = MM_K_ACV;
>
> +    /* Handle physical accesses.  */
> +    if (mmu_idx == MMU_PHYS_IDX) {
> +        phys = addr;
> +        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
> +        ret = -1;
> +        goto exit;
> +    }
> +
>      /* Ensure that the virtual address is properly sign-extended from
>         the last implemented virtual address bit.  */
>      if (saddr >> TARGET_VIRT_ADDR_SPACE_BITS != saddr >> 63) {
> @@ -137,7 +145,7 @@ static int get_physical_address(CPUAlphaState *env, target_ulong addr,
>         determine which KSEG is actually active.  */
>      if (saddr < 0 && ((saddr >> 41) & 3) == 2) {
>          /* User-space cannot access KSEG addresses.  */
> -        if (mmu_idx != MMU_KERNEL_IDX) {
> +        if (mmu_idx < MMU_KERNEL_IDX) {
>              goto exit;
>          }

I'm confused by this change. It's not the same test and when is mmu_idx ever
< 0?

>
> diff --git a/target-alpha/helper.h b/target-alpha/helper.h
> index c3d8a3e..004221d 100644
> --- a/target-alpha/helper.h
> +++ b/target-alpha/helper.h
> @@ -92,15 +92,6 @@ DEF_HELPER_FLAGS_2(ieee_input_cmp, TCG_CALL_NO_WG, void, env, i64)
>  DEF_HELPER_FLAGS_2(ieee_input_s, TCG_CALL_NO_WG, void, env, i64)
>
>  #if !defined (CONFIG_USER_ONLY)
> -DEF_HELPER_2(ldl_phys, i64, env, i64)
> -DEF_HELPER_2(ldq_phys, i64, env, i64)
> -DEF_HELPER_2(ldl_l_phys, i64, env, i64)
> -DEF_HELPER_2(ldq_l_phys, i64, env, i64)
> -DEF_HELPER_3(stl_phys, void, env, i64, i64)
> -DEF_HELPER_3(stq_phys, void, env, i64, i64)
> -DEF_HELPER_3(stl_c_phys, i64, env, i64, i64)
> -DEF_HELPER_3(stq_c_phys, i64, env, i64, i64)
> -
>  DEF_HELPER_FLAGS_1(tbia, TCG_CALL_NO_RWG, void, env)
>  DEF_HELPER_FLAGS_2(tbis, TCG_CALL_NO_RWG, void, env, i64)
>  DEF_HELPER_FLAGS_1(tb_flush, TCG_CALL_NO_RWG, void, env)
> diff --git a/target-alpha/mem_helper.c b/target-alpha/mem_helper.c
> index 1b2be50..78a7d45 100644
> --- a/target-alpha/mem_helper.c
> +++ b/target-alpha/mem_helper.c
> @@ -25,79 +25,6 @@
>
>  /* Softmmu support */
>  #ifndef CONFIG_USER_ONLY
> -
> -uint64_t helper_ldl_phys(CPUAlphaState *env, uint64_t p)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    return (int32_t)ldl_phys(cs->as, p);
> -}
> -
> -uint64_t helper_ldq_phys(CPUAlphaState *env, uint64_t p)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    return ldq_phys(cs->as, p);
> -}
> -
> -uint64_t helper_ldl_l_phys(CPUAlphaState *env, uint64_t p)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    env->lock_addr = p;
> -    return env->lock_value = (int32_t)ldl_phys(cs->as, p);
> -}
> -
> -uint64_t helper_ldq_l_phys(CPUAlphaState *env, uint64_t p)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    env->lock_addr = p;
> -    return env->lock_value = ldq_phys(cs->as, p);
> -}
> -
> -void helper_stl_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    stl_phys(cs->as, p, v);
> -}
> -
> -void helper_stq_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    stq_phys(cs->as, p, v);
> -}
> -
> -uint64_t helper_stl_c_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    uint64_t ret = 0;
> -
> -    if (p == env->lock_addr) {
> -        int32_t old = ldl_phys(cs->as, p);
> -        if (old == (int32_t)env->lock_value) {
> -            stl_phys(cs->as, p, v);
> -            ret = 1;
> -        }
> -    }
> -    env->lock_addr = -1;
> -
> -    return ret;
> -}
> -
> -uint64_t helper_stq_c_phys(CPUAlphaState *env, uint64_t p, uint64_t v)
> -{
> -    CPUState *cs = CPU(alpha_env_get_cpu(env));
> -    uint64_t ret = 0;
> -
> -    if (p == env->lock_addr) {
> -        uint64_t old = ldq_phys(cs->as, p);
> -        if (old == env->lock_value) {
> -            stq_phys(cs->as, p, v);
> -            ret = 1;
> -        }
> -    }
> -    env->lock_addr = -1;
> -
> -    return ret;
> -}
> -
>  void alpha_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
>                                     MMUAccessType access_type,
>                                     int mmu_idx, uintptr_t retaddr)
> diff --git a/target-alpha/translate.c b/target-alpha/translate.c
> index 0ea0e6e..2941159 100644
> --- a/target-alpha/translate.c
> +++ b/target-alpha/translate.c
> @@ -392,7 +392,8 @@ static inline void gen_store_mem(DisasContext *ctx,
>  }
>
>  static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
> -                                        int32_t disp16, int quad)
> +                                        int32_t disp16, int mem_idx,
> +                                        TCGMemOp op)
>  {
>      TCGv addr;
>
> @@ -414,7 +415,7 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
>      /* ??? This is handled via a complicated version of compare-and-swap
>         in the cpu_loop.  Hopefully one day we'll have a real CAS opcode
>         in TCG so that this isn't necessary.  */
> -    return gen_excp(ctx, quad ? EXCP_STQ_C : EXCP_STL_C, ra);
> +    return gen_excp(ctx, (op & MO_SIZE) == MO_64 ? EXCP_STQ_C : EXCP_STL_C, ra);
>  #else
>      /* ??? In system mode we are never multi-threaded, so CAS can be
>         implemented via a non-atomic load-compare-store sequence.  */
> @@ -427,11 +428,10 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
>          tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
>
>          val = tcg_temp_new();
> -        tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, quad ? MO_LEQ : MO_LESL);
> +        tcg_gen_qemu_ld_i64(val, addr, mem_idx, op);
>          tcg_gen_brcond_i64(TCG_COND_NE, val, cpu_lock_value, lab_fail);
>
> -        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, ctx->mem_idx,
> -                            quad ? MO_LEQ : MO_LEUL);
> +        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, mem_idx, op);
>          tcg_gen_movi_i64(ctx->ir[ra], 1);
>          tcg_gen_br(lab_done);
>
> @@ -2423,19 +2423,19 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>              switch ((insn >> 12) & 0xF) {
>              case 0x0:
>                  /* Longword physical access (hw_ldl/p) */
> -                gen_helper_ldl_phys(va, cpu_env, addr);
> +                tcg_gen_qemu_ld_i64(va, addr, MMU_PHYS_IDX, MO_LESL);
>                  break;
>              case 0x1:
>                  /* Quadword physical access (hw_ldq/p) */
> -                gen_helper_ldq_phys(va, cpu_env, addr);
> +                tcg_gen_qemu_ld_i64(va, addr, MMU_PHYS_IDX, MO_LEQ);
>                  break;
>              case 0x2:
>                  /* Longword physical access with lock (hw_ldl_l/p) */
> -                gen_helper_ldl_l_phys(va, cpu_env, addr);
> +                gen_qemu_ldl_l(va, addr, MMU_PHYS_IDX);
>                  break;
>              case 0x3:
>                  /* Quadword physical access with lock (hw_ldq_l/p) */
> -                gen_helper_ldq_l_phys(va, cpu_env, addr);
> +                gen_qemu_ldq_l(va, addr, MMU_PHYS_IDX);
>                  break;
>              case 0x4:
>                  /* Longword virtual PTE fetch (hw_ldl/v) */
> @@ -2674,27 +2674,34 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>  #ifndef CONFIG_USER_ONLY
>          REQUIRE_TB_FLAG(TB_FLAGS_PAL_MODE);
>          {
> -            TCGv addr = tcg_temp_new();
> -            va = load_gpr(ctx, ra);
> -            vb = load_gpr(ctx, rb);
> -
> -            tcg_gen_addi_i64(addr, vb, disp12);
>              switch ((insn >> 12) & 0xF) {
>              case 0x0:
>                  /* Longword physical access */
> -                gen_helper_stl_phys(cpu_env, addr, va);
> +                va = load_gpr(ctx, ra);
> +                vb = load_gpr(ctx, rb);
> +                tmp = tcg_temp_new();
> +                tcg_gen_addi_i64(tmp, vb, disp12);
> +                tcg_gen_qemu_st_i64(va, tmp, MMU_PHYS_IDX, MO_LESL);
> +                tcg_temp_free(tmp);
>                  break;
>              case 0x1:
>                  /* Quadword physical access */
> -                gen_helper_stq_phys(cpu_env, addr, va);
> +                va = load_gpr(ctx, ra);
> +                vb = load_gpr(ctx, rb);
> +                tmp = tcg_temp_new();
> +                tcg_gen_addi_i64(tmp, vb, disp12);
> +                tcg_gen_qemu_st_i64(va, tmp, MMU_PHYS_IDX, MO_LEQ);
> +                tcg_temp_free(tmp);
>                  break;
>              case 0x2:
>                  /* Longword physical access with lock */
> -                gen_helper_stl_c_phys(dest_gpr(ctx, ra), cpu_env, addr, va);
> +                ret = gen_store_conditional(ctx, ra, rb, disp12,
> +                                            MMU_PHYS_IDX, MO_LESL);
>                  break;
>              case 0x3:
>                  /* Quadword physical access with lock */
> -                gen_helper_stq_c_phys(dest_gpr(ctx, ra), cpu_env, addr, va);
> +                ret = gen_store_conditional(ctx, ra, rb, disp12,
> +                                            MMU_PHYS_IDX, MO_LEQ);
>                  break;
>              case 0x4:
>                  /* Longword virtual access */
> @@ -2733,7 +2740,6 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>                  /* Invalid */
>                  goto invalid_opc;
>              }
> -            tcg_temp_free(addr);
>              break;
>          }
>  #else
> @@ -2797,11 +2803,13 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>          break;
>      case 0x2E:
>          /* STL_C */
> -        ret = gen_store_conditional(ctx, ra, rb, disp16, 0);
> +        ret = gen_store_conditional(ctx, ra, rb, disp16,
> +                                    ctx->mem_idx, MO_LESL);
>          break;
>      case 0x2F:
>          /* STQ_C */
> -        ret = gen_store_conditional(ctx, ra, rb, disp16, 1);
> +        ret = gen_store_conditional(ctx, ra, rb, disp16,
> +                                    ctx->mem_idx, MO_LEQ);
>          break;
>      case 0x30:
>          /* BR */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers
  2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers Richard Henderson
@ 2016-09-15 14:38   ` Alex Bennée
  2016-09-15 16:48     ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-15 14:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Emulating LL/SC with cmpxchg is not correct, since it can
> suffer from the ABA problem.  However, portable parallel
> code is writting assuming only cmpxchg which means that in
> practice this is a viable alternative.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  linux-user/main.c        |  49 ----------------------
>  target-alpha/cpu.h       |   4 --
>  target-alpha/helper.c    |   6 ---
>  target-alpha/machine.c   |   2 -
>  target-alpha/translate.c | 104 ++++++++++++++++++++---------------------------
>  5 files changed, 45 insertions(+), 120 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 64838bf..6cdd0da 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -2994,51 +2994,6 @@ void cpu_loop(CPUM68KState *env)
>  #endif /* TARGET_M68K */
>
>  #ifdef TARGET_ALPHA
> -static void do_store_exclusive(CPUAlphaState *env, int reg, int quad)
> -{
> -    target_ulong addr, val, tmp;
> -    target_siginfo_t info;
> -    int ret = 0;
> -
> -    addr = env->lock_addr;
> -    tmp = env->lock_st_addr;
> -    env->lock_addr = -1;
> -    env->lock_st_addr = 0;
> -
> -    start_exclusive();
> -    mmap_lock();
> -
> -    if (addr == tmp) {
> -        if (quad ? get_user_s64(val, addr) : get_user_s32(val, addr)) {
> -            goto do_sigsegv;
> -        }
> -
> -        if (val == env->lock_value) {
> -            tmp = env->ir[reg];
> -            if (quad ? put_user_u64(tmp, addr) : put_user_u32(tmp, addr)) {
> -                goto do_sigsegv;
> -            }
> -            ret = 1;
> -        }
> -    }
> -    env->ir[reg] = ret;
> -    env->pc += 4;
> -
> -    mmap_unlock();
> -    end_exclusive();
> -    return;
> -
> - do_sigsegv:
> -    mmap_unlock();
> -    end_exclusive();
> -
> -    info.si_signo = TARGET_SIGSEGV;
> -    info.si_errno = 0;
> -    info.si_code = TARGET_SEGV_MAPERR;
> -    info._sifields._sigfault._addr = addr;
> -    queue_signal(env, TARGET_SIGSEGV, &info);
> -}
> -
>  void cpu_loop(CPUAlphaState *env)
>  {
>      CPUState *cs = CPU(alpha_env_get_cpu(env));
> @@ -3212,10 +3167,6 @@ void cpu_loop(CPUAlphaState *env)
>                  queue_signal(env, info.si_signo, &info);
>              }
>              break;
> -        case EXCP_STL_C:
> -        case EXCP_STQ_C:
> -            do_store_exclusive(env, env->error_code, trapnr - EXCP_STL_C);
> -            break;
>          case EXCP_INTERRUPT:
>              /* Just indicate that signals should be handled asap.  */
>              break;
> diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
> index 9d9489c..a4068c8 100644
> --- a/target-alpha/cpu.h
> +++ b/target-alpha/cpu.h
> @@ -230,7 +230,6 @@ struct CPUAlphaState {
>      uint64_t pc;
>      uint64_t unique;
>      uint64_t lock_addr;
> -    uint64_t lock_st_addr;
>      uint64_t lock_value;
>
>      /* The FPCR, and disassembled portions thereof.  */
> @@ -346,9 +345,6 @@ enum {
>      EXCP_ARITH,
>      EXCP_FEN,
>      EXCP_CALL_PAL,
> -    /* For Usermode emulation.  */
> -    EXCP_STL_C,
> -    EXCP_STQ_C,
>  };
>
>  /* Alpha-specific interrupt pending bits.  */
> diff --git a/target-alpha/helper.c b/target-alpha/helper.c
> index 1ed0725..95453f5 100644
> --- a/target-alpha/helper.c
> +++ b/target-alpha/helper.c
> @@ -306,12 +306,6 @@ void alpha_cpu_do_interrupt(CPUState *cs)
>          case EXCP_CALL_PAL:
>              name = "call_pal";
>              break;
> -        case EXCP_STL_C:
> -            name = "stl_c";
> -            break;
> -        case EXCP_STQ_C:
> -            name = "stq_c";
> -            break;
>          }
>          qemu_log("INT %6d: %s(%#x) pc=%016" PRIx64 " sp=%016" PRIx64 "\n",
>                   ++count, name, env->error_code, env->pc, env->ir[IR_SP]);
> diff --git a/target-alpha/machine.c b/target-alpha/machine.c
> index 710b783..b99a123 100644
> --- a/target-alpha/machine.c
> +++ b/target-alpha/machine.c
> @@ -45,8 +45,6 @@ static VMStateField vmstate_env_fields[] = {
>      VMSTATE_UINTTL(unique, CPUAlphaState),
>      VMSTATE_UINTTL(lock_addr, CPUAlphaState),
>      VMSTATE_UINTTL(lock_value, CPUAlphaState),
> -    /* Note that lock_st_addr is not saved; it is a temporary
> -       used during the execution of the st[lq]_c insns.  */
>
>      VMSTATE_UINT8(ps, CPUAlphaState),
>      VMSTATE_UINT8(intr_flag, CPUAlphaState),
> diff --git a/target-alpha/translate.c b/target-alpha/translate.c
> index 2941159..ed353de 100644
> --- a/target-alpha/translate.c
> +++ b/target-alpha/translate.c
> @@ -99,7 +99,6 @@ static TCGv cpu_std_ir[31];
>  static TCGv cpu_fir[31];
>  static TCGv cpu_pc;
>  static TCGv cpu_lock_addr;
> -static TCGv cpu_lock_st_addr;
>  static TCGv cpu_lock_value;
>
>  #ifndef CONFIG_USER_ONLY
> @@ -116,7 +115,6 @@ void alpha_translate_init(void)
>      static const GlobalVar vars[] = {
>          DEF_VAR(pc),
>          DEF_VAR(lock_addr),
> -        DEF_VAR(lock_st_addr),
>          DEF_VAR(lock_value),
>      };
>
> @@ -198,6 +196,23 @@ static TCGv dest_sink(DisasContext *ctx)
>      return ctx->sink;
>  }
>
> +static void free_context_temps(DisasContext *ctx)
> +{
> +    if (!TCGV_IS_UNUSED_I64(ctx->sink)) {
> +        tcg_gen_discard_i64(ctx->sink);
> +        tcg_temp_free(ctx->sink);
> +        TCGV_UNUSED_I64(ctx->sink);
> +    }
> +    if (!TCGV_IS_UNUSED_I64(ctx->zero)) {
> +        tcg_temp_free(ctx->zero);
> +        TCGV_UNUSED_I64(ctx->zero);
> +    }
> +    if (!TCGV_IS_UNUSED_I64(ctx->lit)) {
> +        tcg_temp_free(ctx->lit);
> +        TCGV_UNUSED_I64(ctx->lit);
> +    }
> +}
> +
>  static TCGv load_gpr(DisasContext *ctx, unsigned reg)
>  {
>      if (likely(reg < 31)) {
> @@ -395,56 +410,37 @@ static ExitStatus gen_store_conditional(DisasContext *ctx, int ra, int rb,
>                                          int32_t disp16, int mem_idx,
>                                          TCGMemOp op)
>  {
> -    TCGv addr;
> -
> -    if (ra == 31) {
> -        /* ??? Don't bother storing anything.  The user can't tell
> -           the difference, since the zero register always reads zero.  */
> -        return NO_EXIT;
> -    }
> -
> -#if defined(CONFIG_USER_ONLY)
> -    addr = cpu_lock_st_addr;
> -#else
> -    addr = tcg_temp_local_new();
> -#endif
> +    TCGLabel *lab_fail, *lab_done;
> +    TCGv addr, val;
>
> +    addr = tcg_temp_new_i64();
>      tcg_gen_addi_i64(addr, load_gpr(ctx, rb), disp16);
> +    free_context_temps(ctx);
>
> -#if defined(CONFIG_USER_ONLY)
> -    /* ??? This is handled via a complicated version of compare-and-swap
> -       in the cpu_loop.  Hopefully one day we'll have a real CAS opcode
> -       in TCG so that this isn't necessary.  */
> -    return gen_excp(ctx, (op & MO_SIZE) == MO_64 ? EXCP_STQ_C : EXCP_STL_C, ra);
> -#else
> -    /* ??? In system mode we are never multi-threaded, so CAS can be
> -       implemented via a non-atomic load-compare-store sequence.  */
> -    {
> -        TCGLabel *lab_fail, *lab_done;
> -        TCGv val;
> +    lab_fail = gen_new_label();
> +    lab_done = gen_new_label();
> +    tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
> +    tcg_temp_free_i64(addr);
>
> -        lab_fail = gen_new_label();
> -        lab_done = gen_new_label();
> -        tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
> +    val = tcg_temp_new_i64();
> +    tcg_gen_atomic_cmpxchg_i64(val, cpu_lock_addr, cpu_lock_value,
> +                               load_gpr(ctx, ra), mem_idx, op);
> +    free_context_temps(ctx);

I don't quite follow what free_context_temps() is doing and why it needs
to be done twice during the building of this instructions TCGOps?

>
> -        val = tcg_temp_new();
> -        tcg_gen_qemu_ld_i64(val, addr, mem_idx, op);
> -        tcg_gen_brcond_i64(TCG_COND_NE, val, cpu_lock_value, lab_fail);
> -
> -        tcg_gen_qemu_st_i64(ctx->ir[ra], addr, mem_idx, op);
> -        tcg_gen_movi_i64(ctx->ir[ra], 1);
> -        tcg_gen_br(lab_done);
> +    if (ra != 31) {
> +        tcg_gen_setcond_i64(TCG_COND_EQ, ctx->ir[ra], val, cpu_lock_value);
> +    }
> +    tcg_temp_free_i64(val);
> +    tcg_gen_br(lab_done);
>
> -        gen_set_label(lab_fail);
> +    gen_set_label(lab_fail);
> +    if (ra != 31) {
>          tcg_gen_movi_i64(ctx->ir[ra], 0);
> -
> -        gen_set_label(lab_done);
> -        tcg_gen_movi_i64(cpu_lock_addr, -1);
> -
> -        tcg_temp_free(addr);
> -        return NO_EXIT;
>      }
> -#endif
> +
> +    gen_set_label(lab_done);
> +    tcg_gen_movi_i64(cpu_lock_addr, -1);
> +    return NO_EXIT;
>  }
>
>  static bool in_superpage(DisasContext *ctx, int64_t addr)
> @@ -2914,6 +2910,10 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
>      /* Similarly for flush-to-zero.  */
>      ctx.tb_ftz = -1;
>
> +    TCGV_UNUSED_I64(ctx.zero);
> +    TCGV_UNUSED_I64(ctx.sink);
> +    TCGV_UNUSED_I64(ctx.lit);
> +
>      num_insns = 0;
>      max_insns = tb->cflags & CF_COUNT_MASK;
>      if (max_insns == 0) {
> @@ -2948,23 +2948,9 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
>          }
>          insn = cpu_ldl_code(env, ctx.pc);
>
> -        TCGV_UNUSED_I64(ctx.zero);
> -        TCGV_UNUSED_I64(ctx.sink);
> -        TCGV_UNUSED_I64(ctx.lit);
> -
>          ctx.pc += 4;
>          ret = translate_one(ctxp, insn);
> -
> -        if (!TCGV_IS_UNUSED_I64(ctx.sink)) {
> -            tcg_gen_discard_i64(ctx.sink);
> -            tcg_temp_free(ctx.sink);
> -        }
> -        if (!TCGV_IS_UNUSED_I64(ctx.zero)) {
> -            tcg_temp_free(ctx.zero);
> -        }
> -        if (!TCGV_IS_UNUSED_I64(ctx.lit)) {
> -            tcg_temp_free(ctx.lit);
> -        }
> +        free_context_temps(ctxp);
>
>          /* If we reach a page boundary, are single stepping,
>             or exhaust instruction count, stop generation.  */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics
  2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
                   ` (36 preceding siblings ...)
  2016-09-09 18:33 ` Alex Bennée
@ 2016-09-15 14:39 ` Alex Bennée
  37 siblings, 0 replies; 97+ messages in thread
From: Alex Bennée @ 2016-09-15 14:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Changes since v2:
>   * Fix build for 32-bit host without 64-bit atomics.
>     Tested with --extra-cflags='-march=-i486'.
>     This is patch 15, which might ought to get folded back into
>     patch 13 for bisection, but is ugly for review.
>
>   * Move a lot of stuff out of softmmu_template.h, and into cputlb.c.
>     This is both for code size reduction and readability.
>
>   * Fold in Alex's test-int128 fix.

I'm done with my review pass now - I think we are looking pretty good
for merging soon. I look forward to the next version ;-)

>
>
> r~
>
>
> Emilio G. Cota (18):
>   atomics: add atomic_xor
>   atomics: add atomic_op_fetch variants
>   target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
>   target-i386: emulate LOCK'ed OP instructions using atomic helpers
>   target-i386: emulate LOCK'ed INC using atomic helper
>   target-i386: emulate LOCK'ed NOT using atomic helper
>   target-i386: emulate LOCK'ed NEG using cmpxchg helper
>   target-i386: emulate LOCK'ed XADD using atomic helper
>   target-i386: emulate LOCK'ed BTX ops using atomic helpers
>   target-i386: emulate XCHG using atomic helper
>   target-i386: remove helper_lock()
>   tests: add atomic_add-bench
>   target-arm: emulate LL/SC using cmpxchg helpers
>   target-arm: emulate SWP with atomic_xchg helper
>   target-arm: emulate aarch64's LL/SC using cmpxchg helpers
>   linux-user: remove handling of ARM's EXCP_STREX
>   linux-user: remove handling of aarch64's EXCP_STREX
>   target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
>
> Richard Henderson (16):
>   exec: Avoid direct references to Int128 parts
>   int128: Use __int128 if available
>   int128: Add int128_make128
>   tcg: Add EXCP_ATOMIC
>   HACK: Always enable parallel_cpus
>   cputlb: Replace SHIFT with DATA_SIZE
>   cputlb: Move probe_write out of softmmu_template.h
>   cputlb: Remove includes from softmmu_template.h
>   cputlb: Move most of iotlb code out of line
>   cputlb: Tidy some macros
>   tcg: Add atomic helpers
>   tcg: Add atomic128 helpers
>   tcg: Add CONFIG_ATOMIC64
>   target-arm: Rearrange aa32 load and store functions
>   target-alpha: Introduce MMU_PHYS_IDX
>   target-alpha: Emulate LL/SC using cmpxchg helpers
>
>  Makefile.objs              |   1 -
>  Makefile.target            |   1 +
>  atomic_template.h          | 211 +++++++++++++++++++++++++
>  configure                  |  62 +++++++-
>  cpu-exec-common.c          |   6 +
>  cpu-exec.c                 |  23 +++
>  cpus.c                     |   6 +
>  cputlb.c                   | 203 ++++++++++++++++++++++--
>  exec.c                     |   4 +-
>  include/exec/cpu-all.h     |   1 +
>  include/exec/exec-all.h    |   1 +
>  include/qemu-common.h      |   1 +
>  include/qemu/atomic.h      |  42 ++++-
>  include/qemu/int128.h      | 171 +++++++++++++++++++-
>  linux-user/main.c          | 326 +++++++-------------------------------
>  softmmu_template.h         | 104 ++----------
>  target-alpha/cpu.h         |  22 +--
>  target-alpha/helper.c      |  16 +-
>  target-alpha/helper.h      |   9 --
>  target-alpha/machine.c     |   2 -
>  target-alpha/mem_helper.c  |  73 ---------
>  target-alpha/translate.c   | 148 +++++++++--------
>  target-arm/cpu.h           |  17 +-
>  target-arm/helper-a64.c    | 113 +++++++++++++
>  target-arm/helper-a64.h    |   2 +
>  target-arm/internals.h     |   4 +-
>  target-arm/translate-a64.c | 106 ++++++-------
>  target-arm/translate.c     | 342 ++++++++++++++-------------------------
>  target-arm/translate.h     |   4 -
>  target-i386/helper.h       |   4 +-
>  target-i386/mem_helper.c   | 153 ++++++++++++------
>  target-i386/translate.c    | 386 +++++++++++++++++++++++++++++----------------
>  tcg-runtime.c              |  74 +++++++--
>  tcg/tcg-op.c               | 342 +++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h               |  44 ++++++
>  tcg/tcg-runtime.h          | 109 +++++++++++++
>  tcg/tcg.h                  |  85 ++++++++++
>  tests/.gitignore           |   1 +
>  tests/Makefile.include     |   4 +-
>  tests/atomic_add-bench.c   | 180 +++++++++++++++++++++
>  tests/test-int128.c        |  22 +--
>  translate-all.c            |   1 +
>  42 files changed, 2346 insertions(+), 1080 deletions(-)
>  create mode 100644 atomic_template.h
>  create mode 100644 tests/atomic_add-bench.c


--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX
  2016-09-15 10:10   ` Alex Bennée
@ 2016-09-15 16:38     ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-15 16:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/15/2016 03:10 AM, Alex Bennée wrote:
>>          /* User-space cannot access KSEG addresses.  */
>> > -        if (mmu_idx != MMU_KERNEL_IDX) {
>> > +        if (mmu_idx < MMU_KERNEL_IDX) {
>> >              goto exit;
>> >          }
> I'm confused by this change. It's not the same test and when is mmu_idx ever
> < 0?
>

Huh.  I don't know what happened there.  It's clearly wrong...


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers
  2016-09-15 14:38   ` Alex Bennée
@ 2016-09-15 16:48     ` Richard Henderson
  2016-09-15 17:48       ` Alex Bennée
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-09-15 16:48 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/15/2016 07:38 AM, Alex Bennée wrote:
>> +    lab_fail = gen_new_label();
>> > +    lab_done = gen_new_label();
>> > +    tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
>> > +    tcg_temp_free_i64(addr);
>> >
>> > -        lab_fail = gen_new_label();
>> > -        lab_done = gen_new_label();
>> > -        tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
>> > +    val = tcg_temp_new_i64();
>> > +    tcg_gen_atomic_cmpxchg_i64(val, cpu_lock_addr, cpu_lock_value,
>> > +                               load_gpr(ctx, ra), mem_idx, op);
>> > +    free_context_temps(ctx);
> I don't quite follow what free_context_temps() is doing and why it needs
> to be done twice during the building of this instructions TCGOps?
>

It's releasing the zero, sink, imm TCGv temporaries from DisasContext, as 
they're going out of scope within each basic block.  Since they're freed, 
they'll be re-created as necessary in the next basic block.

The two (unlikely) relevant events are: (1) address register = zero register, 
so the zero temp becomes invalid, and (2) dest register = zero register, so the 
zero/sink temps become invalid.

It's the kind of thing that really shouldn't ever happen for non-malicious 
code, but you don't want qemu to abort when invalid temps are used.  Perhaps a 
comment is warranted here...


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers
  2016-09-15 16:48     ` Richard Henderson
@ 2016-09-15 17:48       ` Alex Bennée
  2016-09-15 18:28         ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2016-09-15 17:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> On 09/15/2016 07:38 AM, Alex Bennée wrote:
>>> +    lab_fail = gen_new_label();
>>> > +    lab_done = gen_new_label();
>>> > +    tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
>>> > +    tcg_temp_free_i64(addr);
>>> >
>>> > -        lab_fail = gen_new_label();
>>> > -        lab_done = gen_new_label();
>>> > -        tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_lock_addr, lab_fail);
>>> > +    val = tcg_temp_new_i64();
>>> > +    tcg_gen_atomic_cmpxchg_i64(val, cpu_lock_addr, cpu_lock_value,
>>> > +                               load_gpr(ctx, ra), mem_idx, op);
>>> > +    free_context_temps(ctx);
>> I don't quite follow what free_context_temps() is doing and why it needs
>> to be done twice during the building of this instructions TCGOps?
>>
>
> It's releasing the zero, sink, imm TCGv temporaries from DisasContext, as
> they're going out of scope within each basic block.  Since they're freed,
> they'll be re-created as necessary in the next basic block.
>
> The two (unlikely) relevant events are: (1) address register = zero register,
> so the zero temp becomes invalid, and (2) dest register = zero register, so the
> zero/sink temps become invalid.

Ahh ok. The ARM code just allocates temps on demand including for its
zero register which I guess means multiple ones could be assigned. There
is also temp allocation logic to free them at the end of the block.

I wonder if this is something that should be more of a feature of the
core TCG code because let's face it temp tracking is a pain.

>
> It's the kind of thing that really shouldn't ever happen for non-malicious
> code, but you don't want qemu to abort when invalid temps are used.  Perhaps a
> comment is warranted here...

Does the code generator suffer if temps aren't freed as soon as
possible?  Is there a reason why we don't just do a general free all
temps at the end of the block?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers
  2016-09-15 17:48       ` Alex Bennée
@ 2016-09-15 18:28         ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-09-15 18:28 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 09/15/2016 10:48 AM, Alex Bennée wrote:
> Ahh ok. The ARM code just allocates temps on demand including for its
> zero register which I guess means multiple ones could be assigned. There
> is also temp allocation logic to free them at the end of the block.
>
> I wonder if this is something that should be more of a feature of the
> core TCG code because let's face it temp tracking is a pain.

Yep, sparc and s390 have a similar bit of temp tracking.  It would be nice for 
that to be generic.


> Does the code generator suffer if temps aren't freed as soon as
> possible?  Is there a reason why we don't just do a general free all
> temps at the end of the block?

Freeing temps as quickly as possible means that we minimize the number of temps 
required, and the register allocator has a couple of O(N**2) places, for N == 
number of temps.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-09-14 16:38     ` Richard Henderson
@ 2016-10-20 17:51       ` Pranith Kumar
  2016-10-20 18:00         ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Pranith Kumar @ 2016-10-20 17:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On Wed, Sep 14, 2016 at 12:38 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 09/14/2016 09:03 AM, Alex Bennée wrote:
>>> > -/* Load/Store exclusive instructions are implemented by remembering
>>> > -   the value/address loaded, and seeing if these are the same
>>> > -   when the store is performed. This should be sufficient to implement
>>> > -   the architecturally mandated semantics, and avoids having to monitor
>>> > -   regular stores.
>>> > -
>>> > -   In system emulation mode only one CPU will be running at once, so
>>> > -   this sequence is effectively atomic.  In user emulation mode we
>>> > -   throw an exception and handle the atomic operation elsewhere.  */
>> At least half of this comment is still relevant although it could be
>> tweaked to mention that we use an atomic cmpxchg for the store that will
>> fail if exlusive_val doesn't match the current state.
>>
>
> Added back
>
> /* Load/Store exclusive instructions are implemented by remembering
>    the value/address loaded, and seeing if these are the same
>    when the store is performed.  This should be sufficient to implement
>    the architecturally mandated semantics, and avoids having to monitor
>    regular stores.  The compare vs the remembered value is done during
>    the cmpxchg operation, but we must compare the addresses manually.  */
>

FYI, I do not see this in your v7 series.

--
Pranith

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-10-20 17:51       ` Pranith Kumar
@ 2016-10-20 18:00         ` Richard Henderson
  2016-10-20 18:58           ` Pranith Kumar
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-10-20 18:00 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On 10/20/2016 10:51 AM, Pranith Kumar wrote:
>> Added back
>>
>> /* Load/Store exclusive instructions are implemented by remembering
>>    the value/address loaded, and seeing if these are the same
>>    when the store is performed.  This should be sufficient to implement
>>    the architecturally mandated semantics, and avoids having to monitor
>>    regular stores.  The compare vs the remembered value is done during
>>    the cmpxchg operation, but we must compare the addresses manually.  */
>>
>
> FYI, I do not see this in your v7 series.

I do.  It's in patch 28:

  /* Load/Store exclusive instructions are implemented by remembering
     the value/address loaded, and seeing if these are the same
-   when the store is performed. This should be sufficient to implement
+   when the store is performed.  This should be sufficient to implement
     the architecturally mandated semantics, and avoids having to monitor
-   regular stores.
-
-   In system emulation mode only one CPU will be running at once, so
-   this sequence is effectively atomic.  In user emulation mode we
-   throw an exception and handle the atomic operation elsewhere.  */
+   regular stores.  The compare vs the remembered value is done during
+   the cmpxchg operation, but we must compare the addresses manually.  */


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-10-20 18:00         ` Richard Henderson
@ 2016-10-20 18:58           ` Pranith Kumar
  2016-10-20 19:02             ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Pranith Kumar @ 2016-10-20 18:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On Thu, Oct 20, 2016 at 2:00 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/20/2016 10:51 AM, Pranith Kumar wrote:
>>>
>>> Added back
>>>
>>> /* Load/Store exclusive instructions are implemented by remembering
>>>    the value/address loaded, and seeing if these are the same
>>>    when the store is performed.  This should be sufficient to implement
>>>    the architecturally mandated semantics, and avoids having to monitor
>>>    regular stores.  The compare vs the remembered value is done during
>>>    the cmpxchg operation, but we must compare the addresses manually.  */
>>>
>>
>> FYI, I do not see this in your v7 series.
>
>
> I do.  It's in patch 28:
>
>  /* Load/Store exclusive instructions are implemented by remembering
>     the value/address loaded, and seeing if these are the same
> -   when the store is performed. This should be sufficient to implement
> +   when the store is performed.  This should be sufficient to implement
>     the architecturally mandated semantics, and avoids having to monitor
> -   regular stores.
> -
> -   In system emulation mode only one CPU will be running at once, so
> -   this sequence is effectively atomic.  In user emulation mode we
> -   throw an exception and handle the atomic operation elsewhere.  */
> +   regular stores.  The compare vs the remembered value is done during
> +   the cmpxchg operation, but we must compare the addresses manually.  */
>

Indeed, I was looking at atomic-6 on github. atomic-7 is not there yet :)

Thanks,
--
Pranith

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-10-20 18:58           ` Pranith Kumar
@ 2016-10-20 19:02             ` Richard Henderson
  2016-10-20 19:07               ` Pranith Kumar
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2016-10-20 19:02 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On 10/20/2016 11:58 AM, Pranith Kumar wrote:
> Indeed, I was looking at atomic-6 on github. atomic-7 is not there yet :)

I've rebased atomic-6 (no -7).  It should be there.


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-10-20 19:02             ` Richard Henderson
@ 2016-10-20 19:07               ` Pranith Kumar
  2016-10-21  4:34                 ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Pranith Kumar @ 2016-10-20 19:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On Thu, Oct 20, 2016 at 3:02 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/20/2016 11:58 AM, Pranith Kumar wrote:
>>
>> Indeed, I was looking at atomic-6 on github. atomic-7 is not there yet :)
>
>
> I've rebased atomic-6 (no -7).  It should be there.
>

Am I looking at the right file?

https://github.com/rth7680/qemu/commit/07e50f163edc8a03d13cbfaf8f410a089892e071#diff-242dde3ce6effe2e3e92fc010eda7cb1L1842

--
Pranith

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers
  2016-10-20 19:07               ` Pranith Kumar
@ 2016-10-21  4:34                 ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2016-10-21  4:34 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Alex Bennée, Emilio G. Cota, qemu-devel

On 10/20/2016 12:07 PM, Pranith Kumar wrote:
> On Thu, Oct 20, 2016 at 3:02 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 10/20/2016 11:58 AM, Pranith Kumar wrote:
>>>
>>> Indeed, I was looking at atomic-6 on github. atomic-7 is not there yet :)
>>
>>
>> I've rebased atomic-6 (no -7).  It should be there.
>>
>
> Am I looking at the right file?
>
> https://github.com/rth7680/qemu/commit/07e50f163edc8a03d13cbfaf8f410a089892e071#diff-242dde3ce6effe2e3e92fc010eda7cb1L1842
>

https://github.com/rth7680/qemu/commit/36795a98743bf884c8b9479fd82a840c1cf04e86


r~

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2016-09-13 18:00     ` Richard Henderson
@ 2017-03-24 10:14       ` Nikunj A Dadhania
  2017-03-24 10:58         ` Alex Bennée
  0 siblings, 1 reply; 97+ messages in thread
From: Nikunj A Dadhania @ 2017-03-24 10:14 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée; +Cc: qemu-devel

Richard Henderson <rth@twiddle.net> writes:

> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>> > +    /* Notice an IO access, or a notdirty page.  */
>>> > +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>> > +        /* There's really nothing that can be done to
>>> > +           support this apart from stop-the-world.  */
>>> > +        goto stop_the_world;
>> We are also triggering on TLB_NOTDIRTY here in the case where a
>> conditional write is the first write to a page. I don't know if a
>> stop_the_world is required at this point but we will need to ensure we
>> clear bits as notdirty_mem_write() does.
>> 
>
> You're quite right that we could probably special-case TLB_NOTDIRTY here such
> that (1) we needn't leave the cpu loop, and (2) needn't utilize the actual
> "write" part of notdirty_mem_write; just set the bits then fall through to the
> actual atomic instruction below.

I do hit this case with ppc64, where I see that its the first write to
the page and it exits from this every time, causing the kernel to print
soft-lockups.

Can we add the special case here for NOTDIRTY and set the page as dirty
and return successfully?

Regards
Nikunj

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2017-03-24 10:14       ` Nikunj A Dadhania
@ 2017-03-24 10:58         ` Alex Bennée
  2017-03-24 17:27           ` Nikunj A Dadhania
  2017-03-27 11:56           ` Nikunj A Dadhania
  0 siblings, 2 replies; 97+ messages in thread
From: Alex Bennée @ 2017-03-24 10:58 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: Richard Henderson, qemu-devel


Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:

> Richard Henderson <rth@twiddle.net> writes:
>
>> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>>> > +    /* Notice an IO access, or a notdirty page.  */
>>>> > +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>> > +        /* There's really nothing that can be done to
>>>> > +           support this apart from stop-the-world.  */
>>>> > +        goto stop_the_world;
>>> We are also triggering on TLB_NOTDIRTY here in the case where a
>>> conditional write is the first write to a page. I don't know if a
>>> stop_the_world is required at this point but we will need to ensure we
>>> clear bits as notdirty_mem_write() does.
>>>
>>
>> You're quite right that we could probably special-case TLB_NOTDIRTY here such
>> that (1) we needn't leave the cpu loop, and (2) needn't utilize the actual
>> "write" part of notdirty_mem_write; just set the bits then fall through to the
>> actual atomic instruction below.
>
> I do hit this case with ppc64, where I see that its the first write to
> the page and it exits from this every time, causing the kernel to print
> soft-lockups.
>
> Can we add the special case here for NOTDIRTY and set the page as dirty
> and return successfully?

Does the atomic step fall-back not work for you?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2017-03-24 10:58         ` Alex Bennée
@ 2017-03-24 17:27           ` Nikunj A Dadhania
  2017-03-27 11:56           ` Nikunj A Dadhania
  1 sibling, 0 replies; 97+ messages in thread
From: Nikunj A Dadhania @ 2017-03-24 17:27 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

Alex Bennée <alex.bennee@linaro.org> writes:

> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:
>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>>>> > +    /* Notice an IO access, or a notdirty page.  */
>>>>> > +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> > +        /* There's really nothing that can be done to
>>>>> > +           support this apart from stop-the-world.  */
>>>>> > +        goto stop_the_world;
>>>> We are also triggering on TLB_NOTDIRTY here in the case where a
>>>> conditional write is the first write to a page. I don't know if a
>>>> stop_the_world is required at this point but we will need to ensure we
>>>> clear bits as notdirty_mem_write() does.
>>>>
>>>
>>> You're quite right that we could probably special-case TLB_NOTDIRTY here such
>>> that (1) we needn't leave the cpu loop, and (2) needn't utilize the actual
>>> "write" part of notdirty_mem_write; just set the bits then fall through to the
>>> actual atomic instruction below.
>>
>> I do hit this case with ppc64, where I see that its the first write to
>> the page and it exits from this every time, causing the kernel to print
>> soft-lockups.
>>
>> Can we add the special case here for NOTDIRTY and set the page as dirty
>> and return successfully?
>
> Does the atomic step fall-back not work for you?

No, it doesn't seem so, otherwise I shouldn't have seen soft-lockups.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
  2017-03-24 10:58         ` Alex Bennée
  2017-03-24 17:27           ` Nikunj A Dadhania
@ 2017-03-27 11:56           ` Nikunj A Dadhania
  1 sibling, 0 replies; 97+ messages in thread
From: Nikunj A Dadhania @ 2017-03-27 11:56 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

Alex Bennée <alex.bennee@linaro.org> writes:

> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:
>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>>>> > +    /* Notice an IO access, or a notdirty page.  */
>>>>> > +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> > +        /* There's really nothing that can be done to
>>>>> > +           support this apart from stop-the-world.  */
>>>>> > +        goto stop_the_world;
>>>> We are also triggering on TLB_NOTDIRTY here in the case where a
>>>> conditional write is the first write to a page. I don't know if a
>>>> stop_the_world is required at this point but we will need to ensure we
>>>> clear bits as notdirty_mem_write() does.
>>>>
>>>
>>> You're quite right that we could probably special-case TLB_NOTDIRTY here such
>>> that (1) we needn't leave the cpu loop, and (2) needn't utilize the actual
>>> "write" part of notdirty_mem_write; just set the bits then fall through to the
>>> actual atomic instruction below.
>>
>> I do hit this case with ppc64, where I see that its the first write to
>> the page and it exits from this every time, causing the kernel to print
>> soft-lockups.
>>
>> Can we add the special case here for NOTDIRTY and set the page as dirty
>> and return successfully?
>
> Does the atomic step fall-back not work for you?

Looked further and I do see that EXCP_ATOMIC does get executed in
qemu_tcg_cpu_thread_fn(), but I am not sure what is going wrong there.

Following snippet fixes the issue for me:

diff --git a/cputlb.c b/cputlb.c
index f5d056c..743776a 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
         tlb_addr = tlbe->addr_write;
     }
 
-    /* Notice an IO access, or a notdirty page.  */
+    /* Check notdirty */
+    if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
+        tlb_set_dirty(ENV_GET_CPU(env), addr);
+        tlb_addr = tlb_addr & ~TLB_NOTDIRTY;
+    }
+
+    /* Notice an IO access  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
         /* There's really nothing that can be done to
            support this apart from stop-the-world.  */

Regards
Nikunj

^ permalink raw reply related	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2017-03-27 11:56 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-03 20:39 [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 01/34] atomics: add atomic_xor Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 02/34] atomics: add atomic_op_fetch variants Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 03/34] exec: Avoid direct references to Int128 parts Richard Henderson
2016-09-09 17:14   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 04/34] int128: Use __int128 if available Richard Henderson
2016-09-09 17:19   ` Alex Bennée
2016-09-09 17:38     ` Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 05/34] int128: Add int128_make128 Richard Henderson
2016-09-09 13:01   ` Leon Alrae
2016-09-09 20:16     ` Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 06/34] tcg: Add EXCP_ATOMIC Richard Henderson
2016-09-12 14:16   ` Alex Bennée
2016-09-12 20:19     ` Richard Henderson
2016-09-13  6:42       ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 07/34] HACK: Always enable parallel_cpus Richard Henderson
2016-09-12 14:20   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 08/34] cputlb: Replace SHIFT with DATA_SIZE Richard Henderson
2016-09-12 14:22   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 09/34] cputlb: Move probe_write out of softmmu_template.h Richard Henderson
2016-09-12 14:35   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 10/34] cputlb: Remove includes from softmmu_template.h Richard Henderson
2016-09-12 14:38   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 11/34] cputlb: Move most of iotlb code out of line Richard Henderson
2016-09-12 15:26   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 12/34] cputlb: Tidy some macros Richard Henderson
2016-09-12 15:28   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers Richard Henderson
2016-09-09 13:11   ` Leon Alrae
2016-09-09 14:46   ` Leon Alrae
2016-09-09 16:26     ` Richard Henderson
2016-09-12  7:59       ` Leon Alrae
2016-09-12 16:13         ` Richard Henderson
2016-09-13 12:32           ` Leon Alrae
2016-09-12 13:47   ` Alex Bennée
2016-09-13 18:00     ` Richard Henderson
2017-03-24 10:14       ` Nikunj A Dadhania
2017-03-24 10:58         ` Alex Bennée
2017-03-24 17:27           ` Nikunj A Dadhania
2017-03-27 11:56           ` Nikunj A Dadhania
2016-09-13 17:06   ` Alex Bennée
2016-09-13 17:26     ` Richard Henderson
2016-09-13 18:45       ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 14/34] tcg: Add atomic128 helpers Richard Henderson
2016-09-13 11:18   ` Alex Bennée
2016-09-13 14:18     ` Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 15/34] tcg: Add CONFIG_ATOMIC64 Richard Henderson
2016-09-14 10:12   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 16/34] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 17/34] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 18/34] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 19/34] target-i386: emulate LOCK'ed NOT " Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 20/34] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 21/34] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 22/34] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 23/34] target-i386: emulate XCHG using atomic helper Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 24/34] target-i386: remove helper_lock() Richard Henderson
2016-09-14 11:14   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench Richard Henderson
2016-09-14 13:53   ` Alex Bennée
2016-09-15  2:23     ` Emilio G. Cota
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 26/34] target-arm: Rearrange aa32 load and store functions Richard Henderson
2016-09-14 15:58   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 27/34] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
2016-09-14 16:03   ` Alex Bennée
2016-09-14 16:38     ` Richard Henderson
2016-10-20 17:51       ` Pranith Kumar
2016-10-20 18:00         ` Richard Henderson
2016-10-20 18:58           ` Pranith Kumar
2016-10-20 19:02             ` Richard Henderson
2016-10-20 19:07               ` Pranith Kumar
2016-10-21  4:34                 ` Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 28/34] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
2016-09-14 16:05   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 29/34] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 30/34] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
2016-09-15  9:36   ` Alex Bennée
2016-09-03 20:39 ` [Qemu-devel] [PATCH v3 31/34] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
2016-09-15  9:36   ` Alex Bennée
2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 32/34] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
2016-09-15  9:39   ` Alex Bennée
2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 33/34] target-alpha: Introduce MMU_PHYS_IDX Richard Henderson
2016-09-15 10:10   ` Alex Bennée
2016-09-15 16:38     ` Richard Henderson
2016-09-03 20:40 ` [Qemu-devel] [PATCH v3 34/34] target-alpha: Emulate LL/SC using cmpxchg helpers Richard Henderson
2016-09-15 14:38   ` Alex Bennée
2016-09-15 16:48     ` Richard Henderson
2016-09-15 17:48       ` Alex Bennée
2016-09-15 18:28         ` Richard Henderson
2016-09-03 21:25 ` [Qemu-devel] [PATCH v3 00/34] cmpxchg-based emulation of atomics no-reply
2016-09-03 21:26 ` no-reply
2016-09-09 18:33 ` Alex Bennée
2016-09-09 19:07   ` Richard Henderson
2016-09-09 19:29     ` Alex Bennée
2016-09-09 20:03       ` Richard Henderson
2016-09-09 20:11       ` Richard Henderson
2016-09-15 14:39 ` Alex Bennée

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.