* [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support
@ 2016-01-28 10:15 Alex Bennée
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs Alex Bennée
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, pbonzini,
	Alex Bennée, fred.konrad

Hi,

Here is a proper V1 of the ThreadSanitizer patches I posted earlier
this week. While my motivation is to have additional tools when stress
testing MTTCG, these should still be generally useful.

I've dropped the CFLAGS ordering patch which, while useful for applying
debug flags, isn't needed for this series. However, during
experimentation it turns out various invocations are needed to enable
the thread sanitizer depending on your GCC version and distro
packaging of it. As a result there are now two configure patches, one
for --extra-ldflags and one that introduces --extra-libs.
Examples are included in the patches.

To build on my work machine (Ubuntu 14.04 LTS) I installed the
latest/greatest GCC PPA and configured with:

    ./configure ${TARGET_LIST} --cc=gcc-5 --cxx=g++-5 \
  --extra-cflags="-fsanitize=thread" --extra-libs="-ltsan" \
  --with-coroutine=gthread

The third patch ensures we use the __atomic builtins for all atomic
functions if they are available. This is needed so the ThreadSanitizer
can properly instrument the code, but we should use them anyway for
consistency. The code generation for atomic accesses on x86 is
unaffected. I've gone through it with Paolo's review comments and
cleaned it up considerably.
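
For a rough picture of why this matters, here is a sketch (not the
actual macros from the patch; the names are made up) of the two styles
of atomic read:

    /* hand-rolled style: tsan just sees an ordinary load that it cannot
     * distinguish from a racy plain access */
    #define example_read_volatile(ptr)  (*(volatile typeof(*(ptr)) *)(ptr))

    /* __atomic builtin: the sanitizer runtime intercepts this and treats
     * it as an intentional relaxed atomic load */
    #define example_read_atomic(ptr)    __atomic_load_n((ptr), __ATOMIC_RELAXED)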

The final two patches fix up some warnings the sanitizer threw up. I
haven't completed a fully clean "make check" run yet as building with
the gthread coroutines seems to currently break "make check" in
master. However, the default coroutine implementation is sufficiently
funky to confuse tsan.

Additional change comments are included below the -- in the other
patches.

Alex Bennée (5):
  configure: introduce --extra-libs
  configure: ensure ldflags propagated to config_host
  include/qemu/atomic.h: default to __atomic functions
  async.c: various atomic fixes for tsan
  thread-pool: atomic fixes from tsan

 Makefile                 |   4 +-
 async.c                  |  12 ++--
 configure                |  15 +++-
 include/qemu/atomic.h    | 178 +++++++++++++++++++++++++++++++----------------
 tests/test-thread-pool.c |   8 +--
 thread-pool.c            |  12 ++--
 6 files changed, 148 insertions(+), 81 deletions(-)

-- 
2.7.0


* [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs
  2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
@ 2016-01-28 10:15 ` Alex Bennée
  2016-01-28 11:14   ` Paolo Bonzini
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host Alex Bennée
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, pbonzini,
	Alex Bennée, fred.konrad

If for example you want to use the thread sanitizer you want to ensure all
binaries are linked with the library:

  ./configure ${TARGETS} --cc=gcc-5 --cxx=g++-5 \
    --extra-cflags="-fsanitize=thread" --extra-libs="-ltsan"

This is more explicit than just specifying --extra-ldflags which does
not get applied at the end of the linker string.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - rebase
  - also ensure it is applied to test builds
---
 configure | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 44ac9ab..bd29ba7 100755
--- a/configure
+++ b/configure
@@ -113,7 +113,7 @@ compile_object() {
 compile_prog() {
   local_cflags="$1"
   local_ldflags="$2"
-  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
+  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags $EXTRA_LIBS
 }
 
 do_libtool() {
@@ -366,6 +366,8 @@ for opt do
   --extra-ldflags=*) LDFLAGS="$LDFLAGS $optarg"
                      EXTRA_LDFLAGS="$optarg"
   ;;
+  --extra-libs=*)    EXTRA_LIBS="$optarg"
+  ;;
   --enable-debug-info) debug_info="yes"
   ;;
   --disable-debug-info) debug_info="no"
@@ -786,6 +788,8 @@ for opt do
   ;;
   --extra-ldflags=*)
   ;;
+  --extra-libs=*)
+  ;;
   --enable-debug-info)
   ;;
   --disable-debug-info)
@@ -1282,6 +1286,7 @@ Advanced options (experts only):
   --objcc=OBJCC            use Objective-C compiler OBJCC [$objcc]
   --extra-cflags=CFLAGS    append extra C compiler flags QEMU_CFLAGS
   --extra-ldflags=LDFLAGS  append extra linker flags LDFLAGS
+  --extra-libs=LIBS        append extra libraries when linking
   --make=MAKE              use specified make [$make]
   --install=INSTALL        use specified install [$install]
   --python=PYTHON          use specified python [$python]
@@ -4715,6 +4720,11 @@ fi
 QEMU_CFLAGS="$pixman_cflags $fdt_cflags $QEMU_CFLAGS"
 libs_softmmu="$pixman_libs $libs_softmmu"
 
+# extra-libs
+LIBS="$LIBS $EXTRA_LIBS"
+libs_softmmu="$libs_softmmu $EXTRA_LIBS"
+libs_qga="$libs_qga $EXTRA_LIBS"
+
 echo "Install prefix    $prefix"
 echo "BIOS directory    `eval echo $qemu_datadir`"
 echo "binary directory  `eval echo $bindir`"
@@ -4885,6 +4895,7 @@ fi
 echo "qemu_helperdir=$libexecdir" >> $config_host_mak
 echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
+echo "extra_libs=$EXTRA_LIBS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
 
-- 
2.7.0


* [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host
  2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs Alex Bennée
@ 2016-01-28 10:15 ` Alex Bennée
  2016-01-28 11:10   ` Paolo Bonzini
  2016-01-28 11:13   ` Paolo Bonzini
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, pbonzini,
	Alex Bennée, fred.konrad

If, for example, you want to use the thread sanitizer you need to
ensure all binaries have the extra linker flags applied:

  ./configure ${TARGETS} --cc=gcc-5 --cxx=g++-5 \
    --extra-cflags="-fsanitize=thread" --extra-ldflags="-fsanitize=thread"

Or possibly:

  ./configure ${TARGETS} --extra-cflags="-fsanitize=thread -fPIC" \
    --extra-ldflags="-fsanitize=thread -pie"

Unlike CFLAGS, we don't have a QEMU_LDFLAGS, which would be appropriate
if we only ever needed to affect the ${TARGET} binaries.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - new for v1
  - --extra-ldflags alternative to -ltsan
  - yet another example invocation (gcc 4.9/Gentoo)
---
 Makefile  | 4 ++--
 configure | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index d0de2d4..d30532f 100644
--- a/Makefile
+++ b/Makefile
@@ -329,9 +329,9 @@ qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI)
 endif
 
 ivshmem-client$(EXESUF): $(ivshmem-client-obj-y)
-	$(call LINK, $^)
+	$(call LINK, $^, $(ldflags))
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
-	$(call LINK, $^)
+	$(call LINK, $^, $(ldflags))
 
 clean:
 # avoid old build problems by removing potentially incorrect old files
diff --git a/configure b/configure
index bd29ba7..148b79a 100755
--- a/configure
+++ b/configure
@@ -5871,7 +5871,7 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   ldflags="$ldflags $textseg_ldflags"
 fi
 
-echo "LDFLAGS+=$ldflags" >> $config_target_mak
+echo "LDFLAGS+=$ldflags" >> $config_host_mak
 echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
 
 done # for target in $targets
-- 
2.7.0


* [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs Alex Bennée
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host Alex Bennée
@ 2016-01-28 10:15 ` Alex Bennée
  2016-01-28 11:00   ` Paolo Bonzini
                     ` (2 more replies)
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan Alex Bennée
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan Alex Bennée
  4 siblings, 3 replies; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, pbonzini,
	Alex Bennée, fred.konrad

The __atomic primitives have been available since GCC 4.7 and provide
a richer interface for describing memory ordering requirements. As a
bonus, by using the primitives instead of hand-rolled functions we can
use tools such as the AddressSanitizer which need well-defined APIs
for their analysis.

If we have the __ATOMIC defines we exclusively use the __atomic
primitives for all our atomic accesses. Otherwise we fall back to the
mixture of
__sync and hand-rolled barrier cases.
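
As a purely illustrative sketch (not code from this patch), call sites
are written the same way whichever implementation atomic.h ends up
selecting; for example the classic message-passing pattern:

    static int payload;
    static int ready;

    static void example_producer(void)
    {
        payload = 42;
        smp_wmb();                  /* order the payload write before ready */
        atomic_set(&ready, 1);
    }

    static int example_consumer(void)
    {
        if (atomic_read(&ready)) {
            smp_rmb();              /* order the ready read before payload */
            return payload;
        }
        return -1;
    }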

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---

v2 - review updates
 - add reference to doc/atomics.txt at top of file
 - ensure smp_mb() is full __ATOMIC_SEQ_CST
 - make atomic_read/set __ATOMIC_RELAXED
 - make atomic_rcu_read/set __ATOMIC_CONSUME/RELEASE
 - make atomic_mb_read/set __ATOMIC_RELAXED + explicit barriers
 - move some comments about __ATOMIC no longer relevant to bottom half
 - rm un-needed ifdef/ifndefs
 - in cmpxchg return old instead of ptr
 - extra comments on old style volatile atomic access
---
 include/qemu/atomic.h | 178 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 117 insertions(+), 61 deletions(-)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index bd2c075..ec3208a 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -8,6 +8,8 @@
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  *
+ * See docs/atomics.txt for discussion about the guarantees each
+ * atomic primitive is meant to provide.
  */
 
 #ifndef __QEMU_ATOMIC_H
@@ -15,12 +17,116 @@
 
 #include "qemu/compiler.h"
 
-/* For C11 atomic ops */
 
 /* Compiler barrier */
 #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
 
-#ifndef __ATOMIC_RELAXED
+#ifdef __ATOMIC_RELAXED
+/* For C11 atomic ops */
+
+/* Manual memory barriers
+ *
+ *__atomic_thread_fence does not include a compiler barrier; instead,
+ * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
+ * semantics. If smp_wmb() is a no-op, absence of the barrier means that
+ * the compiler is free to reorder stores on each side of the barrier.
+ * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
+ */
+
+#define smp_mb()    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })
+#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
+#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
+
+#define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
+
+/* Weak atomic operations prevent the compiler moving other
+ * loads/stores past the atomic operation load/store. However there is
+ * no explicit memory barrier for the processor.
+ */
+#define atomic_read(ptr)                          \
+    ({                                            \
+    typeof(*ptr) _val;                            \
+     __atomic_load(ptr, &_val, __ATOMIC_RELAXED); \
+    _val;                                         \
+    })
+
+#define atomic_set(ptr, i)  do {                  \
+    typeof(*ptr) _val = (i);                      \
+    __atomic_store(ptr, &_val, __ATOMIC_RELAXED); \
+} while(0)
+
+/* Atomic RCU operations imply weak memory barriers */
+
+#define atomic_rcu_read(ptr)                      \
+    ({                                            \
+    typeof(*ptr) _val;                            \
+     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
+    _val;                                         \
+    })
+
+#define atomic_rcu_set(ptr, i)  do {                    \
+    typeof(*ptr) _val = (i);                            \
+    __atomic_store(ptr, &_val, __ATOMIC_RELEASE);       \
+} while(0)
+
+/* atomic_mb_read/set semantics map Java volatile variables. They are
+ * less expensive on some platforms (notably POWER & ARM) than fully
+ * sequentially consistent operations.
+ *
+ * As long as they are used as paired operations they are safe to
+ * use. See docs/atomic.txt for more discussion.
+ */
+
+#define atomic_mb_read(ptr)                             \
+    ({                                                  \
+    typeof(*ptr) _val;                                  \
+     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
+     smp_rmb();                                         \
+    _val;                                               \
+    })
+
+#define atomic_mb_set(ptr, i)  do {                     \
+    typeof(*ptr) _val = (i);                            \
+    smp_wmb();                                          \
+    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
+    smp_mb();                                           \
+} while(0)
+
+
+/* All the remaining operations are fully sequentially consistent */
+
+#define atomic_xchg(ptr, i)    ({                           \
+    typeof(*ptr) _new = (i), _old;                          \
+    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
+    _old;                                                   \
+})
+
+/* Returns the eventual value, failed or not */
+#define atomic_cmpxchg(ptr, old, new)                                   \
+    ({                                                                  \
+    typeof(*ptr) _old = (old), _new = (new);                            \
+    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
+                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
+    _old; /* can this race if cmpxchg not used elsewhere? */            \
+    })
+
+/* Provide shorter names for GCC atomic builtins, return old value */
+#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
+#define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
+#define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
+
+/* And even shorter names that return void.  */
+#define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
+#define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
+#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
+
+#else /* __ATOMIC_RELAXED */
 
 /*
  * We use GCC builtin if it's available, as that can use mfence on
@@ -85,8 +191,6 @@
 
 #endif /* _ARCH_PPC */
 
-#endif /* C11 atomics */
-
 /*
  * For (host) platforms we don't have explicit barrier definitions
  * for, we use the gcc __sync_synchronize() primitive to generate a
@@ -98,42 +202,22 @@
 #endif
 
 #ifndef smp_wmb
-#ifdef __ATOMIC_RELEASE
-/* __atomic_thread_fence does not include a compiler barrier; instead,
- * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
- * semantics. If smp_wmb() is a no-op, absence of the barrier means that
- * the compiler is free to reorder stores on each side of the barrier.
- * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
- */
-#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
-#else
 #define smp_wmb()   __sync_synchronize()
 #endif
-#endif
 
 #ifndef smp_rmb
-#ifdef __ATOMIC_ACQUIRE
-#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
-#else
 #define smp_rmb()   __sync_synchronize()
 #endif
-#endif
 
 #ifndef smp_read_barrier_depends
-#ifdef __ATOMIC_CONSUME
-#define smp_read_barrier_depends()   ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
-#else
 #define smp_read_barrier_depends()   barrier()
 #endif
-#endif
 
-#ifndef atomic_read
+/* These will only be atomic if the processor does the fetch or store
+ * in a single issue memory operation
+ */
 #define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))
-#endif
-
-#ifndef atomic_set
 #define atomic_set(ptr, i)     ((*(__typeof__(*ptr) volatile*) (ptr)) = (i))
-#endif
 
 /**
  * atomic_rcu_read - reads a RCU-protected pointer to a local variable
@@ -146,30 +230,18 @@
  * Inserts memory barriers on architectures that require them (currently only
  * Alpha) and documents which pointers are protected by RCU.
  *
- * Unless the __ATOMIC_CONSUME memory order is available, atomic_rcu_read also
- * includes a compiler barrier to ensure that value-speculative optimizations
- * (e.g. VSS: Value Speculation Scheduling) does not perform the data read
- * before the pointer read by speculating the value of the pointer.  On new
- * enough compilers, atomic_load takes care of such concern about
- * dependency-breaking optimizations.
+ * atomic_rcu_read also includes a compiler barrier to ensure that
+ * value-speculative optimizations (e.g. VSS: Value Speculation
+ * Scheduling) does not perform the data read before the pointer read
+ * by speculating the value of the pointer.
  *
  * Should match atomic_rcu_set(), atomic_xchg(), atomic_cmpxchg().
  */
-#ifndef atomic_rcu_read
-#ifdef __ATOMIC_CONSUME
-#define atomic_rcu_read(ptr)    ({                \
-    typeof(*ptr) _val;                            \
-     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
-    _val;                                         \
-})
-#else
 #define atomic_rcu_read(ptr)    ({                \
     typeof(*ptr) _val = atomic_read(ptr);         \
     smp_read_barrier_depends();                   \
     _val;                                         \
 })
-#endif
-#endif
 
 /**
  * atomic_rcu_set - assigns (publicizes) a pointer to a new data structure
@@ -182,19 +254,10 @@
  *
  * Should match atomic_rcu_read().
  */
-#ifndef atomic_rcu_set
-#ifdef __ATOMIC_RELEASE
-#define atomic_rcu_set(ptr, i)  do {              \
-    typeof(*ptr) _val = (i);                      \
-    __atomic_store(ptr, &_val, __ATOMIC_RELEASE); \
-} while(0)
-#else
 #define atomic_rcu_set(ptr, i)  do {              \
     smp_wmb();                                    \
     atomic_set(ptr, i);                           \
 } while (0)
-#endif
-#endif
 
 /* These have the same semantics as Java volatile variables.
  * See http://gee.cs.oswego.edu/dl/jmm/cookbook.html:
@@ -218,13 +281,11 @@
  * (see docs/atomics.txt), and I'm not sure that __ATOMIC_ACQ_REL is enough.
  * Just always use the barriers manually by the rules above.
  */
-#ifndef atomic_mb_read
 #define atomic_mb_read(ptr)    ({           \
     typeof(*ptr) _val = atomic_read(ptr);   \
     smp_rmb();                              \
     _val;                                   \
 })
-#endif
 
 #ifndef atomic_mb_set
 #define atomic_mb_set(ptr, i)  do {         \
@@ -237,12 +298,6 @@
 #ifndef atomic_xchg
 #if defined(__clang__)
 #define atomic_xchg(ptr, i)    __sync_swap(ptr, i)
-#elif defined(__ATOMIC_SEQ_CST)
-#define atomic_xchg(ptr, i)    ({                           \
-    typeof(*ptr) _new = (i), _old;                          \
-    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
-    _old;                                                   \
-})
 #else
 /* __sync_lock_test_and_set() is documented to be an acquire barrier only.  */
 #define atomic_xchg(ptr, i)    (smp_mb(), __sync_lock_test_and_set(ptr, i))
@@ -266,4 +321,5 @@
 #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
 #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
 
-#endif
+#endif /* __ATOMIC_RELAXED */
+#endif /* __QEMU_ATOMIC_H */
-- 
2.7.0


* [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan
  2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
                   ` (2 preceding siblings ...)
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
@ 2016-01-28 10:15 ` Alex Bennée
  2016-01-28 10:45   ` Paolo Bonzini
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan Alex Bennée
  4 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, open list:Block I/O path, mark.burton,
	a.rigo, stefanha, pbonzini, Alex Bennée, fred.konrad

Running "make check" under ThreadSanitizer pointed to a number of
potential races.

  - use atomic_mb_read to check bh->scheduled
  - use atomic_set/read primitives to manipulate bh->idle

The bh->idle changes are mostly for the benefit of explicit marking for
tsan but the code in non-sanitised builds should be the same.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 async.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/async.c b/async.c
index e106072..a9cb9c3 100644
--- a/async.c
+++ b/async.c
@@ -85,10 +85,10 @@ int aio_bh_poll(AioContext *ctx)
          */
         if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
             /* Idle BHs and the notify BH don't count as progress */
-            if (!bh->idle && bh != ctx->notify_dummy_bh) {
+            if (!atomic_read(&bh->idle) && bh != ctx->notify_dummy_bh) {
                 ret = 1;
             }
-            bh->idle = 0;
+            atomic_set(&bh->idle, 0);
             aio_bh_call(bh);
         }
     }
@@ -128,7 +128,7 @@ void qemu_bh_schedule(QEMUBH *bh)
     AioContext *ctx;
 
     ctx = bh->ctx;
-    bh->idle = 0;
+    atomic_set(&bh->idle, 0);
     /* The memory barrier implicit in atomic_xchg makes sure that:
      * 1. idle & any writes needed by the callback are done before the
      *    locations are read in the aio_bh_poll.
@@ -165,8 +165,8 @@ aio_compute_timeout(AioContext *ctx)
     QEMUBH *bh;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
-            if (bh->idle) {
+        if (!bh->deleted && atomic_mb_read(&bh->scheduled)) {
+            if (atomic_read(&bh->idle)) {
                 /* idle bottom halves will be polled at least
                  * every 10ms */
                 timeout = 10000000;
@@ -286,7 +286,7 @@ void aio_notify(AioContext *ctx)
      * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
      */
     smp_mb();
-    if (ctx->notify_me) {
+    if (atomic_read(&ctx->notify_me)) {
         event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
-- 
2.7.0


* [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan
  2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
                   ` (3 preceding siblings ...)
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan Alex Bennée
@ 2016-01-28 10:15 ` Alex Bennée
  2016-01-28 10:44   ` Paolo Bonzini
  4 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 10:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, pbonzini,
	Alex Bennée, fred.konrad

Mark changes in thread pool state as explicitly atomic. Also in the
test-thread-pool code make accesses to data.n explicitly atomic.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/test-thread-pool.c |  8 ++++----
 thread-pool.c            | 12 ++++++------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index ccdee39..f51e284 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -46,10 +46,10 @@ static void test_submit(void)
 {
     WorkerTestData data = { .n = 0 };
     thread_pool_submit(pool, worker_cb, &data);
-    while (data.n == 0) {
+    while (atomic_read(&data.n) == 0) {
         aio_poll(ctx, true);
     }
-    g_assert_cmpint(data.n, ==, 1);
+    g_assert_cmpint(atomic_read(&data.n), ==, 1);
 }
 
 static void test_submit_aio(void)
@@ -128,7 +128,7 @@ static void test_submit_many(void)
         aio_poll(ctx, true);
     }
     for (i = 0; i < 100; i++) {
-        g_assert_cmpint(data[i].n, ==, 1);
+        g_assert_cmpint(atomic_read(&data[i].n), ==, 1);
         g_assert_cmpint(data[i].ret, ==, 0);
     }
 }
@@ -183,7 +183,7 @@ static void do_test_cancel(bool sync)
     g_assert_cmpint(num_canceled, <, 100);
 
     for (i = 0; i < 100; i++) {
-        if (data[i].aiocb && data[i].n != 3) {
+        if (data[i].aiocb && (atomic_read(&data[i].n) != 3)) {
             if (sync) {
                 /* Canceling the others will be a blocking operation.  */
                 bdrv_aio_cancel(data[i].aiocb);
diff --git a/thread-pool.c b/thread-pool.c
index 402c778..431a6fb 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -99,15 +99,15 @@ static void *worker_thread(void *opaque)
 
         req = QTAILQ_FIRST(&pool->request_list);
         QTAILQ_REMOVE(&pool->request_list, req, reqs);
-        req->state = THREAD_ACTIVE;
+        atomic_set(&req->state, THREAD_ACTIVE);
         qemu_mutex_unlock(&pool->lock);
 
         ret = req->func(req->arg);
 
-        req->ret = ret;
+        atomic_set(&req->ret, ret);
         /* Write ret before state.  */
         smp_wmb();
-        req->state = THREAD_DONE;
+        atomic_set(&req->state, THREAD_DONE);
 
         qemu_mutex_lock(&pool->lock);
 
@@ -167,7 +167,7 @@ static void thread_pool_completion_bh(void *opaque)
 
 restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
-        if (elem->state != THREAD_DONE) {
+        if (atomic_read(&elem->state) != THREAD_DONE) {
             continue;
         }
 
@@ -184,7 +184,7 @@ restart:
              */
             qemu_bh_schedule(pool->completion_bh);
 
-            elem->common.cb(elem->common.opaque, elem->ret);
+            elem->common.cb(elem->common.opaque, atomic_read(&elem->ret));
             qemu_aio_unref(elem);
             goto restart;
         } else {
@@ -201,7 +201,7 @@ static void thread_pool_cancel(BlockAIOCB *acb)
     trace_thread_pool_cancel(elem, elem->common.opaque);
 
     qemu_mutex_lock(&pool->lock);
-    if (elem->state == THREAD_QUEUED &&
+    if (atomic_read(&elem->state) == THREAD_QUEUED &&
         /* No thread has yet started working on elem. we can try to "steal"
          * the item from the worker if we can get a signal from the
          * semaphore.  Because this is non-blocking, we can do it with
-- 
2.7.0


* Re: [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan Alex Bennée
@ 2016-01-28 10:44   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 10:44 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, qemu block, mark.burton, a.rigo, stefanha,
	fred.konrad

On 28/01/2016 11:15, Alex Bennée wrote:
> Mark changes in thread pool state as explicitly atomic. Also in the
> test-thread-pool code make accesses to data.n explicitly atomic.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  tests/test-thread-pool.c |  8 ++++----
>  thread-pool.c            | 12 ++++++------
>  2 files changed, 10 insertions(+), 10 deletions(-)

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

> diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
> index ccdee39..f51e284 100644
> --- a/tests/test-thread-pool.c
> +++ b/tests/test-thread-pool.c
> @@ -46,10 +46,10 @@ static void test_submit(void)
>  {
>      WorkerTestData data = { .n = 0 };
>      thread_pool_submit(pool, worker_cb, &data);
> -    while (data.n == 0) {
> +    while (atomic_read(&data.n) == 0) {
>          aio_poll(ctx, true);
>      }
> -    g_assert_cmpint(data.n, ==, 1);
> +    g_assert_cmpint(atomic_read(&data.n), ==, 1);
>  }
>  
>  static void test_submit_aio(void)
> @@ -128,7 +128,7 @@ static void test_submit_many(void)
>          aio_poll(ctx, true);
>      }
>      for (i = 0; i < 100; i++) {
> -        g_assert_cmpint(data[i].n, ==, 1);
> +        g_assert_cmpint(atomic_read(&data[i].n), ==, 1);
>          g_assert_cmpint(data[i].ret, ==, 0);
>      }
>  }
> @@ -183,7 +183,7 @@ static void do_test_cancel(bool sync)
>      g_assert_cmpint(num_canceled, <, 100);
>  
>      for (i = 0; i < 100; i++) {
> -        if (data[i].aiocb && data[i].n != 3) {
> +        if (data[i].aiocb && (atomic_read(&data[i].n) != 3)) {
>              if (sync) {
>                  /* Canceling the others will be a blocking operation.  */
>                  bdrv_aio_cancel(data[i].aiocb);
> diff --git a/thread-pool.c b/thread-pool.c
> index 402c778..431a6fb 100644
> --- a/thread-pool.c
> +++ b/thread-pool.c
> @@ -99,15 +99,15 @@ static void *worker_thread(void *opaque)
>  
>          req = QTAILQ_FIRST(&pool->request_list);
>          QTAILQ_REMOVE(&pool->request_list, req, reqs);
> -        req->state = THREAD_ACTIVE;
> +        atomic_set(&req->state, THREAD_ACTIVE);
>          qemu_mutex_unlock(&pool->lock);
>  
>          ret = req->func(req->arg);
>  
> -        req->ret = ret;
> +        atomic_set(&req->ret, ret);
>          /* Write ret before state.  */
>          smp_wmb();
> -        req->state = THREAD_DONE;
> +        atomic_set(&req->state, THREAD_DONE);
>  
>          qemu_mutex_lock(&pool->lock);
>  
> @@ -167,7 +167,7 @@ static void thread_pool_completion_bh(void *opaque)
>  
>  restart:
>      QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
> -        if (elem->state != THREAD_DONE) {
> +        if (atomic_read(&elem->state) != THREAD_DONE) {
>              continue;
>          }
>  
> @@ -184,7 +184,7 @@ restart:
>               */
>              qemu_bh_schedule(pool->completion_bh);
>  
> -            elem->common.cb(elem->common.opaque, elem->ret);
> +            elem->common.cb(elem->common.opaque, atomic_read(&elem->ret));
>              qemu_aio_unref(elem);
>              goto restart;
>          } else {
> @@ -201,7 +201,7 @@ static void thread_pool_cancel(BlockAIOCB *acb)
>      trace_thread_pool_cancel(elem, elem->common.opaque);
>  
>      qemu_mutex_lock(&pool->lock);
> -    if (elem->state == THREAD_QUEUED &&
> +    if (atomic_read(&elem->state) == THREAD_QUEUED &&
>          /* No thread has yet started working on elem. we can try to "steal"
>           * the item from the worker if we can get a signal from the
>           * semaphore.  Because this is non-blocking, we can do it with
> 


* Re: [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan Alex Bennée
@ 2016-01-28 10:45   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 10:45 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, open list:Block I/O path, mark.burton,
	a.rigo, stefanha, fred.konrad



On 28/01/2016 11:15, Alex Bennée wrote:
> Running "make check" under ThreadSanitizer pointed to a number of
> potential races.
> 
>   - use atomic_mb_read to check bh->scheduled
>   - use atomic_set/read primitives to manipulate bh->idle
> 
> The bh->idle changes are mostly for the benefit of explicit marking for
> tsan but the code in non-sanitised builds should be the same.

These are harmless because of the aio_notify calls that happen after
bh->scheduled is written, but they do make the code easier to understand.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  async.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/async.c b/async.c
> index e106072..a9cb9c3 100644
> --- a/async.c
> +++ b/async.c
> @@ -85,10 +85,10 @@ int aio_bh_poll(AioContext *ctx)
>           */
>          if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
>              /* Idle BHs and the notify BH don't count as progress */
> -            if (!bh->idle && bh != ctx->notify_dummy_bh) {
> +            if (!atomic_read(&bh->idle) && bh != ctx->notify_dummy_bh) {
>                  ret = 1;
>              }
> -            bh->idle = 0;
> +            atomic_set(&bh->idle, 0);
>              aio_bh_call(bh);
>          }
>      }
> @@ -128,7 +128,7 @@ void qemu_bh_schedule(QEMUBH *bh)
>      AioContext *ctx;
>  
>      ctx = bh->ctx;
> -    bh->idle = 0;
> +    atomic_set(&bh->idle, 0);
>      /* The memory barrier implicit in atomic_xchg makes sure that:
>       * 1. idle & any writes needed by the callback are done before the
>       *    locations are read in the aio_bh_poll.
> @@ -165,8 +165,8 @@ aio_compute_timeout(AioContext *ctx)
>      QEMUBH *bh;
>  
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> -        if (!bh->deleted && bh->scheduled) {
> -            if (bh->idle) {
> +        if (!bh->deleted && atomic_mb_read(&bh->scheduled)) {
> +            if (atomic_read(&bh->idle)) {
>                  /* idle bottom halves will be polled at least
>                   * every 10ms */
>                  timeout = 10000000;
> @@ -286,7 +286,7 @@ void aio_notify(AioContext *ctx)
>       * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>       */
>      smp_mb();
> -    if (ctx->notify_me) {
> +    if (atomic_read(&ctx->notify_me)) {
>          event_notifier_set(&ctx->notifier);
>          atomic_mb_set(&ctx->notified, true);
>      }
> 


* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
@ 2016-01-28 11:00   ` Paolo Bonzini
  2016-01-29 16:06     ` Alex Bennée
  2016-04-01 14:30   ` James Hogan
  2016-04-01 20:35   ` Pranith Kumar
  2 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 11:00 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, fred.konrad



On 28/01/2016 11:15, Alex Bennée wrote:
> +/* atomic_mb_read/set semantics map Java volatile variables. They are
> + * less expensive on some platforms (notably POWER & ARM) than fully
> + * sequentially consistent operations.
> + *
> + * As long as they are used as paired operations they are safe to
> + * use. See docs/atomic.txt for more discussion.
> + */
> +
> +#define atomic_mb_read(ptr)                             \
> +    ({                                                  \
> +    typeof(*ptr) _val;                                  \
> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
> +     smp_rmb();                                         \
> +    _val;                                               \
> +    })
> +
> +#define atomic_mb_set(ptr, i)  do {                     \
> +    typeof(*ptr) _val = (i);                            \
> +    smp_wmb();                                          \
> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
> +    smp_mb();                                           \
> +} while(0)

Great... I'll change this to

#if defined(_ARCH_PPC)
#define atomic_mb_read(ptr)                             \
    ({                                                  \
    typeof(*ptr) _val;                                  \
     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
     smp_rmb();                                         \
    _val;                                               \
    })

#define atomic_mb_set(ptr, i)  do {                     \
    typeof(*ptr) _val = (i);                            \
    smp_wmb();                                          \
    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
    smp_mb();                                           \
} while(0)
#else
#define atomic_mb_read(ptr)                       \
    ({                                            \
    typeof(*ptr) _val;                            \
     __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST); \
    _val;                                         \
    })

#define atomic_mb_set(ptr, i)  do {               \
    typeof(*ptr) _val = (i);                      \
    __atomic_store(ptr, &_val, __ATOMIC_SEQ_CST); \
} while(0)
#endif

since this benefits x86 (which can generate mov/xchg respectively) and
aarch64 (where atomic_mb_read/atomic_mb_set map directly to ldar/stlr).

> +/* Returns the eventual value, failed or not */
> +#define atomic_cmpxchg(ptr, old, new)                                   \
> +    ({                                                                  \
> +    typeof(*ptr) _old = (old), _new = (new);                            \
> +    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
> +                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
> +    _old; /* can this race if cmpxchg not used elsewhere? */            \
> +    })

How so?

Paolo


* Re: [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host Alex Bennée
@ 2016-01-28 11:10   ` Paolo Bonzini
  2016-01-28 11:13   ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 11:10 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, fred.konrad



On 28/01/2016 11:15, Alex Bennée wrote:
> index d0de2d4..d30532f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -329,9 +329,9 @@ qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI)
>  endif
>  
>  ivshmem-client$(EXESUF): $(ivshmem-client-obj-y)
> -	$(call LINK, $^)
> +	$(call LINK, $^, $(ldflags))
>  ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
> -	$(call LINK, $^)
> +	$(call LINK, $^, $(ldflags))
>  
>  clean:
>  # avoid old build problems by removing potentially incorrect old files

This seems spurious?  There's no $2 in the LINK macro.

> diff --git a/configure b/configure
> index bd29ba7..148b79a 100755
> --- a/configure
> +++ b/configure
> @@ -5871,7 +5871,7 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
>    ldflags="$ldflags $textseg_ldflags"
>  fi
>  
> -echo "LDFLAGS+=$ldflags" >> $config_target_mak
> +echo "LDFLAGS+=$ldflags" >> $config_host_mak
>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>  
>  done # for target in $targets
> -- 2.7.0

This is good.

Paolo


* Re: [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host Alex Bennée
  2016-01-28 11:10   ` Paolo Bonzini
@ 2016-01-28 11:13   ` Paolo Bonzini
  2016-01-29 15:26     ` Alex Bennée
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 11:13 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, fred.konrad



On 28/01/2016 11:15, Alex Bennée wrote:
> diff --git a/configure b/configure
> index bd29ba7..148b79a 100755
> --- a/configure
> +++ b/configure
> @@ -5871,7 +5871,7 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
>    ldflags="$ldflags $textseg_ldflags"
>  fi
>  
> -echo "LDFLAGS+=$ldflags" >> $config_target_mak
> +echo "LDFLAGS+=$ldflags" >> $config_host_mak
>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>  
>  done # for target in $targets

Hmm wait, it's not okay.

This adds the *target* LDFLAGS to config-host.mak, and adds them a
zillion times.  extra-ldflags is already added to LDFLAGS in
config-host.mak:

  --extra-ldflags=*) LDFLAGS="$LDFLAGS $optarg"
                     EXTRA_LDFLAGS="$optarg"
  ;;

...

echo "LDFLAGS=$LDFLAGS" >> $config_host_mak

So I'm totally confused as to what this patch is trying to achieve...

Paolo


* Re: [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs Alex Bennée
@ 2016-01-28 11:14   ` Paolo Bonzini
  2016-01-28 11:38     ` Alex Bennée
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-01-28 11:14 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, stefanha, fred.konrad



On 28/01/2016 11:15, Alex Bennée wrote:
> If for example you want to use the thread sanitizer you want to ensure all
> binaries are linked with the library:
> 
>   ./configure ${TARGETS} --cc=gcc-5 --cxx=g++-5 \
>     --extra-cflags="-fsanitize=thread" --extra-libs="-ltsan"
> 
> This is more explicit than just specifying --extra-ldflags which does
> not get applied at the end of the linker string.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Did you find out why you need -ltsan?

Paolo

> ---
> v1
>   - rebase
>   - also ensure it is applied to test builds
> ---
>  configure | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 44ac9ab..bd29ba7 100755
> --- a/configure
> +++ b/configure
> @@ -113,7 +113,7 @@ compile_object() {
>  compile_prog() {
>    local_cflags="$1"
>    local_ldflags="$2"
> -  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
> +  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags $EXTRA_LIBS
>  }
>  
>  do_libtool() {
> @@ -366,6 +366,8 @@ for opt do
>    --extra-ldflags=*) LDFLAGS="$LDFLAGS $optarg"
>                       EXTRA_LDFLAGS="$optarg"
>    ;;
> +  --extra-libs=*)    EXTRA_LIBS="$optarg"
> +  ;;
>    --enable-debug-info) debug_info="yes"
>    ;;
>    --disable-debug-info) debug_info="no"
> @@ -786,6 +788,8 @@ for opt do
>    ;;
>    --extra-ldflags=*)
>    ;;
> +  --extra-libs=*)
> +  ;;
>    --enable-debug-info)
>    ;;
>    --disable-debug-info)
> @@ -1282,6 +1286,7 @@ Advanced options (experts only):
>    --objcc=OBJCC            use Objective-C compiler OBJCC [$objcc]
>    --extra-cflags=CFLAGS    append extra C compiler flags QEMU_CFLAGS
>    --extra-ldflags=LDFLAGS  append extra linker flags LDFLAGS
> +  --extra-libs=LIBS        append extra libraries when linking
>    --make=MAKE              use specified make [$make]
>    --install=INSTALL        use specified install [$install]
>    --python=PYTHON          use specified python [$python]
> @@ -4715,6 +4720,11 @@ fi
>  QEMU_CFLAGS="$pixman_cflags $fdt_cflags $QEMU_CFLAGS"
>  libs_softmmu="$pixman_libs $libs_softmmu"
>  
> +# extra-libs
> +LIBS="$LIBS $EXTRA_LIBS"
> +libs_softmmu="$libs_softmmu $EXTRA_LIBS"
> +libs_qga="$libs_qga $EXTRA_LIBS"
> +
>  echo "Install prefix    $prefix"
>  echo "BIOS directory    `eval echo $qemu_datadir`"
>  echo "binary directory  `eval echo $bindir`"
> @@ -4885,6 +4895,7 @@ fi
>  echo "qemu_helperdir=$libexecdir" >> $config_host_mak
>  echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
>  echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
> +echo "extra_libs=$EXTRA_LIBS" >> $config_host_mak
>  echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
>  echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
>  
> 


* Re: [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs
  2016-01-28 11:14   ` Paolo Bonzini
@ 2016-01-28 11:38     ` Alex Bennée
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Bennée @ 2016-01-28 11:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, peter.maydell, mark.burton, qemu-devel, a.rigo, stefanha,
	fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 28/01/2016 11:15, Alex Bennée wrote:
>> If for example you want to use the thread sanitizer you want to ensure all
>> binaries are linked with the library:
>>
>>   ./configure ${TARGETS} --cc=gcc-5 --cxx=g++-5 \
>>     --extra-cflags="-fsanitize=thread" --extra-libs="-ltsan"
>>
>> This is more explicit than just specifying --extra-ldflags which does
>> not get applied at the end of the linker string.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> Did you find out why you need -ltsan?

Not yet, just on my system it's needed (which is an LTS + GCC PPA).

>
> Paolo
>
>> ---
>> v1
>>   - rebase
>>   - also ensure it is applied to test builds
>> ---
>>  configure | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index 44ac9ab..bd29ba7 100755
>> --- a/configure
>> +++ b/configure
>> @@ -113,7 +113,7 @@ compile_object() {
>>  compile_prog() {
>>    local_cflags="$1"
>>    local_ldflags="$2"
>> -  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
>> +  do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags $EXTRA_LIBS
>>  }
>>
>>  do_libtool() {
>> @@ -366,6 +366,8 @@ for opt do
>>    --extra-ldflags=*) LDFLAGS="$LDFLAGS $optarg"
>>                       EXTRA_LDFLAGS="$optarg"
>>    ;;
>> +  --extra-libs=*)    EXTRA_LIBS="$optarg"
>> +  ;;
>>    --enable-debug-info) debug_info="yes"
>>    ;;
>>    --disable-debug-info) debug_info="no"
>> @@ -786,6 +788,8 @@ for opt do
>>    ;;
>>    --extra-ldflags=*)
>>    ;;
>> +  --extra-libs=*)
>> +  ;;
>>    --enable-debug-info)
>>    ;;
>>    --disable-debug-info)
>> @@ -1282,6 +1286,7 @@ Advanced options (experts only):
>>    --objcc=OBJCC            use Objective-C compiler OBJCC [$objcc]
>>    --extra-cflags=CFLAGS    append extra C compiler flags QEMU_CFLAGS
>>    --extra-ldflags=LDFLAGS  append extra linker flags LDFLAGS
>> +  --extra-libs=LIBS        append extra libraries when linking
>>    --make=MAKE              use specified make [$make]
>>    --install=INSTALL        use specified install [$install]
>>    --python=PYTHON          use specified python [$python]
>> @@ -4715,6 +4720,11 @@ fi
>>  QEMU_CFLAGS="$pixman_cflags $fdt_cflags $QEMU_CFLAGS"
>>  libs_softmmu="$pixman_libs $libs_softmmu"
>>
>> +# extra-libs
>> +LIBS="$LIBS $EXTRA_LIBS"
>> +libs_softmmu="$libs_softmmu $EXTRA_LIBS"
>> +libs_qga="$libs_qga $EXTRA_LIBS"
>> +
>>  echo "Install prefix    $prefix"
>>  echo "BIOS directory    `eval echo $qemu_datadir`"
>>  echo "binary directory  `eval echo $bindir`"
>> @@ -4885,6 +4895,7 @@ fi
>>  echo "qemu_helperdir=$libexecdir" >> $config_host_mak
>>  echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
>>  echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
>> +echo "extra_libs=$EXTRA_LIBS" >> $config_host_mak
>>  echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
>>  echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
>>
>>


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host
  2016-01-28 11:13   ` Paolo Bonzini
@ 2016-01-29 15:26     ` Alex Bennée
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Bennée @ 2016-01-29 15:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, peter.maydell, mark.burton, qemu-devel, a.rigo, stefanha,
	fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 28/01/2016 11:15, Alex Bennée wrote:
>> diff --git a/configure b/configure
>> index bd29ba7..148b79a 100755
>> --- a/configure
>> +++ b/configure
>> @@ -5871,7 +5871,7 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
>>    ldflags="$ldflags $textseg_ldflags"
>>  fi
>>
>> -echo "LDFLAGS+=$ldflags" >> $config_target_mak
>> +echo "LDFLAGS+=$ldflags" >> $config_host_mak
>>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>>
>>  done # for target in $targets
>
> Hmm wait, it's not okay.
>
> This adds the *target* LDFLAGS to config-host.mak, and adds them a
> zillion times.  extra-ldflags is already added to LDFLAGS in
> config-host.mak:
>
>   --extra-ldflags=*) LDFLAGS="$LDFLAGS $optarg"
>                      EXTRA_LDFLAGS="$optarg"
>   ;;
>
> ...
>
> echo "LDFLAGS=$LDFLAGS" >> $config_host_mak
>
> So I'm totally confused as to what this patch is trying to achieve...

It seems so was I. I was having problems with ancillary binaries
failing to link against tsan but as you point out this should work with
"-fsanitize=thread" in the ldflags which are already available to
config-host.mak.

On my Gentoo (GCC 4.9) system without this I can build with:

  ./configure ${TARGETS} --extra-cflags="-fsanitize=thread -fPIE" \
    --extra-ldflags="-pie -fsanitize=thread" --with-coroutine=gthread

Although I get make check failures:

GTESTER tests/check-qdict
FATAL: ThreadSanitizer can not mmap the shadow memory (something is
mapped at 0x555555554000 < 0x7cf000000000)
FATAL: Make sure to compile with -fPIE and to link with -pie.
/home/alex/lsrc/qemu/qemu.git/tests/Makefile:629: recipe for target
'check-tests/check-qdict' failed
make: *** [check-tests/check-qdict] Error 1

But I suspect this is possibly an ASLR issue.

I think this patch can be dropped altogether.

With the other patches can you build with tsan the proper way? What are
you running? I'll add it to the VMs I have to double check.

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-28 11:00   ` Paolo Bonzini
@ 2016-01-29 16:06     ` Alex Bennée
  2016-02-04 12:40       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2016-01-29 16:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, qemu-devel, stefanha,
	fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 28/01/2016 11:15, Alex Bennée wrote:
>> +/* atomic_mb_read/set semantics map Java volatile variables. They are
>> + * less expensive on some platforms (notably POWER & ARM) than fully
>> + * sequentially consistent operations.
>> + *
>> + * As long as they are used as paired operations they are safe to
>> + * use. See docs/atomic.txt for more discussion.
>> + */
>> +
>> +#define atomic_mb_read(ptr)                             \
>> +    ({                                                  \
>> +    typeof(*ptr) _val;                                  \
>> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
>> +     smp_rmb();                                         \
>> +    _val;                                               \
>> +    })
>> +
>> +#define atomic_mb_set(ptr, i)  do {                     \
>> +    typeof(*ptr) _val = (i);                            \
>> +    smp_wmb();                                          \
>> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
>> +    smp_mb();                                           \
>> +} while(0)
>
> Great... I'll change this to
>
> #if defined(_ARCH_PPC)
> #define atomic_mb_read(ptr)                             \
>     ({                                                  \
>     typeof(*ptr) _val;                                  \
>      __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
>      smp_rmb();                                         \
>     _val;                                               \
>     })
>
> #define atomic_mb_set(ptr, i)  do {                     \
>     typeof(*ptr) _val = (i);                            \
>     smp_wmb();                                          \
>     __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
>     smp_mb();                                           \
> } while(0)
> #else
> #define atomic_mb_read(ptr)                       \
>     ({                                            \
>     typeof(*ptr) _val;                            \
>      __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST); \
>     _val;                                         \
>     })
>
> #define atomic_mb_set(ptr, i)  do {               \
>     typeof(*ptr) _val = (i);                      \
>     __atomic_store(ptr, &_val, __ATOMIC_SEQ_CST); \
> } while(0)
> #endif
>
> since this benefits x86 (which can generate mov/xchg respectively) and
> aarch64 (where atomic_mb_read/atomic_mb_set map directly to
> ldar/stlr).

The original comment mentioned both POWER and ARM so I wondering if we
should also special case for the ARMv7?

>
>> +/* Returns the eventual value, failed or not */

Yeah this comment in bogus.

>> +#define atomic_cmpxchg(ptr, old, new)                                   \
>> +    ({                                                                  \
>> +    typeof(*ptr) _old = (old), _new = (new);                            \
>> +    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
>> +                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
>> +    _old; /* can this race if cmpxchg not used elsewhere? */            \
>> +    })
>
> How so?

My mistake, I was worried that we weren't following the old
semantics. Having read even more closely I understand that _old is
updated by the __atomic function if the update fails. In fact _old is a
poor name because it's _expected at the start and _current in the case it
fails. Quoting the documentation:

    This compares the contents of *ptr with the contents of *expected. If
    equal, the operation is a read-modify-write operation that writes
    desired into *ptr. If they are not equal, the operation is a read and
    the current contents of *ptr are written into *expected. <snip>...

    If desired is written into *ptr then true is returned and memory is
    affected according to the memory order specified by success_memorder.
    There are no restrictions on what memory order can be used here.

I was wondering if this was subtly different from the old
__sync_val_compare_and_swap:

    The “val” version returns the contents of *ptr before the operation.

I think we are OK because if cmpxchg succeeds _old was by definition
what was already there but it is confusing and leads to funny code like
this:

    if (atomic_cmpxchg(&data[i].n, 0, 3) == 0) {
        data[i].ret = -ECANCELED;
        ...

and

    if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
       ...

This might be easier to read if atomic_cmpxchg used the bool
semantics, i.e. returned true for a successful cmpxchg.

The old code even has an atomic_bool_cmpxchg which no one seems to use. I
wonder if the correct solution is to convert atomic_cmpxchg calls to use
atomic_cmpxchg_bool calls and remove atomic_cmpxchg from atomic.h?
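
Roughly what I have in mind, as a sketch built directly on the builtin
(example_cmpxchg_bool is a made-up name, not the existing helper):

    static inline bool example_cmpxchg_bool(int *ptr, int old, int new)
    {
        return __atomic_compare_exchange_n(ptr, &old, new, false,
                                           __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }

    /* the call site then reads as "did the swap happen?" */
    if (example_cmpxchg_bool(&s->state, old_state, new_state)) {
        /* we won the race */
    }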

What do you think?

>
> Paolo


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-29 16:06     ` Alex Bennée
@ 2016-02-04 12:40       ` Paolo Bonzini
  2016-02-04 13:00         ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-02-04 12:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, qemu-devel, stefanha,
	fred.konrad



On 29/01/2016 17:06, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 28/01/2016 11:15, Alex Bennée wrote:
>>> +/* atomic_mb_read/set semantics map Java volatile variables. They are
>>> + * less expensive on some platforms (notably POWER & ARM) than fully
>>> + * sequentially consistent operations.
>>> + *
>>> + * As long as they are used as paired operations they are safe to
>>> + * use. See docs/atomic.txt for more discussion.
>>> + */
> 
> The original comment mentioned both POWER and ARM, so I was wondering if
> we should also special-case ARMv7?

I don't know the exact feature test, and I think ARMv7's penalty is much
less because the processors are slower, have a shallower pipeline, and
are usually not used in SMP configurations.

In fact, because it doesn't have "dmb st" and friends, the generated
code should be exactly the same for ARMv7.  Looking at
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html confirms this.

> I think we are OK because if cmpxchg succeeds _old was by definition
> what was already there but it is confusing and leads to funny code like
> this:
> 
>     if (atomic_cmpxchg(&data[i].n, 0, 3) == 0) {
>         data[i].ret = -ECANCELED;
>         ...
> 
> and
> 
>     if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
>        ...
> 
> Which might be easier to read if atomic_cmpxchg used the bool
> semantics, i.e. return true for a successful cmpxchg.

It depends.  When s->state is bool, the bool version is *very* hard to
read.  Then you have two bools and you never know which one it is.  For
example if the expected value is false, atomic_bool_cmpxchg will return
true if the memory location was false...  Aargh! :D
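
To make that concrete (made-up example, using the out-of-tree name):

    /* did we win the race to set it?  "true" means it *was* false... */
    if (atomic_bool_cmpxchg(&s->enabled, false, true)) {
        ...
    }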

> The old code even has an atomic_bool_cmpxchg which no one seems to use.

Alvise added it, but it's not upstream.

> I wonder if the correct solution is to convert atomic_cmpxchg calls to use
> atomic_cmpxchg_bool calls and remove atomic_cmpxchg from atomic.h?

Yeah, though there are also cases where atomic_cmpxchg_bool is less
efficient.

Not to mention that I don't like the name...  Perhaps atomic_cmpxchg
should be the bool one and atomic_fetch_cmpxchg should return the value.
 Separate patch series though.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-02-04 12:40       ` Paolo Bonzini
@ 2016-02-04 13:00         ` Peter Maydell
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2016-02-04 13:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Mark Burton, Alvise Rigo, QEMU Developers,
	Stefan Hajnoczi, Alex Bennée, KONRAD Frédéric

On 4 February 2016 at 12:40, Paolo Bonzini <pbonzini@redhat.com> wrote:
> I don't know the exact feature test, and I think ARMv7's penalty is much
> less because the processors are slower, have a shallower pipeline, and
> are usually not used in SMP configurations.

Most v7 cores are SMP, and especially most v7 cores with enough
CPU grunt that it's sensible to run QEMU at all are going to be SMP.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
  2016-01-28 11:00   ` Paolo Bonzini
@ 2016-04-01 14:30   ` James Hogan
  2016-04-01 14:51     ` Peter Maydell
  2016-04-01 16:06     ` Alex Bennée
  2016-04-01 20:35   ` Pranith Kumar
  2 siblings, 2 replies; 27+ messages in thread
From: James Hogan @ 2016-04-01 14:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, mark.burton, qemu-devel, a.rigo, stefanha,
	pbonzini, fred.konrad


Hi Alex,

On Thu, Jan 28, 2016 at 10:15:17AM +0000, Alex Bennée wrote:
> The __atomic primitives have been available since GCC 4.7 and provide
> a richer interface for describing memory ordering requirements. As a
> bonus by using the primitives instead of hand-rolled functions we can
> use tools such as the AddressSanitizer which need the use of well
> defined APIs for its analysis.
> 
> If we have __ATOMIC defines we exclusively use the __atomic primitives
> for all our atomic access. Otherwise we fall back to the mixture of
> __sync and hand-rolled barrier cases.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

This breaks the build on MIPS32, with the following link error:

cpus.o: In function `icount_warp_rt':
/work/mips/qemu/vz/cpus.c +343 : undefined reference to `__atomic_load_8'
collect2: error: ld returned 1 exit status

Seemingly __atomic_load_8 is provided by libatomic, so we're missing
-latomic.

Cheers
James

> 
> ---
> 
> v2 - review updates
>  - add reference to doc/atomics.txt at top of file
>  - ensure smb_mb() is full __ATOMIC_SEQ_CST
>  - make atomic_read/set __ATOMIC_RELAXED
>  - make atomic_rcu_read/set __ATOMIC_CONSUME/RELEASE
>  - make atomic_mb_red/set __ATOMIC_RELAXED + explicit barriers
>  - move some comments about __ATOMIC no longer relevant to bottom half
>  - rm un-needed ifdef/ifndefs
>  - in cmpxchg return old instead of ptr
>  - extra comments on old style volatile atomic access
> ---
>  include/qemu/atomic.h | 178 +++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 117 insertions(+), 61 deletions(-)
> 
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index bd2c075..ec3208a 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -8,6 +8,8 @@
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   *
> + * See docs/atomics.txt for discussion about the guarantees each
> + * atomic primitive is meant to provide.
>   */
>  
>  #ifndef __QEMU_ATOMIC_H
> @@ -15,12 +17,116 @@
>  
>  #include "qemu/compiler.h"
>  
> -/* For C11 atomic ops */
>  
>  /* Compiler barrier */
>  #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
>  
> -#ifndef __ATOMIC_RELAXED
> +#ifdef __ATOMIC_RELAXED
> +/* For C11 atomic ops */
> +
> +/* Manual memory barriers
> + *
> + *__atomic_thread_fence does not include a compiler barrier; instead,
> + * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
> + * semantics. If smp_wmb() is a no-op, absence of the barrier means that
> + * the compiler is free to reorder stores on each side of the barrier.
> + * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
> + */
> +
> +#define smp_mb()    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })
> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
> +
> +#define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
> +
> +/* Weak atomic operations prevent the compiler moving other
> + * loads/stores past the atomic operation load/store. However there is
> + * no explicit memory barrier for the processor.
> + */
> +#define atomic_read(ptr)                          \
> +    ({                                            \
> +    typeof(*ptr) _val;                            \
> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED); \
> +    _val;                                         \
> +    })
> +
> +#define atomic_set(ptr, i)  do {                  \
> +    typeof(*ptr) _val = (i);                      \
> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED); \
> +} while(0)
> +
> +/* Atomic RCU operations imply weak memory barriers */
> +
> +#define atomic_rcu_read(ptr)                      \
> +    ({                                            \
> +    typeof(*ptr) _val;                            \
> +     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
> +    _val;                                         \
> +    })
> +
> +#define atomic_rcu_set(ptr, i)  do {                    \
> +    typeof(*ptr) _val = (i);                            \
> +    __atomic_store(ptr, &_val, __ATOMIC_RELEASE);       \
> +} while(0)
> +
> +/* atomic_mb_read/set semantics map Java volatile variables. They are
> + * less expensive on some platforms (notably POWER & ARM) than fully
> + * sequentially consistent operations.
> + *
> + * As long as they are used as paired operations they are safe to
> + * use. See docs/atomic.txt for more discussion.
> + */
> +
> +#define atomic_mb_read(ptr)                             \
> +    ({                                                  \
> +    typeof(*ptr) _val;                                  \
> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
> +     smp_rmb();                                         \
> +    _val;                                               \
> +    })
> +
> +#define atomic_mb_set(ptr, i)  do {                     \
> +    typeof(*ptr) _val = (i);                            \
> +    smp_wmb();                                          \
> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
> +    smp_mb();                                           \
> +} while(0)
> +
> +
> +/* All the remaining operations are fully sequentially consistent */
> +
> +#define atomic_xchg(ptr, i)    ({                           \
> +    typeof(*ptr) _new = (i), _old;                          \
> +    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
> +    _old;                                                   \
> +})
> +
> +/* Returns the eventual value, failed or not */
> +#define atomic_cmpxchg(ptr, old, new)                                   \
> +    ({                                                                  \
> +    typeof(*ptr) _old = (old), _new = (new);                            \
> +    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
> +                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
> +    _old; /* can this race if cmpxchg not used elsewhere? */            \
> +    })
> +
> +/* Provide shorter names for GCC atomic builtins, return old value */
> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
> +#define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
> +
> +/* And even shorter names that return void.  */
> +#define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
> +#define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
> +#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> +#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
> +#define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
> +#define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
> +
> +#else /* __ATOMIC_RELAXED */
>  
>  /*
>   * We use GCC builtin if it's available, as that can use mfence on
> @@ -85,8 +191,6 @@
>  
>  #endif /* _ARCH_PPC */
>  
> -#endif /* C11 atomics */
> -
>  /*
>   * For (host) platforms we don't have explicit barrier definitions
>   * for, we use the gcc __sync_synchronize() primitive to generate a
> @@ -98,42 +202,22 @@
>  #endif
>  
>  #ifndef smp_wmb
> -#ifdef __ATOMIC_RELEASE
> -/* __atomic_thread_fence does not include a compiler barrier; instead,
> - * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
> - * semantics. If smp_wmb() is a no-op, absence of the barrier means that
> - * the compiler is free to reorder stores on each side of the barrier.
> - * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
> - */
> -#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
> -#else
>  #define smp_wmb()   __sync_synchronize()
>  #endif
> -#endif
>  
>  #ifndef smp_rmb
> -#ifdef __ATOMIC_ACQUIRE
> -#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
> -#else
>  #define smp_rmb()   __sync_synchronize()
>  #endif
> -#endif
>  
>  #ifndef smp_read_barrier_depends
> -#ifdef __ATOMIC_CONSUME
> -#define smp_read_barrier_depends()   ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
> -#else
>  #define smp_read_barrier_depends()   barrier()
>  #endif
> -#endif
>  
> -#ifndef atomic_read
> +/* These will only be atomic if the processor does the fetch or store
> + * in a single issue memory operation
> + */
>  #define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))
> -#endif
> -
> -#ifndef atomic_set
>  #define atomic_set(ptr, i)     ((*(__typeof__(*ptr) volatile*) (ptr)) = (i))
> -#endif
>  
>  /**
>   * atomic_rcu_read - reads a RCU-protected pointer to a local variable
> @@ -146,30 +230,18 @@
>   * Inserts memory barriers on architectures that require them (currently only
>   * Alpha) and documents which pointers are protected by RCU.
>   *
> - * Unless the __ATOMIC_CONSUME memory order is available, atomic_rcu_read also
> - * includes a compiler barrier to ensure that value-speculative optimizations
> - * (e.g. VSS: Value Speculation Scheduling) does not perform the data read
> - * before the pointer read by speculating the value of the pointer.  On new
> - * enough compilers, atomic_load takes care of such concern about
> - * dependency-breaking optimizations.
> + * atomic_rcu_read also includes a compiler barrier to ensure that
> + * value-speculative optimizations (e.g. VSS: Value Speculation
> + * Scheduling) does not perform the data read before the pointer read
> + * by speculating the value of the pointer.
>   *
>   * Should match atomic_rcu_set(), atomic_xchg(), atomic_cmpxchg().
>   */
> -#ifndef atomic_rcu_read
> -#ifdef __ATOMIC_CONSUME
> -#define atomic_rcu_read(ptr)    ({                \
> -    typeof(*ptr) _val;                            \
> -     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
> -    _val;                                         \
> -})
> -#else
>  #define atomic_rcu_read(ptr)    ({                \
>      typeof(*ptr) _val = atomic_read(ptr);         \
>      smp_read_barrier_depends();                   \
>      _val;                                         \
>  })
> -#endif
> -#endif
>  
>  /**
>   * atomic_rcu_set - assigns (publicizes) a pointer to a new data structure
> @@ -182,19 +254,10 @@
>   *
>   * Should match atomic_rcu_read().
>   */
> -#ifndef atomic_rcu_set
> -#ifdef __ATOMIC_RELEASE
> -#define atomic_rcu_set(ptr, i)  do {              \
> -    typeof(*ptr) _val = (i);                      \
> -    __atomic_store(ptr, &_val, __ATOMIC_RELEASE); \
> -} while(0)
> -#else
>  #define atomic_rcu_set(ptr, i)  do {              \
>      smp_wmb();                                    \
>      atomic_set(ptr, i);                           \
>  } while (0)
> -#endif
> -#endif
>  
>  /* These have the same semantics as Java volatile variables.
>   * See http://gee.cs.oswego.edu/dl/jmm/cookbook.html:
> @@ -218,13 +281,11 @@
>   * (see docs/atomics.txt), and I'm not sure that __ATOMIC_ACQ_REL is enough.
>   * Just always use the barriers manually by the rules above.
>   */
> -#ifndef atomic_mb_read
>  #define atomic_mb_read(ptr)    ({           \
>      typeof(*ptr) _val = atomic_read(ptr);   \
>      smp_rmb();                              \
>      _val;                                   \
>  })
> -#endif
>  
>  #ifndef atomic_mb_set
>  #define atomic_mb_set(ptr, i)  do {         \
> @@ -237,12 +298,6 @@
>  #ifndef atomic_xchg
>  #if defined(__clang__)
>  #define atomic_xchg(ptr, i)    __sync_swap(ptr, i)
> -#elif defined(__ATOMIC_SEQ_CST)
> -#define atomic_xchg(ptr, i)    ({                           \
> -    typeof(*ptr) _new = (i), _old;                          \
> -    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
> -    _old;                                                   \
> -})
>  #else
>  /* __sync_lock_test_and_set() is documented to be an acquire barrier only.  */
>  #define atomic_xchg(ptr, i)    (smp_mb(), __sync_lock_test_and_set(ptr, i))
> @@ -266,4 +321,5 @@
>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>  #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
>  
> -#endif
> +#endif /* __ATOMIC_RELAXED */
> +#endif /* __QEMU_ATOMIC_H */
> -- 
> 2.7.0
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-01 14:30   ` James Hogan
@ 2016-04-01 14:51     ` Peter Maydell
  2016-04-01 16:06     ` Alex Bennée
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2016-04-01 14:51 UTC (permalink / raw)
  To: James Hogan
  Cc: mttcg, Mark Burton, QEMU Developers, Alvise Rigo,
	Stefan Hajnoczi, Paolo Bonzini, Alex Bennée,
	KONRAD Frédéric

On 1 April 2016 at 15:30, James Hogan <james.hogan@imgtec.com> wrote:
> Hi Alex,
>
> On Thu, Jan 28, 2016 at 10:15:17AM +0000, Alex Bennée wrote:
>> The __atomic primitives have been available since GCC 4.7 and provide
>> a richer interface for describing memory ordering requirements. As a
>> bonus by using the primitives instead of hand-rolled functions we can
>> use tools such as the AddressSanitizer which need the use of well
>> defined APIs for its analysis.
>>
>> If we have __ATOMIC defines we exclusively use the __atomic primitives
>> for all our atomic access. Otherwise we fall back to the mixture of
>> __sync and hand-rolled barrier cases.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> This breaks the build on MIPS32, with the following link error:
>
> cpus.o: In function `icount_warp_rt':
> /work/mips/qemu/vz/cpus.c +343 : undefined reference to `__atomic_load_8'
> collect2: error: ld returned 1 exit status
>
> Seemingly __atomic_load_8 is provided by libatomic, so we're missing
> -latomic.

We should not be doing atomic ops on types larger than the native
pointer type. I think there's a patch proposed on list to fix this.
I don't think we should be adding libatomic.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-01 14:30   ` James Hogan
  2016-04-01 14:51     ` Peter Maydell
@ 2016-04-01 16:06     ` Alex Bennée
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Bennée @ 2016-04-01 16:06 UTC (permalink / raw)
  To: James Hogan
  Cc: mttcg, peter.maydell, mark.burton, qemu-devel, a.rigo, stefanha,
	pbonzini, fred.konrad


James Hogan <james.hogan@imgtec.com> writes:

> Hi Alex,
>
> On Thu, Jan 28, 2016 at 10:15:17AM +0000, Alex Bennée wrote:
>> The __atomic primitives have been available since GCC 4.7 and provide
>> a richer interface for describing memory ordering requirements. As a
>> bonus by using the primitives instead of hand-rolled functions we can
>> use tools such as the AddressSanitizer which need the use of well
>> defined APIs for its analysis.
>>
>> If we have __ATOMIC defines we exclusively use the __atomic primitives
>> for all our atomic access. Otherwise we fall back to the mixture of
>> __sync and hand-rolled barrier cases.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> This breaks the build on MIPS32, with the following link error:
>
> cpus.o: In function `icount_warp_rt':
> /work/mips/qemu/vz/cpus.c +343 : undefined reference to `__atomic_load_8'
> collect2: error: ld returned 1 exit status

See <1458577386-9984-1-git-send-email-alex.bennee@linaro.org> which has
two patches. One avoids doing a wider-than-word-width atomic load for
icount_warp_rt. The second adds a compile-time assert to the atomic.h
code.

I'll be re-spinning those patches on Monday.
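
For reference, the guard is along these lines (a sketch -- the exact
form in the respun patches may differ):

    #define atomic_read(ptr)                               \
        ({                                                 \
        /* refuse to compile wider-than-pointer atomics */ \
        QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));  \
        typeof(*ptr) _val;                                 \
        __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
        _val;                                              \
        })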

>
> Seemingly __atomic_load_8 is provided by libatomic, so we're missing
> -latomic.
>
> Cheers
> James
>
>>
>> ---
>>
>> v2 - review updates
>>  - add reference to doc/atomics.txt at top of file
>>  - ensure smb_mb() is full __ATOMIC_SEQ_CST
>>  - make atomic_read/set __ATOMIC_RELAXED
>>  - make atomic_rcu_read/set __ATOMIC_CONSUME/RELEASE
>>  - make atomic_mb_red/set __ATOMIC_RELAXED + explicit barriers
>>  - move some comments about __ATOMIC no longer relevant to bottom half
>>  - rm un-needed ifdef/ifndefs
>>  - in cmpxchg return old instead of ptr
>>  - extra comments on old style volatile atomic access
>> ---
>>  include/qemu/atomic.h | 178 +++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 117 insertions(+), 61 deletions(-)
>>
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> index bd2c075..ec3208a 100644
>> --- a/include/qemu/atomic.h
>> +++ b/include/qemu/atomic.h
>> @@ -8,6 +8,8 @@
>>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>   * See the COPYING file in the top-level directory.
>>   *
>> + * See docs/atomics.txt for discussion about the guarantees each
>> + * atomic primitive is meant to provide.
>>   */
>>
>>  #ifndef __QEMU_ATOMIC_H
>> @@ -15,12 +17,116 @@
>>
>>  #include "qemu/compiler.h"
>>
>> -/* For C11 atomic ops */
>>
>>  /* Compiler barrier */
>>  #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
>>
>> -#ifndef __ATOMIC_RELAXED
>> +#ifdef __ATOMIC_RELAXED
>> +/* For C11 atomic ops */
>> +
>> +/* Manual memory barriers
>> + *
>> + *__atomic_thread_fence does not include a compiler barrier; instead,
>> + * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
>> + * semantics. If smp_wmb() is a no-op, absence of the barrier means that
>> + * the compiler is free to reorder stores on each side of the barrier.
>> + * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
>> + */
>> +
>> +#define smp_mb()    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })
>> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
>> +
>> +#define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
>> +
>> +/* Weak atomic operations prevent the compiler moving other
>> + * loads/stores past the atomic operation load/store. However there is
>> + * no explicit memory barrier for the processor.
>> + */
>> +#define atomic_read(ptr)                          \
>> +    ({                                            \
>> +    typeof(*ptr) _val;                            \
>> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED); \
>> +    _val;                                         \
>> +    })
>> +
>> +#define atomic_set(ptr, i)  do {                  \
>> +    typeof(*ptr) _val = (i);                      \
>> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED); \
>> +} while(0)
>> +
>> +/* Atomic RCU operations imply weak memory barriers */
>> +
>> +#define atomic_rcu_read(ptr)                      \
>> +    ({                                            \
>> +    typeof(*ptr) _val;                            \
>> +     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
>> +    _val;                                         \
>> +    })
>> +
>> +#define atomic_rcu_set(ptr, i)  do {                    \
>> +    typeof(*ptr) _val = (i);                            \
>> +    __atomic_store(ptr, &_val, __ATOMIC_RELEASE);       \
>> +} while(0)
>> +
>> +/* atomic_mb_read/set semantics map Java volatile variables. They are
>> + * less expensive on some platforms (notably POWER & ARM) than fully
>> + * sequentially consistent operations.
>> + *
>> + * As long as they are used as paired operations they are safe to
>> + * use. See docs/atomic.txt for more discussion.
>> + */
>> +
>> +#define atomic_mb_read(ptr)                             \
>> +    ({                                                  \
>> +    typeof(*ptr) _val;                                  \
>> +     __atomic_load(ptr, &_val, __ATOMIC_RELAXED);       \
>> +     smp_rmb();                                         \
>> +    _val;                                               \
>> +    })
>> +
>> +#define atomic_mb_set(ptr, i)  do {                     \
>> +    typeof(*ptr) _val = (i);                            \
>> +    smp_wmb();                                          \
>> +    __atomic_store(ptr, &_val, __ATOMIC_RELAXED);       \
>> +    smp_mb();                                           \
>> +} while(0)
>> +
>> +
>> +/* All the remaining operations are fully sequentially consistent */
>> +
>> +#define atomic_xchg(ptr, i)    ({                           \
>> +    typeof(*ptr) _new = (i), _old;                          \
>> +    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
>> +    _old;                                                   \
>> +})
>> +
>> +/* Returns the eventual value, failed or not */
>> +#define atomic_cmpxchg(ptr, old, new)                                   \
>> +    ({                                                                  \
>> +    typeof(*ptr) _old = (old), _new = (new);                            \
>> +    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
>> +                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);      \
>> +    _old; /* can this race if cmpxchg not used elsewhere? */            \
>> +    })
>> +
>> +/* Provide shorter names for GCC atomic builtins, return old value */
>> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
>> +#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
>> +#define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
>> +#define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
>> +#define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
>> +#define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
>> +
>> +/* And even shorter names that return void.  */
>> +#define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
>> +#define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
>> +#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
>> +#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
>> +#define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
>> +#define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
>> +
>> +#else /* __ATOMIC_RELAXED */
>>
>>  /*
>>   * We use GCC builtin if it's available, as that can use mfence on
>> @@ -85,8 +191,6 @@
>>
>>  #endif /* _ARCH_PPC */
>>
>> -#endif /* C11 atomics */
>> -
>>  /*
>>   * For (host) platforms we don't have explicit barrier definitions
>>   * for, we use the gcc __sync_synchronize() primitive to generate a
>> @@ -98,42 +202,22 @@
>>  #endif
>>
>>  #ifndef smp_wmb
>> -#ifdef __ATOMIC_RELEASE
>> -/* __atomic_thread_fence does not include a compiler barrier; instead,
>> - * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
>> - * semantics. If smp_wmb() is a no-op, absence of the barrier means that
>> - * the compiler is free to reorder stores on each side of the barrier.
>> - * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
>> - */
>> -#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>> -#else
>>  #define smp_wmb()   __sync_synchronize()
>>  #endif
>> -#endif
>>
>>  #ifndef smp_rmb
>> -#ifdef __ATOMIC_ACQUIRE
>> -#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
>> -#else
>>  #define smp_rmb()   __sync_synchronize()
>>  #endif
>> -#endif
>>
>>  #ifndef smp_read_barrier_depends
>> -#ifdef __ATOMIC_CONSUME
>> -#define smp_read_barrier_depends()   ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
>> -#else
>>  #define smp_read_barrier_depends()   barrier()
>>  #endif
>> -#endif
>>
>> -#ifndef atomic_read
>> +/* These will only be atomic if the processor does the fetch or store
>> + * in a single issue memory operation
>> + */
>>  #define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))
>> -#endif
>> -
>> -#ifndef atomic_set
>>  #define atomic_set(ptr, i)     ((*(__typeof__(*ptr) volatile*) (ptr)) = (i))
>> -#endif
>>
>>  /**
>>   * atomic_rcu_read - reads a RCU-protected pointer to a local variable
>> @@ -146,30 +230,18 @@
>>   * Inserts memory barriers on architectures that require them (currently only
>>   * Alpha) and documents which pointers are protected by RCU.
>>   *
>> - * Unless the __ATOMIC_CONSUME memory order is available, atomic_rcu_read also
>> - * includes a compiler barrier to ensure that value-speculative optimizations
>> - * (e.g. VSS: Value Speculation Scheduling) does not perform the data read
>> - * before the pointer read by speculating the value of the pointer.  On new
>> - * enough compilers, atomic_load takes care of such concern about
>> - * dependency-breaking optimizations.
>> + * atomic_rcu_read also includes a compiler barrier to ensure that
>> + * value-speculative optimizations (e.g. VSS: Value Speculation
>> + * Scheduling) does not perform the data read before the pointer read
>> + * by speculating the value of the pointer.
>>   *
>>   * Should match atomic_rcu_set(), atomic_xchg(), atomic_cmpxchg().
>>   */
>> -#ifndef atomic_rcu_read
>> -#ifdef __ATOMIC_CONSUME
>> -#define atomic_rcu_read(ptr)    ({                \
>> -    typeof(*ptr) _val;                            \
>> -     __atomic_load(ptr, &_val, __ATOMIC_CONSUME); \
>> -    _val;                                         \
>> -})
>> -#else
>>  #define atomic_rcu_read(ptr)    ({                \
>>      typeof(*ptr) _val = atomic_read(ptr);         \
>>      smp_read_barrier_depends();                   \
>>      _val;                                         \
>>  })
>> -#endif
>> -#endif
>>
>>  /**
>>   * atomic_rcu_set - assigns (publicizes) a pointer to a new data structure
>> @@ -182,19 +254,10 @@
>>   *
>>   * Should match atomic_rcu_read().
>>   */
>> -#ifndef atomic_rcu_set
>> -#ifdef __ATOMIC_RELEASE
>> -#define atomic_rcu_set(ptr, i)  do {              \
>> -    typeof(*ptr) _val = (i);                      \
>> -    __atomic_store(ptr, &_val, __ATOMIC_RELEASE); \
>> -} while(0)
>> -#else
>>  #define atomic_rcu_set(ptr, i)  do {              \
>>      smp_wmb();                                    \
>>      atomic_set(ptr, i);                           \
>>  } while (0)
>> -#endif
>> -#endif
>>
>>  /* These have the same semantics as Java volatile variables.
>>   * See http://gee.cs.oswego.edu/dl/jmm/cookbook.html:
>> @@ -218,13 +281,11 @@
>>   * (see docs/atomics.txt), and I'm not sure that __ATOMIC_ACQ_REL is enough.
>>   * Just always use the barriers manually by the rules above.
>>   */
>> -#ifndef atomic_mb_read
>>  #define atomic_mb_read(ptr)    ({           \
>>      typeof(*ptr) _val = atomic_read(ptr);   \
>>      smp_rmb();                              \
>>      _val;                                   \
>>  })
>> -#endif
>>
>>  #ifndef atomic_mb_set
>>  #define atomic_mb_set(ptr, i)  do {         \
>> @@ -237,12 +298,6 @@
>>  #ifndef atomic_xchg
>>  #if defined(__clang__)
>>  #define atomic_xchg(ptr, i)    __sync_swap(ptr, i)
>> -#elif defined(__ATOMIC_SEQ_CST)
>> -#define atomic_xchg(ptr, i)    ({                           \
>> -    typeof(*ptr) _new = (i), _old;                          \
>> -    __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \
>> -    _old;                                                   \
>> -})
>>  #else
>>  /* __sync_lock_test_and_set() is documented to be an acquire barrier only.  */
>>  #define atomic_xchg(ptr, i)    (smp_mb(), __sync_lock_test_and_set(ptr, i))
>> @@ -266,4 +321,5 @@
>>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>>  #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
>>
>> -#endif
>> +#endif /* __ATOMIC_RELAXED */
>> +#endif /* __QEMU_ATOMIC_H */
>> --
>> 2.7.0
>>
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
  2016-01-28 11:00   ` Paolo Bonzini
  2016-04-01 14:30   ` James Hogan
@ 2016-04-01 20:35   ` Pranith Kumar
  2016-04-04  8:14     ` Paolo Bonzini
  2 siblings, 1 reply; 27+ messages in thread
From: Pranith Kumar @ 2016-04-01 20:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, qemu-devel, stefanha,
	pbonzini, fred.konrad


Hi Alex,

I have one question inline below.

Alex Bennée writes:

> The __atomic primitives have been available since GCC 4.7 and provide
> a richer interface for describing memory ordering requirements. As a
> bonus by using the primitives instead of hand-rolled functions we can
> use tools such as the AddressSanitizer which need the use of well
> defined APIs for its analysis.
>
> If we have __ATOMIC defines we exclusively use the __atomic primitives
> for all our atomic access. Otherwise we fall back to the mixture of
> __sync and hand-rolled barrier cases.
>
> +/* For C11 atomic ops */
> +
> +/* Manual memory barriers
> + *
> + *__atomic_thread_fence does not include a compiler barrier; instead,
> + * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
> + * semantics. If smp_wmb() is a no-op, absence of the barrier means that
> + * the compiler is free to reorder stores on each side of the barrier.
> + * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
> + */
> +
> +#define smp_mb()    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })

I could not really understand why we need to wrap the fence with
barrier()'s. There are three parts to my confusion. Let me ask one after the
other.

First, these primitives are used in the qemu codebase, which runs on the
host architecture. Let us consider two example architectures: x86 and ARM.

On x86, __atomic_thread_fence(__ATOMIC_SEQ_CST) will generate an mfence
instruction. On ARM, this will generate the dmb instruction. Both these
serializing instructions also act as compiler barriers. Is there any
architecture which does not generate such a serializing instruction?

> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })

Second, why do you need barrier() on both sides? One barrier() seems to be
sufficient to prevent the compiler from reordering across the macro. Am I
missing something?

Finally, I tried looking at the gcc docs but could find nothing regarding
__atomic_thread_fence() not being considered a memory barrier. What I did
find mentions it being treated as a function call during the main
optimization stages and not during later stages:

http://www.spinics.net/lists/gcchelp/msg39798.html

AFAIU, in these later stages, even adding a barrier() as we are doing will
have no effect.

Can you point me to any docs which talk more about this?

Thanks!
-- 
Pranith

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-01 20:35   ` Pranith Kumar
@ 2016-04-04  8:14     ` Paolo Bonzini
  2016-04-04 16:26       ` Pranith Kumar
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-04-04  8:14 UTC (permalink / raw)
  To: Pranith Kumar, Alex Bennée
  Cc: mttcg, peter.maydell, mark.burton, a.rigo, qemu-devel, stefanha,
	fred.konrad



On 01/04/2016 22:35, Pranith Kumar wrote:
> I could not really understand why we need to wrap the fence with
> barrier()'s. There are three parts to my confusion. Let me ask one after the
> other.
> 
> On x86, __atomic_thread_fence(__ATOMIC_SEQ_CST) will generate an mfence
> instruction. On ARM, this will generate the dmb instruction. Both these
> serializing instructions also act as compiler barriers. Is there any
> architecture which does not generate such a serializing instruction?

(More on this later).

>> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
> 
> Second, why do you need barrier() on both sides? One barrier() seems to be
> sufficient to prevent the compiler from reordering across the macro. Am I
> missing something?

Yes, that's true.

> Finally, I tried looking at the gcc docs but could find nothing regarding
> __atomic_thread_fence() not being considered as a memory barrier. What I did
> find mentions about it being treated as a function call during the main
> optimization stages and not during later stages:
> 
> http://www.spinics.net/lists/gcchelp/msg39798.html
> 
> AFAIU, in these later stages, even adding a barrier() as we are doing will
> have no effect.
> 
> Can you point me to any docs which talk more about this?

The issue is that atomic_thread_fence() only affects other atomic
operations, while smp_rmb() and smp_wmb() affect normal loads and stores
as well.

In the GCC implementation, atomic operations (even relaxed ones) access
memory as if the pointer was volatile.  By doing this, GCC can remove
the acquire and release fences altogether on TSO architectures.  We
actually observed a case where the compiler subsequently inverted the
order of two writes around a smp_wmb().  It was fixed in commit 3bbf572
("atomics: add explicit compiler fence in __atomic memory barriers",
2015-06-05).
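
The failure mode was essentially of this shape (illustrative field
names, not the exact thread-pool.c code):

    req->ret = ret;          /* plain store                             */
    smp_wmb();               /* a no-op fence on TSO; without barrier() */
                             /* the compiler may move stores across it  */
    req->state = DONE;       /* plain store                             */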

In principle it could do the same on architectures that are sequentially
consistent; even if none exists in practice, keeping the barriers for
smp_mb() is consistent with the other barriers.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-04  8:14     ` Paolo Bonzini
@ 2016-04-04 16:26       ` Pranith Kumar
  2016-04-04 17:03         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Pranith Kumar @ 2016-04-04 16:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, Mark Burton, alvise rigo, qemu-devel,
	stefanha, Alex Bennée, KONRAD Frédéric

Hi Paolo,

On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> The issue is that atomic_thread_fence() only affects other atomic
> operations, while smp_rmb() and smp_wmb() affect normal loads and stores
> as well.

That is different from my understanding of what atomic_thread_fence()
does. AFAIK, it should order all loads/stores and not only atomic
operations.

Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html:

"Establishes memory synchronization ordering of non-atomic and relaxed
atomic accesses"

>
> In the GCC implementation, atomic operations (even relaxed ones) access
> memory as if the pointer was volatile.  By doing this, GCC can remove
> the acquire and release fences altogether on TSO architectures.  We
> actually observed a case where the compiler subsequently inverted the
> order of two writes around a smp_wmb().

That is a GCC bug in that case. GCC should remove smp_wmb() only after
the optimization passes have run in the codegen phase. Can you please
point me to the mailing list thread/bug where this was discovered? Or
is it easily reproducible by reverting that patch? In that case, the
location of the bug will be helpful to analyse this further.

> In principle it could do the same on architectures that are sequentially
> consistent; even if none exists in practice, keeping the barriers for
> smp_mb() is consistent with the other barriers.

I understand one barrier(), but having two on both sides seems
unnecessary. I would prefer we clean up and have just one. Although, I
think it is just an annoyance since performance wise there is no
difference. Two consecutive barrier()'s should behave just like one.

Thanks!
-- 
Pranith

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-04 16:26       ` Pranith Kumar
@ 2016-04-04 17:03         ` Paolo Bonzini
  2016-04-04 20:15           ` Paolo Bonzini
  2016-04-05  3:35           ` Pranith Kumar
  0 siblings, 2 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-04-04 17:03 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: mttcg, Peter Maydell, Mark Burton, alvise rigo, qemu-devel,
	stefanha, Alex Bennée, KONRAD Frédéric



On 04/04/2016 18:26, Pranith Kumar wrote:
> Hi Paolo,
> 
> On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> The issue is that atomic_thread_fence() only affects other atomic
>> operations, while smp_rmb() and smp_wmb() affect normal loads and stores
>> as well.
> 
> That is different from my understanding of what atomic_thread_fence()
> does. AFAIK, it should order all loads/stores and not only atomic
> operations.
> 
> Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html:
> 
> "Establishes memory synchronization ordering of non-atomic and relaxed
> atomic accesses"

You can find some information at
https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and
https://retrace.fedoraproject.org/faf/problems/670281/.

I've looked at private email from that time and I was pointed to this
sentence in GCC's manual, which says the opposite:

    "Note that in the C++11 memory model, fences (e.g.,
    ‘__atomic_thread_fence’) take effect in combination with other
    atomic operations on specific memory locations (e.g., atomic loads);
    operations on specific memory locations do not necessarily affect
    other operations in the same way."

Basically, the __atomic fences are essentially a way to break up a
non-relaxed atomic access into a relaxed atomic access and the
ordering-constraining fence.  According to those emails, atomic
load-acquires and store-releases can provide synchronization for plain
loads and stores (not just for relaxed atomic loads and stores).
However, an atomic thread fence cannot do this.
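
In other words, a minimal sketch of the decomposition being described
(assuming some shared "int flag"):

    /* a store-release ... */
    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);

    /* ... can be split into an ordering fence plus a relaxed atomic store */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);

The fence only takes effect in combination with the atomic access next
to it; surrounded only by plain loads and stores it has nothing, in the
C11 model, to pair with.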

>> In the GCC implementation, atomic operations (even relaxed ones) access
>> memory as if the pointer was volatile.  By doing this, GCC can remove
>> the acquire and release fences altogether on TSO architectures.  We
>> actually observed a case where the compiler subsequently inverted the
>> order of two writes around a smp_wmb().
> 
> That is a GCC bug in that case. GCC should remove smp_wmb() only after
> the optimization passes have run in the codegen phase. Can you please
> point me to the mailing list thread/bug where this was discovered? Or
> is it easily reproducible with reverting that patch? In that case, the
> location of the bug will be helpful to analyse this further.

It depends on the compiler you're using.  With some luck, reverting the
patch will cause accesses across smp_wmb() or smp_rmb() to get
reordered.  In our case it was in thread-pool.c; Kevin Wolf did the
analysis.

>> In principle it could do the same on architectures that are sequentially
>> consistent; even if none exists in practice, keeping the barriers for
>> smp_mb() is consistent with the other barriers.
> 
> I understand one barrier(), but having two on both sides seems
> unnecessary. I would prefer we clean up and have just one. Although, I
> think it is just an annoyance since performance wise there is no
> difference. Two consecutive barrier()'s should behave just like one.

Yes, this is okay to change.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-04 17:03         ` Paolo Bonzini
@ 2016-04-04 20:15           ` Paolo Bonzini
  2016-04-05  3:35           ` Pranith Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-04-04 20:15 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: mttcg, Peter Maydell, Mark Burton, alvise rigo, qemu-devel,
	stefanha, Alex Bennée, KONRAD Frédéric



On 04/04/2016 19:03, Paolo Bonzini wrote:
> I've looked at private email from that time and I was pointed to this
> sentence in GCC's manual, which says the opposite:
> 
>     "Note that in the C++11 memory model, fences (e.g.,
>     ‘__atomic_thread_fence’) take effect in combination with other
>     atomic operations on specific memory locations (e.g., atomic loads);
>     operations on specific memory locations do not necessarily affect
>     other operations in the same way."

And GCC is right, based on N3291 paragraph 29.8 ("Fences"):

----
A release fence A synchronizes with an acquire fence B if there exist
atomic operations X and Y, both operating on some atomic object M, such
that A is sequenced before X, X modifies M, Y is sequenced before B, and
Y reads the value written by X or a value written by any side effect in
the hypothetical release sequence X would head if it were a release
operation.
-----

It only mentions atomic operations, not plain loads and stores.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-04 17:03         ` Paolo Bonzini
  2016-04-04 20:15           ` Paolo Bonzini
@ 2016-04-05  3:35           ` Pranith Kumar
  2016-04-05 12:47             ` Paolo Bonzini
  1 sibling, 1 reply; 27+ messages in thread
From: Pranith Kumar @ 2016-04-05  3:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, Mark Burton, alvise rigo, qemu-devel,
	stefanha, Alex Bennée, KONRAD Frédéric

On Mon, Apr 4, 2016 at 1:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

>>
>> Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html:
>>
>> "Establishes memory synchronization ordering of non-atomic and relaxed
>> atomic accesses"
>
> You can find some information at
> https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and

This bug seems to be out-of-bounds. I get a message saying that the
bug has been restricted for internal development processes.

> https://retrace.fedoraproject.org/faf/problems/670281/.
>
> I've looked at private email from that time and I was pointed to this
> sentence in GCC's manual, which says the opposite:
>
>     "Note that in the C++11 memory model, fences (e.g.,
>     ‘__atomic_thread_fence’) take effect in combination with other
>     atomic operations on specific memory locations (e.g., atomic loads);
>     operations on specific memory locations do not necessarily affect
>     other operations in the same way."
>
> Basically, the __atomic fences are essentially a way to break up a
> non-relaxed atomic access into a relaxed atomic access and the
> ordering-constraining fence.  According to those emails, atomic
> load-acquires and store-releases can provide synchronization for plain
> loads and stores (not just for relaxed atomic loads and stores).
> However, an atomic thread fence cannot do this.

Yes, on further research... this looks to be the case. I think the
reasoning for this is to separate synchronization variables/objects
from data.

>
> It depends on the compiler you're using.  With some luck, reverting the
> patch will cause accesses across smp_wmb() or smp_rmb() to get
> reordered.  In our case it was in thread-pool.c; Kevin Wolf did the
> analysis.

If ordering was crucial for those stores, I think it would have been
better to use atomics on them to enforce ordering.

Also, do you plan to introduce load_acquire/store_release() operations
like done in the linux kernel? They would cleanly map to gcc atomics
and make the ordering requirements explicit.
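
e.g. roughly (hypothetical names, layered on the same __atomic builtins
used in this patch):

    #define atomic_load_acquire(ptr)                    \
        ({                                              \
        typeof(*ptr) _val;                              \
        __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE);    \
        _val;                                           \
        })

    #define atomic_store_release(ptr, i)  do {          \
        typeof(*ptr) _val = (i);                        \
        __atomic_store(ptr, &_val, __ATOMIC_RELEASE);   \
    } while (0)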

Thanks!
-- 
Pranith

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
  2016-04-05  3:35           ` Pranith Kumar
@ 2016-04-05 12:47             ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-04-05 12:47 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: mttcg, Peter Maydell, Mark Burton, alvise rigo, qemu-devel,
	stefanha, Alex Bennée, KONRAD Frédéric



On 05/04/2016 05:35, Pranith Kumar wrote:
> > You can find some information at
> > https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and
>
> This bug seems to be out-of-bounds. I get a message saying that the
> bug has been restricted for internal development processes.

Sorry, it's open now.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-04-05 12:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28 10:15 [Qemu-devel] [PATCH v1 0/5] ThreadSanitizer support Alex Bennée
2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 1/5] configure: introduce --extra-libs Alex Bennée
2016-01-28 11:14   ` Paolo Bonzini
2016-01-28 11:38     ` Alex Bennée
2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 2/5] configure: ensure ldflags propagated to config_host Alex Bennée
2016-01-28 11:10   ` Paolo Bonzini
2016-01-28 11:13   ` Paolo Bonzini
2016-01-29 15:26     ` Alex Bennée
2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions Alex Bennée
2016-01-28 11:00   ` Paolo Bonzini
2016-01-29 16:06     ` Alex Bennée
2016-02-04 12:40       ` Paolo Bonzini
2016-02-04 13:00         ` Peter Maydell
2016-04-01 14:30   ` James Hogan
2016-04-01 14:51     ` Peter Maydell
2016-04-01 16:06     ` Alex Bennée
2016-04-01 20:35   ` Pranith Kumar
2016-04-04  8:14     ` Paolo Bonzini
2016-04-04 16:26       ` Pranith Kumar
2016-04-04 17:03         ` Paolo Bonzini
2016-04-04 20:15           ` Paolo Bonzini
2016-04-05  3:35           ` Pranith Kumar
2016-04-05 12:47             ` Paolo Bonzini
2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 4/5] async.c: various atomic fixes for tsan Alex Bennée
2016-01-28 10:45   ` Paolo Bonzini
2016-01-28 10:15 ` [Qemu-devel] [PATCH v1 5/5] thread-pool: atomic fixes from tsan Alex Bennée
2016-01-28 10:44   ` Paolo Bonzini
