* [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
@ 2023-05-15 20:17 Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 01/11] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
                   ` (24 more replies)
  0 siblings, 25 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Dmitry Vyukov, Paul E. McKenney

This patch set adds support for TSAN in liburcu.

* Here are the major changes

  - Usage of compiler atomic builtins is added to the uatomic API.  This is
    required for TSAN to understand atomic memory accesses.  If the compiler
    supports such builtins, they are used by default.  Users can opt out and use
    the legacy implementation of the uatomic API with the
    `--disable-atomic-builtins' configuration option.

  - The CMM memory model is introduced but not yet formalized.  It tries to be
    as close as possible to the C11 memory model while offering primitives such
    as cmm_smp_wmb(), cmm_smp_rmb() and cmm_mb() that cannot be expressed in it.
    For example, cmm_mb() can be used for ordering memory accesses to MMIO
    devices, which is outside the scope of the C11 memory model.

  - The CMM annotation layer is a new public API that is highly experimental and
    not guaranteed to be stable at this stage.  It serves the dual purpose of
    verifying the ordering of local (intra-thread) relaxed atomic accesses with
    a memory barrier, and of global (inter-thread) relaxed atomic accesses with
    a shared state.  The second purpose is necessary for TSAN to understand
    memory access ordering, since it does not yet fully support thread fences.

* CMM annotation example

  Consider the following pseudo-code of the writer side in synchronize_rcu().
  An acquire group is defined on the stack of the writer.  Annotations are made
  onto the group to ensure the ordering of relaxed memory accesses in
  reader_state() before the memory barrier at the end of synchronize_rcu().
  They also help TSAN understand that the relaxed accesses in reader_state()
  act like acquire accesses because of the memory barrier in synchronize_rcu().

  In other words, the purpose of this annotation is to convert a group of
  load-acquire memory operations into load-relaxed memory operations followed
  by a single memory barrier.  This greatly benefits weakly ordered
  architectures: the number of memory barriers stays constant instead of
  growing linearly with the number of loads.  It does not benefit TSO
  architectures.

```
enum urcu_state reader_state(unsigned long *ctr, cmm_annotate_t *acquire_group)
{
	unsigned long v;

	v = uatomic_load(ctr, CMM_RELAXED);
	cmm_annotate_group_mem_acquire(acquire_group, ctr);
	// ...
}

void wait_for_readers(..., cmm_annotate_t *acquire_group)
{
	// ...
	switch (reader_state(..., acquire_group)) {
		// ...
	}
	// ...
}

void synchronize_rcu()
{
	cmm_annotate_define(acquire_group);
	// ...
	wait_for_readers(..., &acquire_group);
	// ...
	cmm_annotate_group_mb_acquire(&acquire_group);
	cmm_smp_mb();
}
```

* Known limitation

  The only known limitation is with the urcu-signal flavor.  TSAN hijacks
  calls to sigaction(2) and installs its own signal handler, which delivers
  the signals to the urcu handler at synchronization points.  This is known to
  deadlock the urcu-signal flavor in at least one case.  See the commit log of
  `urcu/annotate: Add CMM annotation' for a minimal reproducer outside of
  liburcu.

  Therefore, we intend to deprecate the urcu-signal flavor in the future,
  starting by disabling it by default.

Olivier Dion (11):
  configure: Add --disable-atomic-builtins option
  urcu/uatomic: Use atomic builtins if configured
  urcu/compiler: Use atomic builtins if configured
  urcu/arch/generic: Use atomic builtins if configured
  urcu/system: Use atomic builtins if configured
  urcu/uatomic: Add CMM memory model
  urcu-wait: Fix wait state load/store
  tests: Use uatomic for accessing global states
  benchmark: Use uatomic for accessing global states
  tests/unit/test_build: Quiet unused return value
  urcu/annotate: Add CMM annotation

 README.md                               |  11 ++
 configure.ac                            |  26 ++++
 include/Makefile.am                     |   4 +
 include/urcu/annotate.h                 | 174 ++++++++++++++++++++++++
 include/urcu/arch/generic.h             |  37 +++++
 include/urcu/compiler.h                 |  20 ++-
 include/urcu/static/pointer.h           |  40 ++----
 include/urcu/static/urcu-bp.h           |  12 +-
 include/urcu/static/urcu-common.h       |   8 +-
 include/urcu/static/urcu-mb.h           |  11 +-
 include/urcu/static/urcu-memb.h         |  26 +++-
 include/urcu/static/urcu-qsbr.h         |  29 ++--
 include/urcu/system.h                   |  21 +++
 include/urcu/uatomic.h                  |  25 +++-
 include/urcu/uatomic/builtins-generic.h | 124 +++++++++++++++++
 include/urcu/uatomic/builtins-x86.h     | 124 +++++++++++++++++
 include/urcu/uatomic/builtins.h         |  83 +++++++++++
 include/urcu/uatomic/generic.h          | 128 +++++++++++++++++
 src/rculfhash.c                         |  92 ++++++++-----
 src/urcu-bp.c                           |  17 ++-
 src/urcu-pointer.c                      |   9 +-
 src/urcu-qsbr.c                         |  31 +++--
 src/urcu-wait.h                         |  15 +-
 src/urcu.c                              |  24 ++--
 tests/benchmark/Makefile.am             |  91 +++++++------
 tests/benchmark/common-states.c         |   1 +
 tests/benchmark/common-states.h         |  51 +++++++
 tests/benchmark/test_mutex.c            |  32 +----
 tests/benchmark/test_perthreadlock.c    |  32 +----
 tests/benchmark/test_rwlock.c           |  32 +----
 tests/benchmark/test_urcu.c             |  33 +----
 tests/benchmark/test_urcu_assign.c      |  33 +----
 tests/benchmark/test_urcu_bp.c          |  33 +----
 tests/benchmark/test_urcu_defer.c       |  33 +----
 tests/benchmark/test_urcu_gc.c          |  34 +----
 tests/benchmark/test_urcu_hash.c        |   6 +-
 tests/benchmark/test_urcu_hash.h        |  15 --
 tests/benchmark/test_urcu_hash_rw.c     |  10 +-
 tests/benchmark/test_urcu_hash_unique.c |  10 +-
 tests/benchmark/test_urcu_lfq.c         |  20 +--
 tests/benchmark/test_urcu_lfs.c         |  20 +--
 tests/benchmark/test_urcu_lfs_rcu.c     |  20 +--
 tests/benchmark/test_urcu_qsbr.c        |  33 +----
 tests/benchmark/test_urcu_qsbr_gc.c     |  34 +----
 tests/benchmark/test_urcu_wfcq.c        |  22 ++-
 tests/benchmark/test_urcu_wfq.c         |  20 +--
 tests/benchmark/test_urcu_wfs.c         |  22 ++-
 tests/common/api.h                      |  12 +-
 tests/regression/rcutorture.h           | 102 ++++++++++----
 tests/unit/test_build.c                 |   8 +-
 50 files changed, 1227 insertions(+), 623 deletions(-)
 create mode 100644 include/urcu/annotate.h
 create mode 100644 include/urcu/uatomic/builtins-generic.h
 create mode 100644 include/urcu/uatomic/builtins-x86.h
 create mode 100644 include/urcu/uatomic/builtins.h
 create mode 100644 tests/benchmark/common-states.c
 create mode 100644 tests/benchmark/common-states.h

-- 
2.39.2

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH 01/11] configure: Add --disable-atomic-builtins option
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured Olivier Dion via lttng-dev
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

By default, if the toolchain supports atomic builtins, use them for the
uatomic API. This requires that the toolchains used to compile the
library and the user application both support such builtins.

The advantage of using these builtins is that they are synchronization
primitives well known to tools such as TSAN.

Change-Id: Ia8e97112681f744f17816dbc4cbbec805a483331
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 README.md    | 11 +++++++++++
 configure.ac | 26 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/README.md b/README.md
index ba5bb08..6ce96c9 100644
--- a/README.md
+++ b/README.md
@@ -429,6 +429,17 @@ still being used to iterate on a hash table.
 This option alters the rculfhash ABI. Make sure to compile both library
 and application with matching configuration.
 
+### Usage of `--disable-atomic-builtins`
+
+By default, the configure script will check if the toolchain supports atomic
+builtins. If so, then the RCU memory model is implemented using the atomic
+builtins of the toolchain.
+
+Building liburcu with `--disable-atomic-builtins` forces the use of the legacy
+internal implementations for atomic accesses.
+
+This option is useful if, for example, the atomic builtins of a given toolchain
+version are known to be broken or inefficient.
 
 Make targets
 ------------
diff --git a/configure.ac b/configure.ac
index 909cf1d..4450a31 100644
--- a/configure.ac
+++ b/configure.ac
@@ -230,6 +230,11 @@ AE_FEATURE([rcu-debug], [Enable internal debugging self-checks. Introduces a per
 AE_FEATURE_DEFAULT_DISABLE
 AE_FEATURE([cds-lfht-iter-debug], [Enable extra debugging checks for lock-free hash table iterator traversal. Alters the rculfhash ABI. Make sure to compile both library and application with matching configuration.])
 
+# toolchain atomic builtins
+# Enabled by default
+AE_FEATURE_DEFAULT_ENABLE
+AE_FEATURE([atomic-builtins], [Disable the usage of toolchain atomic builtins.])
+
 # When given, add -Werror to WARN_CFLAGS and WARN_CXXFLAGS.
 # Disabled by default
 AE_FEATURE_DEFAULT_DISABLE
@@ -259,6 +264,23 @@ AE_IF_FEATURE_ENABLED([cds-lfht-iter-debug], [
   AC_DEFINE([CONFIG_CDS_LFHT_ITER_DEBUG], [1], [Enable extra debugging checks for lock-free hash table iterator traversal. Alters the rculfhash ABI. Make sure to compile both library and application with matching configuration.])
 ])
 
+AE_IF_FEATURE_ENABLED([atomic-builtins], [
+  AC_COMPILE_IFELSE(
+	[AC_LANG_PROGRAM(
+		[[int x, y;]],
+		[[__atomic_store_n(&x, 0, __ATOMIC_RELAXED);
+		  __atomic_load_n(&x, __ATOMIC_RELAXED);
+		  y = __atomic_exchange_n(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+		  __atomic_add_fetch(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_sub_fetch(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_and_fetch(&x, 0x01, __ATOMIC_RELAXED);
+		  __atomic_or_fetch(&x, 0x01, __ATOMIC_RELAXED);
+		  __atomic_thread_fence(__ATOMIC_RELAXED);
+		  __atomic_signal_fence(__ATOMIC_RELAXED);]])],
+	[AC_DEFINE([CONFIG_RCU_USE_ATOMIC_BUILTINS], [1], [Use compiler atomic builtins.])],
+	[AE_FEATURE_DISABLE(atomic-builtins)])
+])
 
 ##                                                                          ##
 ## Set automake variables for optional feature conditionnals in Makefile.am ##
@@ -361,6 +383,10 @@ PPRINT_PROP_BOOL([Internal debugging], $value)
 AE_IS_FEATURE_ENABLED([cds-lfht-iter-debug]) && value=1 || value=0
 PPRINT_PROP_BOOL([Lock-free HT iterator debugging], $value)
 
+# atomic builtins enabled/disabled
+AE_IS_FEATURE_ENABLED([atomic-builtins]) && value=1 || value=0
+PPRINT_PROP_BOOL([Use toolchain atomic builtins], $value)
+
 PPRINT_PROP_BOOL([Multi-flavor support], 1)
 
 report_bindir="`eval eval echo $bindir`"
-- 
2.39.2


* [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 01/11] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-06-21 23:19   ` Paul E. McKenney via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 03/11] urcu/compiler: " Olivier Dion via lttng-dev
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Implement uatomic in terms of atomic builtins if configured to do so.

Change-Id: I5814494c62ee507fd5d381c3ba4ccd0a80c4f4e3
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/Makefile.am                     |  3 +
 include/urcu/uatomic.h                  |  5 +-
 include/urcu/uatomic/builtins-generic.h | 85 +++++++++++++++++++++++++
 include/urcu/uatomic/builtins-x86.h     | 85 +++++++++++++++++++++++++
 include/urcu/uatomic/builtins.h         | 83 ++++++++++++++++++++++++
 5 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 include/urcu/uatomic/builtins-generic.h
 create mode 100644 include/urcu/uatomic/builtins-x86.h
 create mode 100644 include/urcu/uatomic/builtins.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..fac941f 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -63,6 +63,9 @@ nobase_include_HEADERS = \
 	urcu/uatomic/alpha.h \
 	urcu/uatomic_arch.h \
 	urcu/uatomic/arm.h \
+	urcu/uatomic/builtins.h \
+	urcu/uatomic/builtins-generic.h \
+	urcu/uatomic/builtins-x86.h \
 	urcu/uatomic/gcc.h \
 	urcu/uatomic/generic.h \
 	urcu/uatomic.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..6b57c5f 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,8 +22,11 @@
 #define _URCU_UATOMIC_H
 
 #include <urcu/arch.h>
+#include <urcu/config.h>
 
-#if defined(URCU_ARCH_X86)
+#if defined(CONFIG_RCU_USE_ATOMIC_BUILTINS)
+#include <urcu/uatomic/builtins.h>
+#elif defined(URCU_ARCH_X86)
 #include <urcu/uatomic/x86.h>
 #elif defined(URCU_ARCH_PPC)
 #include <urcu/uatomic/ppc.h>
diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
new file mode 100644
index 0000000..8e6a9b5
--- /dev/null
+++ b/include/urcu/uatomic/builtins-generic.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-generic.h
+ *
+ * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
+#define _URCU_UATOMIC_BUILTINS_GENERIC_H
+
+#include <urcu/system.h>
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
+
+#define uatomic_cmpxchg(addr, old, new)					\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
+		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
+					    __ATOMIC_SEQ_CST,		\
+					    __ATOMIC_SEQ_CST);		\
+		_old;							\
+	})
+
+#define uatomic_xchg(addr, v)				\
+	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_add_return(addr, v)			\
+	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_sub_return(addr, v)			\
+	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_and(addr, mask)					\
+	(void)__atomic_and_fetch(addr, mask, __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask)					\
+	(void)__atomic_or_fetch(addr, mask, __ATOMIC_RELAXED)
+
+#define uatomic_add(addr, v)					\
+	(void)__atomic_add_fetch(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_sub(addr, v)					\
+	(void)__atomic_sub_fetch(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr)					\
+	(void)__atomic_add_fetch(addr, 1, __ATOMIC_RELAXED)
+
+#define uatomic_dec(addr)					\
+	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_and()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_or() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_or()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_add() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_add()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_sub()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb()
+
+#endif /* _URCU_UATOMIC_BUILTINS_GENERIC_H */
diff --git a/include/urcu/uatomic/builtins-x86.h b/include/urcu/uatomic/builtins-x86.h
new file mode 100644
index 0000000..a70f922
--- /dev/null
+++ b/include/urcu/uatomic/builtins-x86.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-x86.h
+ *
+ * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_X86_H
+#define _URCU_UATOMIC_BUILTINS_X86_H
+
+#include <urcu/system.h>
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
+
+#define uatomic_cmpxchg(addr, old, new)					\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
+		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
+					    __ATOMIC_SEQ_CST,		\
+					    __ATOMIC_SEQ_CST);		\
+		_old;							\
+	})
+
+#define uatomic_xchg(addr, v)				\
+	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_add_return(addr, v)			\
+	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_sub_return(addr, v)			\
+	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_and(addr, mask)					\
+	(void)__atomic_and_fetch(addr, mask, __ATOMIC_SEQ_CST)
+
+#define uatomic_or(addr, mask)					\
+	(void)__atomic_or_fetch(addr, mask, __ATOMIC_SEQ_CST)
+
+#define uatomic_add(addr, v)					\
+	(void)__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_sub(addr, v)					\
+	(void)__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+
+#define uatomic_inc(addr)					\
+	(void)__atomic_add_fetch(addr, 1, __ATOMIC_SEQ_CST)
+
+#define uatomic_dec(addr)					\
+	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_SEQ_CST)
+
+#define cmm_smp_mb__before_uatomic_and() do { } while (0)
+#define cmm_smp_mb__after_uatomic_and()  do { } while (0)
+
+#define cmm_smp_mb__before_uatomic_or() do { } while (0)
+#define cmm_smp_mb__after_uatomic_or()  do { } while (0)
+
+#define cmm_smp_mb__before_uatomic_add() do { } while (0)
+#define cmm_smp_mb__after_uatomic_add()  do { } while (0)
+
+#define cmm_smp_mb__before_uatomic_sub() do { } while (0)
+#define cmm_smp_mb__after_uatomic_sub()  do { } while (0)
+
+#define cmm_smp_mb__before_uatomic_inc() do { } while (0)
+#define cmm_smp_mb__after_uatomic_inc()  do { } while (0)
+
+#define cmm_smp_mb__before_uatomic_dec() do { } while (0)
+#define cmm_smp_mb__after_uatomic_dec()  do { } while (0)
+
+#endif /* _URCU_UATOMIC_BUILTINS_X86_H */
diff --git a/include/urcu/uatomic/builtins.h b/include/urcu/uatomic/builtins.h
new file mode 100644
index 0000000..164201b
--- /dev/null
+++ b/include/urcu/uatomic/builtins.h
@@ -0,0 +1,83 @@
+/*
+ * urcu/uatomic/builtins.h
+ *
+ * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_H
+#define _URCU_UATOMIC_BUILTINS_H
+
+#include <urcu/arch.h>
+
+#if defined(__has_builtin)
+#  if !__has_builtin(__atomic_store_n)
+#    error "Toolchain does not support __atomic_store_n."
+#  endif
+#  if !__has_builtin(__atomic_load_n)
+#    error "Toolchain does not support __atomic_load_n."
+#  endif
+#  if !__has_builtin(__atomic_exchange_n)
+#    error "Toolchain does not support __atomic_exchange_n."
+#  endif
+#  if !__has_builtin(__atomic_compare_exchange_n)
+#    error "Toolchain does not support __atomic_compare_exchange_n."
+#  endif
+#  if !__has_builtin(__atomic_add_fetch)
+#    error "Toolchain does not support __atomic_add_fetch."
+#  endif
+#  if !__has_builtin(__atomic_sub_fetch)
+#    error "Toolchain does not support __atomic_sub_fetch."
+#  endif
+#  if !__has_builtin(__atomic_or_fetch)
+#    error "Toolchain does not support __atomic_or_fetch."
+#  endif
+#  if !__has_builtin(__atomic_thread_fence)
+#    error "Toolchain does not support __atomic_thread_fence."
+#  endif
+#  if !__has_builtin(__atomic_signal_fence)
+#    error "Toolchain does not support __atomic_signal_fence."
+#  endif
+#elif defined(__GNUC__)
+#  define GCC_VERSION (__GNUC__       * 10000 + \
+		       __GNUC_MINOR__ * 100   + \
+		       __GNUC_PATCHLEVEL__)
+#  if  GCC_VERSION < 40700
+#    error "GCC version is too old. Version must be 4.7 or greater"
+#  endif
+#  undef  GCC_VERSION
+#else
+#  error "Toolchain is not supported."
+#endif
+
+#if defined(__GNUC__)
+#  define UATOMIC_HAS_ATOMIC_BYTE  __GCC_ATOMIC_CHAR_LOCK_FREE
+#  define UATOMIC_HAS_ATOMIC_SHORT __GCC_ATOMIC_SHORT_LOCK_FREE
+#elif defined(__clang__)
+#  define UATOMIC_HAS_ATOMIC_BYTE  __CLANG_ATOMIC_CHAR_LOCK_FREE
+#  define UATOMIC_HAS_ATOMIC_SHORT __CLANG_ATOMIC_SHORT_LOCK_FREE
+#else
+/* #  define UATOMIC_HAS_ATOMIC_BYTE  */
+/* #  define UATOMIC_HAS_ATOMIC_SHORT */
+#endif
+
+#if defined(URCU_ARCH_X86)
+#  include <urcu/uatomic/builtins-x86.h>
+#else
+#  include <urcu/uatomic/builtins-generic.h>
+#endif
+
+#endif	/* _URCU_UATOMIC_BUILTINS_H */
-- 
2.39.2


* [lttng-dev] [PATCH 03/11] urcu/compiler: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 01/11] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 04/11] urcu/arch/generic: " Olivier Dion via lttng-dev
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Use __atomic_signal_fence(__ATOMIC_SEQ_CST) for cmm_barrier() if
configured to use atomic builtins.

Change-Id: Ib168b50f1e97a8da861b92d6882c56db230ebb2c
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/compiler.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..3604488 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -25,10 +25,16 @@
 # include <type_traits>	/* for std::remove_cv */
 #endif
 
+#include <urcu/config.h>
+
 #define caa_likely(x)	__builtin_expect(!!(x), 1)
 #define caa_unlikely(x)	__builtin_expect(!!(x), 0)
 
-#define	cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+#  define cmm_barrier() __atomic_signal_fence(__ATOMIC_SEQ_CST)
+#else
+#  define cmm_barrier() asm volatile ("" : : : "memory")
+#endif
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.39.2


* [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (2 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 03/11] urcu/compiler: " Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-06-21 23:22   ` Paul E. McKenney via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 05/11] urcu/system: " Olivier Dion via lttng-dev
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

If configured to use atomic builtins, implement SMP memory barriers in
terms of atomic builtins when the architecture does not implement its
own version.

Change-Id: Iddc4283606e0fce572e104d2d3f03b5c0d9926fb
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/arch/generic.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index be6e41e..e292c70 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -43,6 +43,14 @@ extern "C" {
  * GCC builtins) as well as cmm_rmb and cmm_wmb (defaulting to cmm_mb).
  */
 
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
+# ifndef cmm_smp_mb
+#  define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
+# endif
+
+#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
+
 #ifndef cmm_mb
 #define cmm_mb()    __sync_synchronize()
 #endif
-- 
2.39.2


* [lttng-dev] [PATCH 05/11] urcu/system: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (3 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 04/11] urcu/arch/generic: " Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 06/11] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

If configured to use atomic builtins, use them for implementing the
CMM_LOAD_SHARED and CMM_STORE_SHARED macros.

Change-Id: I3eaaaaf0d26c47aced6e94b40fd59c7b8baa6272
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/system.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..f184aad 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -19,9 +19,28 @@
  * all copies or substantial portions of the Software.
  */
 
+#include <urcu/config.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
 
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
+#define CMM_LOAD_SHARED(x)			\
+	__atomic_load_n(&(x), __ATOMIC_RELAXED)
+
+#define _CMM_LOAD_SHARED(x) CMM_LOAD_SHARED(x)
+
+#define CMM_STORE_SHARED(x, v)					\
+	__extension__						\
+	({							\
+		__typeof__(v) _v = (v);				\
+		__atomic_store_n(&(x), _v, __ATOMIC_RELAXED);	\
+		_v;						\
+	})
+
+#define _CMM_STORE_SHARED(x, v) CMM_STORE_SHARED(x, v)
+
+#else
 /*
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
@@ -56,4 +75,6 @@
 		_v = _v;	/* Work around clang "unused result" */	\
 	})
 
+#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
+
 #endif /* _URCU_SYSTEM_H */
-- 
2.39.2


* [lttng-dev] [PATCH 06/11] urcu/uatomic: Add CMM memory model
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (4 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 05/11] urcu/system: " Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 07/11] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Introducing the URCU memory model with the following new primitives:

  - uatomic_load(addr, memory_order)

  - uatomic_store(addr, value, memory_order)
  - uatomic_and_mo(addr, mask, memory_order)
  - uatomic_or_mo(addr, mask, memory_order)
  - uatomic_add_mo(addr, value, memory_order)
  - uatomic_sub_mo(addr, value, memory_order)
  - uatomic_inc_mo(addr, memory_order)
  - uatomic_dec_mo(addr, memory_order)

  - uatomic_add_return_mo(addr, value, memory_order)
  - uatomic_sub_return_mo(addr, value, memory_order)

  - uatomic_cmpxchg_mo(addr, old, new,
                       memory_order_success,
                       memory_order_failure)

  - uatomic_xchg_mo(addr, new, memory_order)

The URCU memory model reflects the C11 memory model. The memory order
can be selected through the enum rcu_memorder.

If configured with atomic builtins, the correspondence between the URCU
memory model and the C11 memory model is one to one. However, if not
configured with atomic builtins, the following rules stipulate the
memory model.

For load operations with uatomic_load(), the memory orders CMM_RELAXED,
CMM_CONSUME, CMM_ACQUIRE and CMM_SEQ_CST are allowed. A barrier may be
inserted before and/or after the load from memory, depending on the
memory order:

  - CMM_RELAXED: No barrier
  - CMM_CONSUME: Memory barrier after read
  - CMM_ACQUIRE: Memory barrier after read
  - CMM_SEQ_CST: Memory barriers before and after read

For store operations with uatomic_store(), the memory orders
CMM_RELAXED, CMM_RELEASE and CMM_SEQ_CST are allowed. A barrier may be
inserted before and/or after the store to memory, depending on the
memory order:

  - CMM_RELAXED: No barrier
  - CMM_RELEASE: Memory barrier before operation
  - CMM_SEQ_CST: Memory barriers before and after operation

For the read-modify-write operations uatomic_and_mo(), uatomic_or_mo(),
uatomic_add_mo(), uatomic_sub_mo(), uatomic_inc_mo(), uatomic_dec_mo(),
uatomic_add_return_mo() and uatomic_sub_return_mo(), all memory orders
are allowed. A barrier may be inserted before and/or after the
operation, depending on the memory order:

  - CMM_RELAXED: No barrier
  - CMM_ACQUIRE: Memory barrier after operation
  - CMM_CONSUME: Memory barrier after operation
  - CMM_RELEASE: Memory barrier before operation
  - CMM_ACQ_REL: Memory barriers before and after operation
  - CMM_SEQ_CST: Memory barriers before and after operation

For the compare-and-exchange operation uatomic_cmpxchg_mo(), any
success memory order is allowed. The failure memory order cannot be
CMM_RELEASE nor CMM_ACQ_REL, and cannot be stronger than the success
memory order.

For the exchange operation uatomic_xchg_mo(), any memory order is valid.

Change-Id: I213ba19c84e82a63083f00143a3142ffbdab1d52
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/static/pointer.h           |  40 +++-----
 include/urcu/uatomic.h                  |  20 ++++
 include/urcu/uatomic/builtins-generic.h |  81 +++++++++++----
 include/urcu/uatomic/builtins-x86.h     |  79 +++++++++++----
 include/urcu/uatomic/generic.h          | 128 ++++++++++++++++++++++++
 src/urcu-pointer.c                      |   9 +-
 6 files changed, 283 insertions(+), 74 deletions(-)

diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
index 9e46a57..9da8657 100644
--- a/include/urcu/static/pointer.h
+++ b/include/urcu/static/pointer.h
@@ -96,23 +96,8 @@ extern "C" {
  * -Wincompatible-pointer-types errors.  Using the statement expression
  * makes it an rvalue and gets rid of the const-ness.
  */
-#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(__extension__ ({				\
-					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
-					_________p0;					\
-				})) _________p1;					\
-				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
-				(_________p1);						\
-			})
-#else
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(p) _________p1 = CMM_LOAD_SHARED(p);		\
-				cmm_smp_read_barrier_depends();				\
-				(_________p1);						\
-			})
-#endif
-
+# define _rcu_dereference(p)			\
+	uatomic_load(&(p), CMM_CONSUME)
 /**
  * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
  * is as expected by "old". If succeeds, returns the previous pointer to the
@@ -131,8 +116,9 @@ extern "C" {
 	({								\
 		__typeof__(*p) _________pold = (old);			\
 		__typeof__(*p) _________pnew = (_new);			\
-		uatomic_cmpxchg(p, _________pold, _________pnew);	\
-	})
+		uatomic_cmpxchg_mo(p, _________pold, _________pnew,	\
+				   CMM_SEQ_CST, CMM_SEQ_CST);		\
+	})
 
 /**
  * _rcu_xchg_pointer - same as rcu_assign_pointer, but returns the previous
@@ -149,17 +135,17 @@ extern "C" {
 	__extension__					\
 	({						\
 		__typeof__(*p) _________pv = (v);	\
-		uatomic_xchg(p, _________pv);		\
+		uatomic_xchg_mo(p, _________pv,		\
+				CMM_SEQ_CST);		\
 	})
 
 
-#define _rcu_set_pointer(p, v)				\
-	do {						\
-		__typeof__(*p) _________pv = (v);	\
-		if (!__builtin_constant_p(v) || 	\
-		    ((v) != NULL))			\
-			cmm_wmb();				\
-		uatomic_set(p, _________pv);		\
+#define _rcu_set_pointer(p, v)						\
+	do {								\
+		__typeof__(*p) _________pv = (v);			\
+		uatomic_store(p, _________pv,				\
+			__builtin_constant_p(v) && (v) == NULL ?	\
+			CMM_RELAXED : CMM_RELEASE);			\
 	} while (0)
 
 /**
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 6b57c5f..6c0d38f 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -24,6 +24,26 @@
 #include <urcu/arch.h>
 #include <urcu/config.h>
 
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+enum cmm_memorder {
+	CMM_RELAXED = __ATOMIC_RELAXED,
+	CMM_CONSUME = __ATOMIC_CONSUME,
+	CMM_ACQUIRE = __ATOMIC_ACQUIRE,
+	CMM_RELEASE = __ATOMIC_RELEASE,
+	CMM_ACQ_REL = __ATOMIC_ACQ_REL,
+	CMM_SEQ_CST = __ATOMIC_SEQ_CST,
+};
+#else
+enum cmm_memorder {
+	CMM_RELAXED,
+	CMM_CONSUME,
+	CMM_ACQUIRE,
+	CMM_RELEASE,
+	CMM_ACQ_REL,
+	CMM_SEQ_CST,
+};
+#endif
+
 #if defined(CONFIG_RCU_USE_ATOMIC_BUILTINS)
 #include <urcu/uatomic/builtins.h>
 #elif defined(URCU_ARCH_X86)
diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
index 8e6a9b5..597bd61 100644
--- a/include/urcu/uatomic/builtins-generic.h
+++ b/include/urcu/uatomic/builtins-generic.h
@@ -23,46 +23,85 @@
 
 #include <urcu/system.h>
 
-#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+#define uatomic_store(addr, v, mo)		\
+	__atomic_store_n(addr, v, mo)
 
-#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
+#define uatomic_set(addr, v)			\
+	uatomic_store(addr, v, CMM_RELAXED)
 
-#define uatomic_cmpxchg(addr, old, new)					\
+#define uatomic_load(addr, mo)			\
+	__atomic_load_n(addr, mo)
+
+#define uatomic_read(addr)			\
+	uatomic_load(addr, CMM_RELAXED)
+
+
+#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
 	__extension__							\
 	({								\
 		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
 		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
-					    __ATOMIC_SEQ_CST,		\
-					    __ATOMIC_SEQ_CST);		\
+					    mos,			\
+					    mof);			\
 		_old;							\
 	})
 
-#define uatomic_xchg(addr, v)				\
-	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
+#define uatomic_cmpxchg(addr, old, new)					\
+	uatomic_cmpxchg_mo(addr, old, new, CMM_SEQ_CST, CMM_SEQ_CST)
+
+#define uatomic_xchg_mo(addr, v, mo)		\
+	__atomic_exchange_n(addr, v, mo)
+
+#define uatomic_xchg(addr, v)			\
+	uatomic_xchg_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_add_return_mo(addr, v, mo)	\
+	__atomic_add_fetch(addr, v, mo)
 
 #define uatomic_add_return(addr, v)			\
-	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+	uatomic_add_return_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_sub_return_mo(addr, v, mo)	\
+	__atomic_sub_fetch(addr, v, mo)
 
 #define uatomic_sub_return(addr, v)			\
-	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+	uatomic_sub_return_mo(addr, v, CMM_SEQ_CST)
 
-#define uatomic_and(addr, mask)					\
-	(void)__atomic_and_fetch(addr, mask, __ATOMIC_RELAXED)
+#define uatomic_and_mo(addr, mask, mo)			\
+	(void) __atomic_and_fetch(addr, mask, mo)
 
-#define uatomic_or(addr, mask)					\
-	(void)__atomic_or_fetch(addr, mask, __ATOMIC_RELAXED)
+#define uatomic_and(addr, mask)				\
+	(void) uatomic_and_mo(addr, mask, CMM_RELAXED)
 
-#define uatomic_add(addr, v)					\
-	(void)__atomic_add_fetch(addr, v, __ATOMIC_RELAXED)
+#define uatomic_or_mo(addr, mask, mo)			\
+	(void) __atomic_or_fetch(addr, mask, mo)
 
-#define uatomic_sub(addr, v)					\
-	(void)__atomic_sub_fetch(addr, v, __ATOMIC_RELAXED)
+#define uatomic_or(addr, mask)				\
+	(void) uatomic_or_mo(addr, mask, CMM_RELAXED)
 
-#define uatomic_inc(addr)					\
-	(void)__atomic_add_fetch(addr, 1, __ATOMIC_RELAXED)
+#define uatomic_add_mo(addr, v, mo)		\
+	(void) __atomic_add_fetch(addr, v, mo)
 
-#define uatomic_dec(addr)					\
-	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_RELAXED)
+#define uatomic_add(addr, v)				\
+	(void) uatomic_add_mo(addr, v, CMM_RELAXED)
+
+#define uatomic_sub_mo(addr, v, mo)		\
+	(void) __atomic_sub_fetch(addr, v, mo)
+
+#define uatomic_sub(addr, v)				\
+	(void) uatomic_sub_mo(addr, v, CMM_RELAXED)
+
+#define uatomic_inc_mo(addr, mo)		\
+	(void) __atomic_add_fetch(addr, 1, mo)
+
+#define uatomic_inc(addr)				\
+	(void) uatomic_inc_mo(addr, CMM_RELAXED)
+
+#define uatomic_dec_mo(addr, mo)		\
+	(void) __atomic_sub_fetch(addr, 1, mo)
+
+#define uatomic_dec(addr)				\
+	(void) uatomic_dec_mo(addr, CMM_RELAXED)
 
 #define cmm_smp_mb__before_uatomic_and() cmm_smp_mb()
 #define cmm_smp_mb__after_uatomic_and()  cmm_smp_mb()
diff --git a/include/urcu/uatomic/builtins-x86.h b/include/urcu/uatomic/builtins-x86.h
index a70f922..c7f3bed 100644
--- a/include/urcu/uatomic/builtins-x86.h
+++ b/include/urcu/uatomic/builtins-x86.h
@@ -23,46 +23,85 @@
 
 #include <urcu/system.h>
 
-#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+#define uatomic_store(addr, v, mo)		\
+	__atomic_store_n(addr, v, mo)
 
-#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
+#define uatomic_set(addr, v)			\
+	uatomic_store(addr, v, CMM_RELAXED)
 
-#define uatomic_cmpxchg(addr, old, new)					\
+#define uatomic_load(addr, mo)			\
+	__atomic_load_n(addr, mo)
+
+#define uatomic_read(addr)			\
+	uatomic_load(addr, CMM_RELAXED)
+
+#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
 	__extension__							\
 	({								\
 		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
 		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
-					    __ATOMIC_SEQ_CST,		\
-					    __ATOMIC_SEQ_CST);		\
+					    mos,			\
+					    mof);			\
 		_old;							\
 	})
 
+#define uatomic_cmpxchg(addr, old, new)					\
+	uatomic_cmpxchg_mo(addr, old, new, CMM_SEQ_CST, CMM_SEQ_CST)
+
+
+#define uatomic_xchg_mo(addr, v, mo)		\
+	__atomic_exchange_n(addr, v, mo)
+
 #define uatomic_xchg(addr, v)				\
-	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
+	uatomic_xchg_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_add_return_mo(addr, v, mo)	\
+	__atomic_add_fetch(addr, v, mo)
 
 #define uatomic_add_return(addr, v)			\
-	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+	uatomic_add_return_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_sub_return_mo(addr, v, mo)	\
+	__atomic_sub_fetch(addr, v, mo)
 
 #define uatomic_sub_return(addr, v)			\
-	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+	uatomic_sub_return_mo(addr, v, CMM_SEQ_CST)
 
-#define uatomic_and(addr, mask)					\
-	(void)__atomic_and_fetch(addr, mask, __ATOMIC_SEQ_CST)
+#define uatomic_and_mo(addr, mask, mo)			\
+	(void) __atomic_and_fetch(addr, mask, mo)
 
-#define uatomic_or(addr, mask)					\
-	(void)__atomic_or_fetch(addr, mask, __ATOMIC_SEQ_CST)
+#define uatomic_and(addr, mask)				\
+	(void) uatomic_and_mo(addr, mask, CMM_SEQ_CST)
 
-#define uatomic_add(addr, v)					\
-	(void)__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
+#define uatomic_or_mo(addr, mask, mo)			\
+	(void) __atomic_or_fetch(addr, mask, mo)
 
-#define uatomic_sub(addr, v)					\
-	(void)__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
+#define uatomic_or(addr, mask)				\
+	(void) uatomic_or_mo(addr, mask, CMM_SEQ_CST)
 
-#define uatomic_inc(addr)					\
-	(void)__atomic_add_fetch(addr, 1, __ATOMIC_SEQ_CST)
+#define uatomic_add_mo(addr, v, mo)		\
+	(void) __atomic_add_fetch(addr, v, mo)
 
-#define uatomic_dec(addr)					\
-	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_SEQ_CST)
+#define uatomic_add(addr, v)				\
+	(void) uatomic_add_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_sub_mo(addr, v, mo)		\
+	(void) __atomic_sub_fetch(addr, v, mo)
+
+#define uatomic_sub(addr, v)				\
+	(void) uatomic_sub_mo(addr, v, CMM_SEQ_CST)
+
+#define uatomic_inc_mo(addr, mo)		\
+	(void) __atomic_add_fetch(addr, 1, mo)
+
+#define uatomic_inc(addr)				\
+	(void) uatomic_inc_mo(addr, CMM_SEQ_CST)
+
+#define uatomic_dec_mo(addr, mo)		\
+	(void) __atomic_sub_fetch(addr, 1, mo)
+
+#define uatomic_dec(addr)				\
+	(void) uatomic_dec_mo(addr, CMM_SEQ_CST)
 
 #define cmm_smp_mb__before_uatomic_and() do { } while (0)
 #define cmm_smp_mb__after_uatomic_and()  do { } while (0)
diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
index e31a19b..4ec93c5 100644
--- a/include/urcu/uatomic/generic.h
+++ b/include/urcu/uatomic/generic.h
@@ -33,10 +33,138 @@ extern "C" {
 #define uatomic_set(addr, v)	((void) CMM_STORE_SHARED(*(addr), (v)))
 #endif
 
+extern void abort(void);
+
+#define uatomic_store_op(op, addr, v, mo)	\
+	({					\
+		switch (mo) {			\
+		case CMM_ACQUIRE:		\
+		case CMM_CONSUME:		\
+		case CMM_RELAXED:		\
+			break;			\
+		case CMM_RELEASE:		\
+		case CMM_ACQ_REL:		\
+		case CMM_SEQ_CST:		\
+			cmm_smp_mb();		\
+			break;			\
+		default:			\
+			abort();		\
+		}				\
+						\
+		op(addr, v);			\
+						\
+		switch (mo) {			\
+		case CMM_ACQUIRE:		\
+		case CMM_ACQ_REL:		\
+		case CMM_CONSUME:		\
+		case CMM_SEQ_CST:		\
+			cmm_smp_mb();		\
+			break;			\
+		case CMM_RELAXED:		\
+		case CMM_RELEASE:		\
+			break;			\
+		default:			\
+			abort();		\
+		}				\
+	})
+
+#define uatomic_store(addr, v, mo)			\
+	({						\
+		switch (mo) {				\
+		case CMM_RELAXED:			\
+			break;				\
+		case CMM_RELEASE:			\
+		case CMM_SEQ_CST:			\
+			cmm_smp_mb();			\
+			break;				\
+		default:				\
+			abort();			\
+		}					\
+							\
+		uatomic_set(addr, v);			\
+							\
+		switch (mo) {				\
+		case CMM_RELAXED:			\
+		case CMM_RELEASE:			\
+			break;				\
+		case CMM_SEQ_CST:			\
+			cmm_smp_mb();			\
+			break;				\
+		default:				\
+			abort();			\
+		}					\
+	})
+
+#define uatomic_and_mo(addr, v, mo)			\
+	uatomic_store_op(uatomic_and, addr, v, mo)
+
+#define uatomic_or_mo(addr, v, mo)			\
+	uatomic_store_op(uatomic_or, addr, v, mo)
+
+#define uatomic_add_mo(addr, v, mo)			\
+	uatomic_store_op(uatomic_add, addr, v, mo)
+
+#define uatomic_sub_mo(addr, v, mo)			\
+	uatomic_store_op(uatomic_sub, addr, v, mo)
+
+#define uatomic_inc_mo(addr, mo)			\
+	uatomic_store_op(uatomic_add, addr, 1, mo)
+
+#define uatomic_dec_mo(addr, mo)			\
+	uatomic_store_op(uatomic_sub, addr, 1, mo)
+
+#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)	\
+	uatomic_cmpxchg(addr, old, new)
+
+#define uatomic_xchg_mo(addr, v, mo)		\
+	uatomic_xchg(addr, v)
+
+#define uatomic_add_return_mo(addr, v, mo)	\
+	uatomic_add_return(addr, v)
+
+#define uatomic_sub_return_mo(addr, v, mo)	\
+	uatomic_sub_return(addr, v)
+
+
 #ifndef uatomic_read
 #define uatomic_read(addr)	CMM_LOAD_SHARED(*(addr))
 #endif
 
+#define uatomic_load(addr, mo)						\
+	__extension__							\
+	({								\
+		switch (mo) {						\
+		case CMM_ACQUIRE:					\
+		case CMM_CONSUME:					\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_SEQ_CST:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		__typeof__(*(addr)) _rcu_value = uatomic_read(addr);	\
+									\
+		switch (mo) {						\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_CONSUME:					\
+		case CMM_ACQUIRE:					\
+		case CMM_SEQ_CST:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		_rcu_value;						\
+	})
+
 #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR
 #ifdef ILLEGAL_INSTR
 static inline __attribute__((always_inline))
diff --git a/src/urcu-pointer.c b/src/urcu-pointer.c
index d0854ac..cea8aeb 100644
--- a/src/urcu-pointer.c
+++ b/src/urcu-pointer.c
@@ -39,19 +39,16 @@ void *rcu_dereference_sym(void *p)
 
 void *rcu_set_pointer_sym(void **p, void *v)
 {
-	cmm_wmb();
-	uatomic_set(p, v);
+	uatomic_store(p, v, CMM_RELEASE);
 	return v;
 }
 
 void *rcu_xchg_pointer_sym(void **p, void *v)
 {
-	cmm_wmb();
-	return uatomic_xchg(p, v);
+	return uatomic_xchg_mo(p, v, CMM_SEQ_CST);
 }
 
 void *rcu_cmpxchg_pointer_sym(void **p, void *old, void *_new)
 {
-	cmm_wmb();
-	return uatomic_cmpxchg(p, old, _new);
+	return uatomic_cmpxchg_mo(p, old, _new, CMM_SEQ_CST, CMM_SEQ_CST);
 }
-- 
2.39.2


* [lttng-dev] [PATCH 07/11] urcu-wait: Fix wait state load/store
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (5 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 06/11] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 08/11] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

The state of a wait node must be accessed atomically. Moreover,
busy-waiting until the teardown state is observed must have CMM_ACQUIRE
semantics, while storing the teardown state must have CMM_RELEASE
semantics.

Change-Id: I9cd9cf4cd9ab2081551d7f33c0b1c23c3cf3942f
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 src/urcu-wait.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/urcu-wait.h b/src/urcu-wait.h
index ef5f7ed..4667a13 100644
--- a/src/urcu-wait.h
+++ b/src/urcu-wait.h
@@ -135,7 +135,7 @@ void urcu_adaptative_wake_up(struct urcu_wait_node *wait)
 			urcu_die(errno);
 	}
 	/* Allow teardown of struct urcu_wait memory. */
-	uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);
+	uatomic_or_mo(&wait->state, URCU_WAIT_TEARDOWN, CMM_RELEASE);
 }
 
 /*
@@ -193,7 +193,7 @@ skip_futex_wait:
 			break;
 		caa_cpu_relax();
 	}
-	while (!(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN))
+	while (!(uatomic_load(&wait->state, CMM_ACQUIRE) & URCU_WAIT_TEARDOWN))
 		poll(NULL, 0, 10);
 	urcu_posix_assert(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN);
 }
@@ -209,7 +209,7 @@ void urcu_wake_all_waiters(struct urcu_waiters *waiters)
 			caa_container_of(iter, struct urcu_wait_node, node);
 
 		/* Don't wake already running threads */
-		if (wait_node->state & URCU_WAIT_RUNNING)
+		if (uatomic_load(&wait_node->state, CMM_RELAXED) & URCU_WAIT_RUNNING)
 			continue;
 		urcu_adaptative_wake_up(wait_node);
 	}
-- 
2.39.2


* [lttng-dev] [PATCH 08/11] tests: Use uatomic for accessing global states
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (6 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 07/11] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 09/11] benchmark: " Olivier Dion via lttng-dev
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Accesses to global states were previously ordered with explicit memory
barriers. Use the uatomic API with the URCU memory model instead, so
that TSAN does not warn about non-atomic concurrent accesses.

Also, the thread id map mutex must be unlocked only after storing the
newly created thread's id in the map. Otherwise, the new thread could
observe an unset id.

Change-Id: I1ecdc387b3f510621cbc116ad3b95c676f5d659a
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/common/api.h            |  12 ++--
 tests/regression/rcutorture.h | 102 ++++++++++++++++++++++++----------
 2 files changed, 80 insertions(+), 34 deletions(-)

diff --git a/tests/common/api.h b/tests/common/api.h
index a260463..9d22b0f 100644
--- a/tests/common/api.h
+++ b/tests/common/api.h
@@ -26,6 +26,7 @@
 
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
+#include <urcu/uatomic.h>
 
 /*
  * Machine parameters.
@@ -135,7 +136,7 @@ static int __smp_thread_id(void)
 	thread_id_t tid = pthread_self();
 
 	for (i = 0; i < NR_THREADS; i++) {
-		if (__thread_id_map[i] == tid) {
+		if (uatomic_read(&__thread_id_map[i]) == tid) {
 			long v = i + 1;  /* must be non-NULL. */
 
 			if (pthread_setspecific(thread_id_key, (void *)v) != 0) {
@@ -184,12 +185,13 @@ static thread_id_t create_thread(void *(*func)(void *), void *arg)
 		exit(-1);
 	}
 	__thread_id_map[i] = __THREAD_ID_MAP_WAITING;
-	spin_unlock(&__thread_id_map_mutex);
+
 	if (pthread_create(&tid, NULL, func, arg) != 0) {
 		perror("create_thread:pthread_create");
 		exit(-1);
 	}
-	__thread_id_map[i] = tid;
+	uatomic_set(&__thread_id_map[i], tid);
+	spin_unlock(&__thread_id_map_mutex);
 	return tid;
 }
 
@@ -199,7 +201,7 @@ static void *wait_thread(thread_id_t tid)
 	void *vp;
 
 	for (i = 0; i < NR_THREADS; i++) {
-		if (__thread_id_map[i] == tid)
+		if (uatomic_read(&__thread_id_map[i]) == tid)
 			break;
 	}
 	if (i >= NR_THREADS){
@@ -211,7 +213,7 @@ static void *wait_thread(thread_id_t tid)
 		perror("wait_thread:pthread_join");
 		exit(-1);
 	}
-	__thread_id_map[i] = __THREAD_ID_MAP_EMPTY;
+	uatomic_set(&__thread_id_map[i], __THREAD_ID_MAP_EMPTY);
 	return vp;
 }
 
diff --git a/tests/regression/rcutorture.h b/tests/regression/rcutorture.h
index bc394f9..754bbf0 100644
--- a/tests/regression/rcutorture.h
+++ b/tests/regression/rcutorture.h
@@ -44,6 +44,14 @@
  * data.  A correct RCU implementation will have all but the first two
  * numbers non-zero.
  *
+ * rcu_stress_count: Histogram of "ages" of structures seen by readers.  If any
+ * entries past the first two are non-zero, RCU is broken. The age of a newly
+ * allocated structure is zero, it becomes one when removed from reader
+ * visibility, and is incremented once per grace period subsequently -- and is
+ * freed after passing through (RCU_STRESS_PIPE_LEN-2) grace periods.  Since
+ * this test only has one true writer (there are fake writers), only buckets at
+ * indexes 0 and 1 should be non-zero.
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -68,6 +76,8 @@
 #include <stdlib.h>
 #include "tap.h"
 
+#include <urcu/uatomic.h>
+
 #define NR_TESTS	1
 
 DEFINE_PER_THREAD(long long, n_reads_pt);
@@ -145,10 +155,10 @@ void *rcu_read_perf_test(void *arg)
 	run_on(me);
 	uatomic_inc(&nthreadsrunning);
 	put_thread_offline();
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
 	put_thread_online();
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		for (i = 0; i < RCU_READ_RUN; i++) {
 			rcu_read_lock();
 			/* rcu_read_lock_nest(); */
@@ -180,9 +190,9 @@ void *rcu_update_perf_test(void *arg __attribute__((unused)))
 		}
 	}
 	uatomic_inc(&nthreadsrunning);
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		synchronize_rcu();
 		n_updates_local++;
 	}
@@ -211,15 +221,11 @@ int perftestrun(int nthreads, int nreaders, int nupdaters)
 	int t;
 	int duration = 1;
 
-	cmm_smp_mb();
 	while (uatomic_read(&nthreadsrunning) < nthreads)
 		(void) poll(NULL, 0, 1);
-	goflag = GOFLAG_RUN;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_RUN);
 	sleep(duration);
-	cmm_smp_mb();
-	goflag = GOFLAG_STOP;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_STOP);
 	wait_all_threads();
 	for_each_thread(t) {
 		n_reads += per_thread(n_reads_pt, t);
@@ -300,6 +306,13 @@ struct rcu_stress rcu_stress_array[RCU_STRESS_PIPE_LEN] = { { 0, 0 } };
 struct rcu_stress *rcu_stress_current;
 int rcu_stress_idx = 0;
 
+/*
+ * How many times a reader has seen something that should not be visible. It is
+ * an error if this value is different from zero at the end of the stress test.
+ *
+ * Here, the something that should not be visible is an old pipe that has been
+ * freed (mbtest = 0).
+ */
 int n_mberror = 0;
 DEFINE_PER_THREAD(long long [RCU_STRESS_PIPE_LEN + 1], rcu_stress_count);
 
@@ -315,19 +328,25 @@ void *rcu_read_stress_test(void *arg __attribute__((unused)))
 
 	rcu_register_thread();
 	put_thread_offline();
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
 	put_thread_online();
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		rcu_read_lock();
 		p = rcu_dereference(rcu_stress_current);
 		if (p->mbtest == 0)
-			n_mberror++;
+			uatomic_inc_mo(&n_mberror, CMM_RELAXED);
 		rcu_read_lock_nest();
+		/*
+		 * The value of garbage is not important. This is
+		 * essentially a busy loop. The atomic operation is not
+		 * needed for correctness here, but it keeps tools such
+		 * as TSAN from flagging this as a data race.
+		 */
 		for (i = 0; i < 100; i++)
-			garbage++;
+			uatomic_inc(&garbage);
 		rcu_read_unlock_nest();
-		pc = p->pipe_count;
+		pc = uatomic_read(&p->pipe_count);
 		rcu_read_unlock();
 		if ((pc > RCU_STRESS_PIPE_LEN) || (pc < 0))
 			pc = RCU_STRESS_PIPE_LEN;
@@ -397,26 +416,47 @@ static
 void *rcu_update_stress_test(void *arg __attribute__((unused)))
 {
 	int i;
-	struct rcu_stress *p;
+	struct rcu_stress *p, *old_p;
 	struct rcu_head rh;
 	enum writer_state writer_state = WRITER_STATE_SYNC_RCU;
 
-	while (goflag == GOFLAG_INIT)
+	rcu_register_thread();
+
+	put_thread_offline();
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+
+	put_thread_online();
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		i = rcu_stress_idx + 1;
 		if (i >= RCU_STRESS_PIPE_LEN)
 			i = 0;
+		/*
+		 * Get old pipe that we free after a synchronize_rcu().
+		 */
+		rcu_read_lock();
+		old_p = rcu_dereference(rcu_stress_current);
+		rcu_read_unlock();
+
+		/*
+		 * Allocate a new pipe.
+		 */
 		p = &rcu_stress_array[i];
-		p->mbtest = 0;
-		cmm_smp_mb();
 		p->pipe_count = 0;
 		p->mbtest = 1;
+
 		rcu_assign_pointer(rcu_stress_current, p);
 		rcu_stress_idx = i;
+
+		/*
+		 * Increment every pipe except the freshly allocated one. A
+		 * reader should only see either the old pipe or the new
+		 * pipe. This is reflected in the rcu_stress_count histogram.
+		 */
 		for (i = 0; i < RCU_STRESS_PIPE_LEN; i++)
 			if (i != rcu_stress_idx)
-				rcu_stress_array[i].pipe_count++;
+				uatomic_inc(&rcu_stress_array[i].pipe_count);
+
 		switch (writer_state) {
 		case WRITER_STATE_SYNC_RCU:
 			synchronize_rcu();
@@ -478,10 +518,18 @@ void *rcu_update_stress_test(void *arg __attribute__((unused)))
 			break;
 		}
 		}
+		/*
+		 * No readers should see that old pipe now. Setting mbtest to 0
+		 * to mark it as "freed".
+		 */
+		old_p->mbtest = 0;
 		n_updates++;
 		advance_writer_state(&writer_state);
 	}
 
+	put_thread_offline();
+	rcu_unregister_thread();
+
 	return NULL;
 }
 
@@ -497,9 +545,9 @@ void *rcu_fake_update_stress_test(void *arg __attribute__((unused)))
 			set_thread_call_rcu_data(crdp);
 		}
 	}
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		synchronize_rcu();
 		(void) poll(NULL, 0, 1);
 	}
@@ -535,13 +583,9 @@ int stresstest(int nreaders)
 	create_thread(rcu_update_stress_test, NULL);
 	for (i = 0; i < 5; i++)
 		create_thread(rcu_fake_update_stress_test, NULL);
-	cmm_smp_mb();
-	goflag = GOFLAG_RUN;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_RUN);
 	sleep(10);
-	cmm_smp_mb();
-	goflag = GOFLAG_STOP;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_STOP);
 	wait_all_threads();
 	for_each_thread(t)
 		n_reads += per_thread(n_reads_pt, t);
-- 
2.39.2


* [lttng-dev] [PATCH 09/11] benchmark: Use uatomic for accessing global states
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (7 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 08/11] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 10/11] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Accesses to global states were previously ordered with explicit memory
barriers. Use the uatomic API with the URCU memory model instead, so
that TSAN can understand the ordering imposed by the synchronization
flags.

Change-Id: I1bf5702c5ac470f308c478effe39e424a3158060
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/benchmark/Makefile.am             | 91 +++++++++++++------------
 tests/benchmark/common-states.c         |  1 +
 tests/benchmark/common-states.h         | 51 ++++++++++++++
 tests/benchmark/test_mutex.c            | 32 +--------
 tests/benchmark/test_perthreadlock.c    | 32 +--------
 tests/benchmark/test_rwlock.c           | 32 +--------
 tests/benchmark/test_urcu.c             | 33 +--------
 tests/benchmark/test_urcu_assign.c      | 33 +--------
 tests/benchmark/test_urcu_bp.c          | 33 +--------
 tests/benchmark/test_urcu_defer.c       | 33 +--------
 tests/benchmark/test_urcu_gc.c          | 34 ++-------
 tests/benchmark/test_urcu_hash.c        |  6 +-
 tests/benchmark/test_urcu_hash.h        | 15 ----
 tests/benchmark/test_urcu_hash_rw.c     | 10 +--
 tests/benchmark/test_urcu_hash_unique.c | 10 +--
 tests/benchmark/test_urcu_lfq.c         | 20 ++----
 tests/benchmark/test_urcu_lfs.c         | 20 ++----
 tests/benchmark/test_urcu_lfs_rcu.c     | 20 ++----
 tests/benchmark/test_urcu_qsbr.c        | 33 +--------
 tests/benchmark/test_urcu_qsbr_gc.c     | 34 ++-------
 tests/benchmark/test_urcu_wfcq.c        | 22 +++---
 tests/benchmark/test_urcu_wfq.c         | 20 ++----
 tests/benchmark/test_urcu_wfs.c         | 22 +++---
 23 files changed, 177 insertions(+), 460 deletions(-)
 create mode 100644 tests/benchmark/common-states.c
 create mode 100644 tests/benchmark/common-states.h

diff --git a/tests/benchmark/Makefile.am b/tests/benchmark/Makefile.am
index c53e025..a7f91c2 100644
--- a/tests/benchmark/Makefile.am
+++ b/tests/benchmark/Makefile.am
@@ -1,4 +1,5 @@
 AM_CPPFLAGS += -I$(top_srcdir)/src -I$(top_srcdir)/tests/common
+AM_CPPFLAGS += -include $(top_srcdir)/tests/benchmark/common-states.h
 
 TEST_EXTENSIONS = .tap
 TAP_LOG_DRIVER_FLAGS = --merge --comments
@@ -7,6 +8,8 @@ TAP_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' \
 	URCU_TESTS_BUILDDIR='$(abs_top_builddir)/tests' \
 	$(SHELL) $(top_srcdir)/tests/utils/tap-driver.sh
 
+noinst_HEADERS = common-states.h
+
 SCRIPT_LIST = \
 	runpaul-phase1.sh \
 	runpaul-phase2.sh \
@@ -61,163 +64,163 @@ URCU_CDS_LIB=$(top_builddir)/src/liburcu-cds.la
 
 DEBUG_YIELD_LIB=$(builddir)/../common/libdebug-yield.la
 
-test_urcu_SOURCES = test_urcu.c
+test_urcu_SOURCES = test_urcu.c common-states.c
 test_urcu_LDADD = $(URCU_LIB)
 
-test_urcu_dynamic_link_SOURCES = test_urcu.c
+test_urcu_dynamic_link_SOURCES = test_urcu.c common-states.c
 test_urcu_dynamic_link_LDADD = $(URCU_LIB)
 test_urcu_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_timing_SOURCES = test_urcu_timing.c
+test_urcu_timing_SOURCES = test_urcu_timing.c common-states.c
 test_urcu_timing_LDADD = $(URCU_LIB)
 
-test_urcu_yield_SOURCES = test_urcu.c
+test_urcu_yield_SOURCES = test_urcu.c common-states.c
 test_urcu_yield_LDADD = $(URCU_LIB) $(DEBUG_YIELD_LIB)
 test_urcu_yield_CFLAGS = -DDEBUG_YIELD $(AM_CFLAGS)
 
 
-test_urcu_qsbr_SOURCES = test_urcu_qsbr.c
+test_urcu_qsbr_SOURCES = test_urcu_qsbr.c common-states.c
 test_urcu_qsbr_LDADD = $(URCU_QSBR_LIB)
 
-test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c
+test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c common-states.c
 test_urcu_qsbr_timing_LDADD = $(URCU_QSBR_LIB)
 
 
-test_urcu_mb_SOURCES = test_urcu.c
+test_urcu_mb_SOURCES = test_urcu.c common-states.c
 test_urcu_mb_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_CFLAGS = -DRCU_MB $(AM_CFLAGS)
 
 
-test_urcu_signal_SOURCES = test_urcu.c
+test_urcu_signal_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_signal_dynamic_link_SOURCES = test_urcu.c
+test_urcu_signal_dynamic_link_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_dynamic_link_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_dynamic_link_CFLAGS = -DRCU_SIGNAL -DDYNAMIC_LINK_TEST \
 					$(AM_CFLAGS)
 
-test_urcu_signal_timing_SOURCES = test_urcu_timing.c
+test_urcu_signal_timing_SOURCES = test_urcu_timing.c common-states.c
 test_urcu_signal_timing_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_timing_CFLAGS= -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_signal_yield_SOURCES = test_urcu.c
+test_urcu_signal_yield_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_yield_LDADD = $(URCU_SIGNAL_LIB) $(DEBUG_YIELD_LIB)
 test_urcu_signal_yield_CFLAGS = -DRCU_SIGNAL -DDEBUG_YIELD $(AM_CFLAGS)
 
-test_rwlock_timing_SOURCES = test_rwlock_timing.c
+test_rwlock_timing_SOURCES = test_rwlock_timing.c common-states.c
 test_rwlock_timing_LDADD = $(URCU_SIGNAL_LIB)
 
-test_rwlock_SOURCES = test_rwlock.c
+test_rwlock_SOURCES = test_rwlock.c common-states.c
 test_rwlock_LDADD = $(URCU_SIGNAL_LIB)
 
-test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c
+test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c common-states.c
 test_perthreadlock_timing_LDADD = $(URCU_SIGNAL_LIB)
 
-test_perthreadlock_SOURCES = test_perthreadlock.c
+test_perthreadlock_SOURCES = test_perthreadlock.c common-states.c
 test_perthreadlock_LDADD = $(URCU_SIGNAL_LIB)
 
-test_mutex_SOURCES = test_mutex.c
+test_mutex_SOURCES = test_mutex.c common-states.c
 
-test_looplen_SOURCES = test_looplen.c
+test_looplen_SOURCES = test_looplen.c common-states.c
 
-test_urcu_gc_SOURCES = test_urcu_gc.c
+test_urcu_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_gc_LDADD = $(URCU_LIB)
 
-test_urcu_signal_gc_SOURCES = test_urcu_gc.c
+test_urcu_signal_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_signal_gc_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_gc_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_mb_gc_SOURCES = test_urcu_gc.c
+test_urcu_mb_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_mb_gc_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_gc_CFLAGS = -DRCU_MB $(AM_CFLAGS)
 
-test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c
+test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c common-states.c
 test_urcu_qsbr_gc_LDADD = $(URCU_QSBR_LIB)
 
-test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c
+test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c common-states.c
 test_urcu_qsbr_lgc_LDADD = $(URCU_QSBR_LIB)
 test_urcu_qsbr_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_lgc_SOURCES = test_urcu_gc.c
+test_urcu_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_lgc_LDADD = $(URCU_LIB)
 test_urcu_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_signal_lgc_SOURCES = test_urcu_gc.c
+test_urcu_signal_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_signal_lgc_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_lgc_CFLAGS = -DRCU_SIGNAL -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_mb_lgc_SOURCES = test_urcu_gc.c
+test_urcu_mb_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_mb_lgc_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_lgc_CFLAGS = -DTEST_LOCAL_GC -DRCU_MB $(AM_CFLAGS)
 
-test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c
+test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c common-states.c
 test_urcu_qsbr_dynamic_link_LDADD = $(URCU_QSBR_LIB)
 test_urcu_qsbr_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_defer_SOURCES = test_urcu_defer.c
+test_urcu_defer_SOURCES = test_urcu_defer.c common-states.c
 test_urcu_defer_LDADD = $(URCU_LIB)
 
 test_cycles_per_loop_SOURCES = test_cycles_per_loop.c
 
-test_urcu_assign_SOURCES = test_urcu_assign.c
+test_urcu_assign_SOURCES = test_urcu_assign.c common-states.c
 test_urcu_assign_LDADD = $(URCU_LIB)
 
-test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c
+test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c common-states.c
 test_urcu_assign_dynamic_link_LDADD = $(URCU_LIB)
 test_urcu_assign_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_bp_SOURCES = test_urcu_bp.c
+test_urcu_bp_SOURCES = test_urcu_bp.c common-states.c
 test_urcu_bp_LDADD = $(URCU_BP_LIB)
 
-test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c
+test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c common-states.c
 test_urcu_bp_dynamic_link_LDADD = $(URCU_BP_LIB)
 test_urcu_bp_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_lfq_SOURCES = test_urcu_lfq.c
+test_urcu_lfq_SOURCES = test_urcu_lfq.c common-states.c
 test_urcu_lfq_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c
+test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c common-states.c
 test_urcu_lfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfq_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_wfq_SOURCES = test_urcu_wfq.c
+test_urcu_wfq_SOURCES = test_urcu_wfq.c common-states.c
 test_urcu_wfq_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c
+test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c common-states.c
 test_urcu_wfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfq_dynlink_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfcq_SOURCES = test_urcu_wfcq.c
+test_urcu_wfcq_SOURCES = test_urcu_wfcq.c common-states.c
 test_urcu_wfcq_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c
+test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c common-states.c
 test_urcu_wfcq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfcq_dynlink_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_lfs_SOURCES = test_urcu_lfs.c
+test_urcu_lfs_SOURCES = test_urcu_lfs.c common-states.c
 test_urcu_lfs_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c
+test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c common-states.c
 test_urcu_lfs_rcu_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c
+test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c common-states.c
 test_urcu_lfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfs_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c
+test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c common-states.c
 test_urcu_lfs_rcu_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfs_rcu_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_wfs_SOURCES = test_urcu_wfs.c
+test_urcu_wfs_SOURCES = test_urcu_wfs.c common-states.c
 test_urcu_wfs_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c
+test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c common-states.c
 test_urcu_wfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfs_dynlink_LDADD = $(URCU_COMMON_LIB)
 
 test_urcu_hash_SOURCES = test_urcu_hash.c test_urcu_hash.h \
-		test_urcu_hash_rw.c test_urcu_hash_unique.c
+		test_urcu_hash_rw.c test_urcu_hash_unique.c common-states.c
 test_urcu_hash_CFLAGS = -DRCU_QSBR $(AM_CFLAGS)
 test_urcu_hash_LDADD = $(URCU_QSBR_LIB) $(URCU_COMMON_LIB) $(URCU_CDS_LIB)
 
diff --git a/tests/benchmark/common-states.c b/tests/benchmark/common-states.c
new file mode 100644
index 0000000..6e70351
--- /dev/null
+++ b/tests/benchmark/common-states.c
@@ -0,0 +1 @@
+volatile int _test_go = 0, _test_stop = 0;
diff --git a/tests/benchmark/common-states.h b/tests/benchmark/common-states.h
new file mode 100644
index 0000000..dfbbfe5
--- /dev/null
+++ b/tests/benchmark/common-states.h
@@ -0,0 +1,51 @@
+/* Common states for benchmarks. */
+
+#include <unistd.h>
+
+#include <urcu/uatomic.h>
+
+extern volatile int _test_go, _test_stop;
+
+static inline void complete_sleep(unsigned int seconds)
+{
+	while (seconds != 0) {
+		seconds = sleep(seconds);
+	}
+}
+
+static inline void begin_test(void)
+{
+	uatomic_store(&_test_go, 1, CMM_RELEASE);
+}
+
+static inline void end_test(void)
+{
+	uatomic_store(&_test_stop, 1, CMM_RELAXED);
+}
+
+static inline void test_for(unsigned int duration)
+{
+	begin_test();
+	complete_sleep(duration);
+	end_test();
+}
+
+static inline void wait_until_go(void)
+{
+	while (!uatomic_load(&_test_go, CMM_ACQUIRE))
+	{
+	}
+}
+
+/*
+ * returns 0 if test should end.
+ */
+static inline int test_duration_write(void)
+{
+	return !uatomic_load(&_test_stop, CMM_RELAXED);
+}
+
+static inline int test_duration_read(void)
+{
+	return !uatomic_load(&_test_stop, CMM_RELAXED);
+}
diff --git a/tests/benchmark/test_mutex.c b/tests/benchmark/test_mutex.c
index 55f7c38..145139c 100644
--- a/tests/benchmark/test_mutex.c
+++ b/tests/benchmark/test_mutex.c
@@ -49,8 +49,6 @@ struct test_array {
 
 static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -111,19 +109,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -147,9 +132,7 @@ void *thr_reader(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int v;
@@ -182,10 +165,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		pthread_mutex_lock(&lock);
@@ -325,13 +305,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_perthreadlock.c b/tests/benchmark/test_perthreadlock.c
index 47a512c..bf468eb 100644
--- a/tests/benchmark/test_perthreadlock.c
+++ b/tests/benchmark/test_perthreadlock.c
@@ -53,8 +53,6 @@ struct per_thread_lock {
 
 static struct per_thread_lock *per_thread_lock;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -117,19 +115,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -175,9 +160,7 @@ void *thr_reader(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int v;
@@ -211,10 +194,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		for (tidx = 0; tidx < (long)nr_readers; tidx++) {
@@ -359,13 +339,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_rwlock.c b/tests/benchmark/test_rwlock.c
index 6908ea4..f5099e8 100644
--- a/tests/benchmark/test_rwlock.c
+++ b/tests/benchmark/test_rwlock.c
@@ -53,8 +53,6 @@ struct test_array {
  */
 pthread_rwlock_t lock;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -116,19 +114,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -147,9 +132,7 @@ void *thr_reader(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int a, ret;
@@ -194,10 +177,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		int ret;
@@ -355,13 +335,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu.c b/tests/benchmark/test_urcu.c
index ea849fa..b89513b 100644
--- a/tests/benchmark/test_urcu.c
+++ b/tests/benchmark/test_urcu.c
@@ -44,8 +44,6 @@
 #endif
 #include <urcu.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -107,19 +105,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -142,10 +127,7 @@ void *thr_reader(void *_count)
 	rcu_register_thread();
 	urcu_posix_assert(!rcu_read_ongoing());
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -186,10 +168,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -337,13 +316,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_assign.c b/tests/benchmark/test_urcu_assign.c
index 88889a8..e83b05e 100644
--- a/tests/benchmark/test_urcu_assign.c
+++ b/tests/benchmark/test_urcu_assign.c
@@ -48,8 +48,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -111,19 +109,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -201,10 +186,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -240,10 +222,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_copy_mutex_lock();
@@ -394,13 +373,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_bp.c b/tests/benchmark/test_urcu_bp.c
index 6f8c59d..c3b00f1 100644
--- a/tests/benchmark/test_urcu_bp.c
+++ b/tests/benchmark/test_urcu_bp.c
@@ -44,8 +44,6 @@
 #endif
 #include <urcu-bp.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -107,19 +105,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -142,10 +127,7 @@ void *thr_reader(void *_count)
 	rcu_register_thread();
 	urcu_posix_assert(!rcu_read_ongoing());
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -182,10 +164,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -332,13 +311,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_defer.c b/tests/benchmark/test_urcu_defer.c
index e948ebf..c501f60 100644
--- a/tests/benchmark/test_urcu_defer.c
+++ b/tests/benchmark/test_urcu_defer.c
@@ -49,8 +49,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -112,19 +110,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -149,10 +134,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -203,10 +185,7 @@ void *thr_writer(void *data)
 		exit(-1);
 	}
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(*new));
@@ -359,13 +338,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_gc.c b/tests/benchmark/test_urcu_gc.c
index f14f728..1cbee44 100644
--- a/tests/benchmark/test_urcu_gc.c
+++ b/tests/benchmark/test_urcu_gc.c
@@ -33,6 +33,7 @@
 #include <urcu/arch.h>
 #include <urcu/assert.h>
 #include <urcu/tls-compat.h>
+#include <urcu/uatomic.h>
 #include "thread-id.h"
 #include "../common/debug-yield.h"
 
@@ -48,8 +49,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -120,19 +119,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -157,10 +143,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -231,10 +214,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 #ifndef TEST_LOCAL_GC
@@ -399,13 +379,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_hash.c b/tests/benchmark/test_urcu_hash.c
index 3574b4c..1a3087e 100644
--- a/tests/benchmark/test_urcu_hash.c
+++ b/tests/benchmark/test_urcu_hash.c
@@ -96,8 +96,6 @@ DEFINE_URCU_TLS(unsigned long, lookup_ok);
 
 struct cds_lfht *test_ht;
 
-volatile int test_go, test_stop;
-
 unsigned long wdelay;
 
 unsigned long duration;
@@ -649,14 +647,14 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	remain = duration;
 	do {
 		remain = sleep(remain);
 	} while (remain > 0);
 
-	test_stop = 1;
+	end_test();
 
 end_pthread_join:
 	for (i_thr = 0; i_thr < nr_readers_created; i_thr++) {
diff --git a/tests/benchmark/test_urcu_hash.h b/tests/benchmark/test_urcu_hash.h
index 47b2ae3..73a0a6d 100644
--- a/tests/benchmark/test_urcu_hash.h
+++ b/tests/benchmark/test_urcu_hash.h
@@ -125,8 +125,6 @@ cds_lfht_iter_get_test_node(struct cds_lfht_iter *iter)
 	return to_test_node(cds_lfht_iter_get_node(iter));
 }
 
-extern volatile int test_go, test_stop;
-
 extern unsigned long wdelay;
 
 extern unsigned long duration;
@@ -174,19 +172,6 @@ extern pthread_mutex_t affinity_mutex;
 
 void set_affinity(void);
 
-/*
- * returns 0 if test should end.
- */
-static inline int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static inline int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 extern DECLARE_URCU_TLS(unsigned long long, nr_writes);
 extern DECLARE_URCU_TLS(unsigned long long, nr_reads);
 
diff --git a/tests/benchmark/test_urcu_hash_rw.c b/tests/benchmark/test_urcu_hash_rw.c
index 862a6f0..087e869 100644
--- a/tests/benchmark/test_urcu_hash_rw.c
+++ b/tests/benchmark/test_urcu_hash_rw.c
@@ -73,10 +73,7 @@ void *test_hash_rw_thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -133,10 +130,7 @@ void *test_hash_rw_thr_writer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfht_node *ret_node = NULL;
diff --git a/tests/benchmark/test_urcu_hash_unique.c b/tests/benchmark/test_urcu_hash_unique.c
index de7c427..90c0e19 100644
--- a/tests/benchmark/test_urcu_hash_unique.c
+++ b/tests/benchmark/test_urcu_hash_unique.c
@@ -71,10 +71,7 @@ void *test_hash_unique_thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct lfht_test_node *node;
@@ -136,10 +133,7 @@ void *test_hash_unique_thr_writer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		/*
diff --git a/tests/benchmark/test_urcu_lfq.c b/tests/benchmark/test_urcu_lfq.c
index 490e8b0..50c4211 100644
--- a/tests/benchmark/test_urcu_lfq.c
+++ b/tests/benchmark/test_urcu_lfq.c
@@ -47,8 +47,6 @@
 #include <urcu.h>
 #include <urcu/cds.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -110,12 +108,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -146,10 +144,7 @@ void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -202,10 +197,7 @@ void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfq_node_rcu *qnode;
@@ -375,7 +367,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -385,7 +377,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_lfs.c b/tests/benchmark/test_urcu_lfs.c
index 52239e0..48b2b23 100644
--- a/tests/benchmark/test_urcu_lfs.c
+++ b/tests/benchmark/test_urcu_lfs.c
@@ -59,8 +59,6 @@ enum test_sync {
 
 static enum test_sync test_sync;
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -124,12 +122,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -159,10 +157,7 @@ static void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -261,10 +256,7 @@ static void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	urcu_posix_assert(test_pop || test_pop_all);
 
@@ -459,7 +451,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -469,7 +461,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_lfs_rcu.c b/tests/benchmark/test_urcu_lfs_rcu.c
index 7975faf..ae3dff4 100644
--- a/tests/benchmark/test_urcu_lfs_rcu.c
+++ b/tests/benchmark/test_urcu_lfs_rcu.c
@@ -51,8 +51,6 @@
 
 #include <urcu/cds.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -114,12 +112,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -150,10 +148,7 @@ void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -205,10 +200,7 @@ void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfs_node_rcu *snode;
@@ -377,7 +369,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -387,7 +379,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_qsbr.c b/tests/benchmark/test_urcu_qsbr.c
index 1ea369c..295e9db 100644
--- a/tests/benchmark/test_urcu_qsbr.c
+++ b/tests/benchmark/test_urcu_qsbr.c
@@ -44,8 +44,6 @@
 #endif
 #include "urcu-qsbr.h"
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -106,19 +104,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -145,10 +130,7 @@ void *thr_reader(void *_count)
 	urcu_posix_assert(!rcu_read_ongoing());
 	rcu_thread_online();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -192,10 +174,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -343,13 +322,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_qsbr_gc.c b/tests/benchmark/test_urcu_qsbr_gc.c
index 8877a82..163405d 100644
--- a/tests/benchmark/test_urcu_qsbr_gc.c
+++ b/tests/benchmark/test_urcu_qsbr_gc.c
@@ -33,6 +33,7 @@
 #include <urcu/arch.h>
 #include <urcu/assert.h>
 #include <urcu/tls-compat.h>
+#include <urcu/uatomic.h>
 #include "thread-id.h"
 #include "../common/debug-yield.h"
 
@@ -46,8 +47,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -118,19 +117,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -154,10 +140,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		_rcu_read_lock();
@@ -231,10 +214,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 #ifndef TEST_LOCAL_GC
@@ -399,13 +379,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfcq.c b/tests/benchmark/test_urcu_wfcq.c
index 2c6e0fd..542a13a 100644
--- a/tests/benchmark/test_urcu_wfcq.c
+++ b/tests/benchmark/test_urcu_wfcq.c
@@ -56,7 +56,7 @@ static enum test_sync test_sync;
 
 static int test_force_sync;
 
-static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
+static volatile int test_stop_enqueue, test_stop_dequeue;
 
 static unsigned long rduration;
 
@@ -122,12 +122,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop_dequeue;
+	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop_enqueue;
+	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -155,10 +155,7 @@ static void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfcq_node *node = malloc(sizeof(*node));
@@ -266,10 +263,7 @@ static void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		if (test_dequeue && test_splice) {
@@ -482,7 +476,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -492,7 +486,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_enqueue = 1;
+	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
 
 	if (test_wait_empty) {
 		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
@@ -503,7 +497,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_dequeue = 1;
+	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfq.c b/tests/benchmark/test_urcu_wfq.c
index 8381160..2d8de87 100644
--- a/tests/benchmark/test_urcu_wfq.c
+++ b/tests/benchmark/test_urcu_wfq.c
@@ -51,8 +51,6 @@
 #include <urcu.h>
 #include <urcu/wfqueue.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -114,12 +112,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -143,10 +141,7 @@ void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfq_node *node = malloc(sizeof(*node));
@@ -185,10 +180,7 @@ void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfq_node *node = cds_wfq_dequeue_blocking(&q);
@@ -343,7 +335,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -353,7 +345,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfs.c b/tests/benchmark/test_urcu_wfs.c
index c285feb..d1a4afc 100644
--- a/tests/benchmark/test_urcu_wfs.c
+++ b/tests/benchmark/test_urcu_wfs.c
@@ -59,7 +59,7 @@ static enum test_sync test_sync;
 
 static int test_force_sync;
 
-static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
+static volatile int test_stop_enqueue, test_stop_dequeue;
 
 static unsigned long rduration;
 
@@ -125,12 +125,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop_dequeue;
+	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop_enqueue;
+	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -157,10 +157,7 @@ static void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfs_node *node = malloc(sizeof(*node));
@@ -250,10 +247,7 @@ static void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	urcu_posix_assert(test_pop || test_pop_all);
 
@@ -469,7 +463,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -479,7 +473,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_enqueue = 1;
+	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
 
 	if (test_wait_empty) {
 		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
@@ -490,7 +484,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_dequeue = 1;
+	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
-- 
2.39.2

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* [lttng-dev] [PATCH 10/11] tests/unit/test_build: Quiet unused return value
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (8 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 09/11] benchmark: " Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-15 20:17 ` [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

Change-Id: Ie5a18e0ccc4b1b5ee85c5bd140561cc2ff9e2fbc
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/unit/test_build.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test_build.c b/tests/unit/test_build.c
index f6b667c..702c1ef 100644
--- a/tests/unit/test_build.c
+++ b/tests/unit/test_build.c
@@ -129,10 +129,10 @@ void test_build_rcu_dereference(void)
 	static struct a_clear_struct *clear = NULL;
 	static struct a_clear_struct *const clear_const = NULL;
 
-	rcu_dereference(opaque);
-	rcu_dereference(opaque_const);
-	rcu_dereference(clear);
-	rcu_dereference(clear_const);
+	(void) rcu_dereference(opaque);
+	(void) rcu_dereference(opaque_const);
+	(void) rcu_dereference(clear);
+	(void) rcu_dereference(clear_const);
 }
 
 int main(void)
-- 
2.39.2


* [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (9 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 10/11] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
@ 2023-05-15 20:17 ` Olivier Dion via lttng-dev
  2023-05-16 15:57   ` Olivier Dion via lttng-dev
  2023-05-16  8:18 ` [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Dmitry Vyukov via lttng-dev
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-15 20:17 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Paul E. McKenney

The CMM annotation is highly experimental and not meant to be used by
users for now, even though it is exposed in the public API, since some
parts of the liburcu public API require these annotations.

The main primitive is the cmm_annotate_t which denotes a group of memory
operations associated with a memory barrier. A group follows a state
machine, starting from the `CMM_ANNOTATE_VOID' state. The following are
the only valid transitions:

  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_MB (acquire & release MB)
  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_LOAD (acquire memory)
  CMM_ANNOTATE_LOAD -> CMM_ANNOTATE_MB (acquire MB)

The macro `cmm_annotate_define(name)' can be used to create an
annotation object on the stack. The rest of the `cmm_annotate_*' macros
can be used to change the state of the group after validating that the
transition is allowed. Some of these macros also inject TSAN annotations
to help TSAN understand the flow of events in the program, since it does
not currently support thread fences.

Sometimes, a single memory access does not need to be associated with a
group. In that case, the acquire/release macro variants without the
`group' infix can be used to annotate memory accesses.

Note that TSAN cannot be used on the liburcu-signal flavor. This is
because TSAN hijacks calls to sigaction(3) and places its own handler
that will deliver the signal to the application at a synchronization
point.

Thus, using TSAN on the signal flavor is undefined behavior. However, at
least one failure mode is known: a deadlock between readers that want to
unregister themselves by locking `rcu_registry_lock' and a writer running
synchronize RCU, which holds that mutex until all registered readers
execute a memory barrier in a signal handler defined by liburcu-signal.
Since TSAN does not call the registered handler while waiting on the
mutex, the writer spins indefinitely on pthread_kill(3p) because the
reader never completes the handshake.

See the minimal deadlock reproducer below.

Deadlock reproducer:
```
#include <poll.h>
#include <signal.h>

#include <pthread.h>

#define SIGURCU SIGUSR1

static pthread_mutex_t rcu_registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int need_mb = 0;

static void *reader_side(void *nil)
{
	(void) nil;

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_mutex_unlock(&rcu_registry_lock);

	return NULL;
}

static void writer_side(pthread_t reader)
{
	__atomic_store_n(&need_mb, 1, __ATOMIC_RELEASE);
	while (__atomic_load_n(&need_mb, __ATOMIC_ACQUIRE)) {
		pthread_kill(reader, SIGURCU);
		(void) poll(NULL, 0, 1);
	}
	pthread_mutex_unlock(&rcu_registry_lock);

	pthread_join(reader, NULL);
}

static void sigrcu_handler(int signo, siginfo_t *siginfo, void *context)
{
	(void) signo;
	(void) siginfo;
	(void) context;

	__atomic_store_n(&need_mb, 0, __ATOMIC_SEQ_CST);
}

static void install_signal(void)
{
	struct sigaction act;

	act.sa_sigaction = sigrcu_handler;
	act.sa_flags     = SA_SIGINFO | SA_RESTART;

	sigemptyset(&act.sa_mask);

	(void) sigaction(SIGURCU, &act, NULL);
}

int main(void)
{
	pthread_t th;

	install_signal();

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_create(&th, NULL, reader_side, NULL);

	writer_side(th);

	return 0;
}
```

Change-Id: I9c234bb311cc0f82ea9dbefdf4fee07047ab93f9
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/Makefile.am               |   1 +
 include/urcu/annotate.h           | 174 ++++++++++++++++++++++++++++++
 include/urcu/arch/generic.h       |  33 +++++-
 include/urcu/compiler.h           |  12 +++
 include/urcu/static/urcu-bp.h     |  12 ++-
 include/urcu/static/urcu-common.h |   8 +-
 include/urcu/static/urcu-mb.h     |  11 +-
 include/urcu/static/urcu-memb.h   |  26 +++--
 include/urcu/static/urcu-qsbr.h   |  29 +++--
 src/rculfhash.c                   |  92 ++++++++++------
 src/urcu-bp.c                     |  17 ++-
 src/urcu-qsbr.c                   |  31 ++++--
 src/urcu-wait.h                   |   9 +-
 src/urcu.c                        |  24 +++--
 14 files changed, 390 insertions(+), 89 deletions(-)
 create mode 100644 include/urcu/annotate.h

diff --git a/include/Makefile.am b/include/Makefile.am
index fac941f..b1520a1 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -1,4 +1,5 @@
 nobase_include_HEADERS = \
+	urcu/annotate.h \
 	urcu/arch/aarch64.h \
 	urcu/arch/alpha.h \
 	urcu/arch/arm.h \
diff --git a/include/urcu/annotate.h b/include/urcu/annotate.h
new file mode 100644
index 0000000..37e7f03
--- /dev/null
+++ b/include/urcu/annotate.h
@@ -0,0 +1,174 @@
+/*
+ * urcu/annotate.h
+ *
+ * Userspace RCU - annotation header.
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/*
+ * WARNING!
+ *
+ * This API is highly experimental. There are zero guarantees of stability
+ * between releases.
+ *
+ * You have been warned.
+ */
+#ifndef _URCU_ANNOTATE_H
+#define _URCU_ANNOTATE_H
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <urcu/compiler.h>
+
+enum cmm_annotate {
+	CMM_ANNOTATE_VOID,
+	CMM_ANNOTATE_LOAD,
+	CMM_ANNOTATE_STORE,
+	CMM_ANNOTATE_MB,
+};
+
+typedef enum cmm_annotate cmm_annotate_t __attribute__((unused));
+
+#define cmm_annotate_define(name)		\
+	cmm_annotate_t name = CMM_ANNOTATE_VOID
+
+#ifdef CMM_SANITIZE_THREAD
+
+# ifdef __cplusplus
+extern "C" {
+# endif
+extern void __tsan_acquire(void *);
+extern void __tsan_release(void *);
+# ifdef __cplusplus
+}
+# endif
+
+# define cmm_annotate_die(msg)						\
+	do {								\
+		fprintf(stderr,						\
+			"(" __FILE__ ":%s@%u) Annotation ERROR: %s\n",	\
+			__func__, __LINE__, msg);			\
+		abort();						\
+	} while (0)
+
+/* Only used for typechecking in macros. */
+static inline cmm_annotate_t cmm_annotate_dereference(cmm_annotate_t *group)
+{
+	return *group;
+}
+
+# define cmm_annotate_group_mb_acquire(group)				\
+	do {								\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			break;						\
+		case CMM_ANNOTATE_LOAD:					\
+			break;						\
+		case CMM_ANNOTATE_STORE:				\
+			cmm_annotate_die("store for acquire group");	\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"redundant mb for acquire group"	\
+					);				\
+			break;						\
+		}							\
+		*(group) = CMM_ANNOTATE_MB;				\
+	} while (0)
+
+# define cmm_annotate_group_mb_release(group)				\
+	do {								\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			break;						\
+		case CMM_ANNOTATE_LOAD:					\
+			cmm_annotate_die("load before release group");	\
+			break;						\
+		case CMM_ANNOTATE_STORE:				\
+			cmm_annotate_die(				\
+				"store before release group"		\
+					);				\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"redundant mb of release group"		\
+					);				\
+			break;						\
+		}							\
+		*(group) = CMM_ANNOTATE_MB;				\
+	} while (0)
+
+# define cmm_annotate_group_mem_acquire(group, mem)			\
+	do {								\
+		__tsan_acquire((void*)(mem));				\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			*(group) = CMM_ANNOTATE_LOAD;			\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"load after mb for acquire group"	\
+					);				\
+			break;						\
+		default:						\
+			break;						\
+		}							\
+	} while (0)
+
+# define cmm_annotate_group_mem_release(group, mem)		\
+	do {							\
+		__tsan_release((void*)(mem));			\
+		switch (cmm_annotate_dereference(group)) {	\
+		case CMM_ANNOTATE_MB:				\
+			break;					\
+		default:					\
+			cmm_annotate_die(			\
+				"missing mb for release group"	\
+					);			\
+		}						\
+	} while (0)
+
+# define cmm_annotate_mem_acquire(mem)		\
+	__tsan_acquire((void*)(mem))
+
+# define cmm_annotate_mem_release(mem)		\
+	__tsan_release((void*)(mem))
+#else
+
+# define cmm_annotate_group_mb_acquire(group)	\
+	(void) (group)
+
+# define cmm_annotate_group_mb_release(group)	\
+	(void) (group)
+
+# define cmm_annotate_group_mem_acquire(group, mem)	\
+	(void) (group)
+
+# define cmm_annotate_group_mem_release(group, mem)	\
+	(void) (group)
+
+# define cmm_annotate_mem_acquire(mem)		\
+	do { } while (0)
+
+# define cmm_annotate_mem_release(mem)		\
+	do { } while (0)
+
+#endif  /* CMM_SANITIZE_THREAD */
+
+#endif	/* _URCU_ANNOTATE_H */
diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index e292c70..65dedf2 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -45,9 +45,38 @@ extern "C" {
 
 #ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
 
+# ifdef CMM_SANITIZE_THREAD
+/*
+ * This makes TSAN quiet about unsupported thread fence.
+ */
+static inline void _cmm_thread_fence_wrapper(void)
+{
+#   if defined(__clang__)
+#    pragma clang diagnostic push
+#    pragma clang diagnostic ignored "-Wpragmas"
+#    pragma clang diagnostic ignored "-Wunknown-warning-option"
+#    pragma clang diagnostic ignored "-Wtsan"
+#   elif defined(__GNUC__)
+#    pragma GCC diagnostic push
+#    pragma GCC diagnostic ignored "-Wpragmas"
+#    pragma GCC diagnostic ignored "-Wtsan"
+#   endif
+	__atomic_thread_fence(__ATOMIC_SEQ_CST);
+#   if defined(__clang__)
+#    pragma clang diagnostic pop
+#   elif defined(__GNUC__)
+#    pragma GCC diagnostic pop
+#   endif
+}
+# endif	 /* CMM_SANITIZE_THREAD */
+
 # ifndef cmm_smp_mb
-#  define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
-# endif
+#  ifdef CMM_SANITIZE_THREAD
+#   define cmm_smp_mb() _cmm_thread_fence_wrapper()
+#  else
+#   define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
+#  endif /* CMM_SANITIZE_THREAD */
+# endif /* !cmm_smp_mb */
 
 #endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
 
diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 3604488..7930820 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -129,4 +129,16 @@
 				+ __GNUC_PATCHLEVEL__)
 #endif
 
+/*
+ * Allow user to manually define CMM_SANITIZE_THREAD if their toolchain is not
+ * supported by this check.
+ */
+#ifndef CMM_SANITIZE_THREAD
+# if defined(__GNUC__) && defined(__SANITIZE_THREAD__)
+#   define CMM_SANITIZE_THREAD
+# elif defined(__clang__) && defined(__has_feature) && __has_feature(thread_sanitizer)
+#   define CMM_SANITIZE_THREAD
+# endif
+#endif	/* !CMM_SANITIZE_THREAD */
+
 #endif /* _URCU_COMPILER_H */
diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h
index 8ba3830..3e14ef7 100644
--- a/include/urcu/static/urcu-bp.h
+++ b/include/urcu/static/urcu-bp.h
@@ -33,6 +33,7 @@
 #include <pthread.h>
 #include <unistd.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
@@ -117,7 +118,8 @@ static inline void urcu_bp_smp_mb_slave(void)
 		cmm_smp_mb();
 }
 
-static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
+static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
@@ -127,7 +129,9 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
 	 * Make sure both tests below are done on the same version of *value
 	 * to insure consistency.
 	 */
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!(v & URCU_BP_GP_CTR_NEST_MASK))
 		return URCU_BP_READER_INACTIVE;
 	if (!((v ^ urcu_bp_gp.ctr) & URCU_BP_GP_CTR_PHASE))
@@ -181,12 +185,14 @@ static inline void _urcu_bp_read_lock(void)
 static inline void _urcu_bp_read_unlock(void)
 {
 	unsigned long tmp;
+	unsigned long *ctr = &URCU_TLS(urcu_bp_reader)->ctr;
 
 	tmp = URCU_TLS(urcu_bp_reader)->ctr;
 	urcu_assert_debug(tmp & URCU_BP_GP_CTR_NEST_MASK);
 	/* Finish using rcu before decrementing the pointer. */
 	urcu_bp_smp_mb_slave();
-	_CMM_STORE_SHARED(URCU_TLS(urcu_bp_reader)->ctr, tmp - URCU_BP_GP_COUNT);
+	cmm_annotate_mem_release(ctr);
+	uatomic_store(ctr, tmp - URCU_BP_GP_COUNT, CMM_RELAXED);
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
 }
 
diff --git a/include/urcu/static/urcu-common.h b/include/urcu/static/urcu-common.h
index 60ea8b8..32cb834 100644
--- a/include/urcu/static/urcu-common.h
+++ b/include/urcu/static/urcu-common.h
@@ -34,6 +34,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
@@ -105,7 +106,8 @@ static inline void urcu_common_wake_up_gp(struct urcu_gp *gp)
 }
 
 static inline enum urcu_state urcu_common_reader_state(struct urcu_gp *gp,
-		unsigned long *ctr)
+						unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
@@ -113,7 +115,9 @@ static inline enum urcu_state urcu_common_reader_state(struct urcu_gp *gp,
 	 * Make sure both tests below are done on the same version of *value
 	 * to insure consistency.
 	 */
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!(v & URCU_GP_CTR_NEST_MASK))
 		return URCU_READER_INACTIVE;
 	if (!((v ^ gp->ctr) & URCU_GP_CTR_PHASE))
diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h
index b97e42a..5bf7933 100644
--- a/include/urcu/static/urcu-mb.h
+++ b/include/urcu/static/urcu-mb.h
@@ -108,13 +108,14 @@ static inline void _urcu_mb_read_lock(void)
  */
 static inline void _urcu_mb_read_unlock_update_and_wakeup(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_mb_reader).ctr;
+
 	if (caa_likely((tmp & URCU_GP_CTR_NEST_MASK) == URCU_GP_COUNT)) {
-		cmm_smp_mb();
-		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp - URCU_GP_COUNT);
-		cmm_smp_mb();
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_SEQ_CST);
 		urcu_common_wake_up_gp(&urcu_mb_gp);
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp - URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h
index c8d102f..8191ccc 100644
--- a/include/urcu/static/urcu-memb.h
+++ b/include/urcu/static/urcu-memb.h
@@ -34,6 +34,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
@@ -93,11 +94,20 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader);
  */
 static inline void _urcu_memb_read_lock_update(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_memb_reader).ctr;
+
 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, _CMM_LOAD_SHARED(urcu_memb_gp.ctr));
+		unsigned long *pgctr = &urcu_memb_gp.ctr;
+		unsigned long gctr = uatomic_load(pgctr, CMM_RELAXED);
+
+		/* Paired with following mb slave. */
+		cmm_annotate_mem_acquire(pgctr);
+		uatomic_store(ctr, gctr, CMM_RELAXED);
+
 		urcu_memb_smp_mb_slave();
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp + URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp + URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
@@ -131,13 +141,17 @@ static inline void _urcu_memb_read_lock(void)
  */
 static inline void _urcu_memb_read_unlock_update_and_wakeup(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_memb_reader).ctr;
+
 	if (caa_likely((tmp & URCU_GP_CTR_NEST_MASK) == URCU_GP_COUNT)) {
 		urcu_memb_smp_mb_slave();
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp - URCU_GP_COUNT);
+		cmm_annotate_mem_release(ctr);
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
 		urcu_memb_smp_mb_slave();
 		urcu_common_wake_up_gp(&urcu_memb_gp);
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp - URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
diff --git a/include/urcu/static/urcu-qsbr.h b/include/urcu/static/urcu-qsbr.h
index b878877..864cbcf 100644
--- a/include/urcu/static/urcu-qsbr.h
+++ b/include/urcu/static/urcu-qsbr.h
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
@@ -96,11 +97,14 @@ static inline void urcu_qsbr_wake_up_gp(void)
 	}
 }
 
-static inline enum urcu_state urcu_qsbr_reader_state(unsigned long *ctr)
+static inline enum urcu_state urcu_qsbr_reader_state(unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!v)
 		return URCU_READER_INACTIVE;
 	if (v == urcu_qsbr_gp.ctr)
@@ -155,9 +159,9 @@ static inline int _urcu_qsbr_read_ongoing(void)
  */
 static inline void _urcu_qsbr_quiescent_state_update_and_wakeup(unsigned long gp_ctr)
 {
-	cmm_smp_mb();
-	_CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, gp_ctr);
-	cmm_smp_mb();	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
+	uatomic_store(&URCU_TLS(urcu_qsbr_reader).ctr, gp_ctr, CMM_SEQ_CST);
+
+	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
 	urcu_qsbr_wake_up_gp();
 	cmm_smp_mb();
 }
@@ -179,7 +183,8 @@ static inline void _urcu_qsbr_quiescent_state(void)
 	unsigned long gp_ctr;
 
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
-	if ((gp_ctr = CMM_LOAD_SHARED(urcu_qsbr_gp.ctr)) == URCU_TLS(urcu_qsbr_reader).ctr)
+	gp_ctr = uatomic_load(&urcu_qsbr_gp.ctr, CMM_RELAXED);
+	if (gp_ctr == URCU_TLS(urcu_qsbr_reader).ctr)
 		return;
 	_urcu_qsbr_quiescent_state_update_and_wakeup(gp_ctr);
 }
@@ -195,9 +200,8 @@ static inline void _urcu_qsbr_quiescent_state(void)
 static inline void _urcu_qsbr_thread_offline(void)
 {
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
-	cmm_smp_mb();
-	CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, 0);
-	cmm_smp_mb();	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
+	uatomic_store(&URCU_TLS(urcu_qsbr_reader).ctr, 0, CMM_SEQ_CST);
+	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
 	urcu_qsbr_wake_up_gp();
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
 }
@@ -212,9 +216,14 @@ static inline void _urcu_qsbr_thread_offline(void)
  */
 static inline void _urcu_qsbr_thread_online(void)
 {
+	unsigned long *pctr = &URCU_TLS(urcu_qsbr_reader).ctr;
+	unsigned long ctr;
+
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
-	_CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, CMM_LOAD_SHARED(urcu_qsbr_gp.ctr));
+	ctr = uatomic_load(&urcu_qsbr_gp.ctr, CMM_RELAXED);
+	cmm_annotate_mem_acquire(&urcu_qsbr_gp.ctr);
+	uatomic_store(pctr, ctr, CMM_RELAXED);
 	cmm_smp_mb();
 }
 
diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..cdc2aee 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -623,9 +623,7 @@ static void mutex_lock(pthread_mutex_t *mutex)
 		if (ret != EBUSY && ret != EINTR)
 			urcu_die(ret);
 		if (CMM_LOAD_SHARED(URCU_TLS(rcu_reader).need_mb)) {
-			cmm_smp_mb();
-			_CMM_STORE_SHARED(URCU_TLS(rcu_reader).need_mb, 0);
-			cmm_smp_mb();
+			uatomic_store(&URCU_TLS(rcu_reader).need_mb, 0, CMM_SEQ_CST);
 		}
 		(void) poll(NULL, 0, 10);
 	}
@@ -883,8 +881,10 @@ unsigned long _uatomic_xchg_monotonic_increase(unsigned long *ptr,
 	old1 = uatomic_read(ptr);
 	do {
 		old2 = old1;
-		if (old2 >= v)
+		if (old2 >= v) {
+			cmm_smp_mb();
 			return old2;
+		}
 	} while ((old1 = uatomic_cmpxchg(ptr, old2, v)) != old2);
 	return old2;
 }
@@ -1190,15 +1190,17 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	/*
 	 * The del operation semantic guarantees a full memory barrier
 	 * before the uatomic_or atomic commit of the deletion flag.
-	 */
-	cmm_smp_mb__before_uatomic_or();
-	/*
+	 *
 	 * We set the REMOVED_FLAG unconditionally. Note that there may
 	 * be more than one concurrent thread setting this flag.
 	 * Knowing which wins the race will be known after the garbage
 	 * collection phase, stay tuned!
+	 *
+	 * NOTE: The cast is here because Clang says that address argument to
+	 * atomic operation must be a pointer to integer.
 	 */
-	uatomic_or(&node->next, REMOVED_FLAG);
+	uatomic_or_mo((uintptr_t*) &node->next, REMOVED_FLAG, CMM_RELEASE);
+
 	/* We performed the (logical) deletion. */
 
 	/*
@@ -1223,7 +1225,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	 * was already set).
 	 */
 	if (!is_removal_owner(uatomic_xchg(&node->next,
-			flag_removal_owner(node->next))))
+			flag_removal_owner(uatomic_load(&node->next, CMM_RELAXED)))))
 		return 0;
 	else
 		return -ENOENT;
@@ -1389,9 +1391,10 @@ void init_table(struct cds_lfht *ht,
 
 		/*
 		 * Update table size.
+		 *
+		 * Populate data before RCU size.
 		 */
-		cmm_smp_wmb();	/* populate data before RCU size */
-		CMM_STORE_SHARED(ht->size, 1UL << i);
+		uatomic_store(&ht->size, 1UL << i, CMM_RELEASE);
 
 		dbg_printf("init new size: %lu\n", 1UL << i);
 		if (CMM_LOAD_SHARED(ht->in_progress_destroy))
@@ -1440,8 +1443,12 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
 		urcu_posix_assert(j >= size && j < (size << 1));
 		dbg_printf("remove entry: order %lu index %lu hash %lu\n",
 			   i, j, j);
-		/* Set the REMOVED_FLAG to freeze the ->next for gc */
-		uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+		/* Set the REMOVED_FLAG to freeze the ->next for gc.
+		 *
+		 * NOTE: The cast is here because Clang says that address
+		 * argument to atomic operation must be a pointer to integer.
+		 */
+		uatomic_or((uintptr_t*) &fini_bucket->next, REMOVED_FLAG);
 		_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
 	}
 	ht->flavor->read_unlock();
@@ -1667,7 +1674,14 @@ void cds_lfht_lookup(struct cds_lfht *ht, unsigned long hash,
 
 	reverse_hash = bit_reverse_ulong(hash);
 
-	size = rcu_dereference(ht->size);
+	/*
+	 * Use load acquire instead of rcu_dereference because there is no
+	 * dependency between the table size and the dereference of the bucket
+	 * content.
+	 *
+	 * This acquire is paired with the store release in init_table().
+	 */
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	bucket = lookup_bucket(ht, size, hash);
 	/* We can always skip the bucket node initially */
 	node = rcu_dereference(bucket->next);
@@ -1726,7 +1740,7 @@ void cds_lfht_next_duplicate(struct cds_lfht *ht __attribute__((unused)),
 		}
 		node = clear_flag(next);
 	}
-	urcu_posix_assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next)));
+	urcu_posix_assert(!node || !is_bucket(uatomic_load(&node->next, CMM_RELAXED)));
 	iter->node = node;
 	iter->next = next;
 }
@@ -1750,7 +1764,7 @@ void cds_lfht_next(struct cds_lfht *ht __attribute__((unused)),
 		}
 		node = clear_flag(next);
 	}
-	urcu_posix_assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next)));
+	urcu_posix_assert(!node || !is_bucket(uatomic_load(&node->next, CMM_RELAXED)));
 	iter->node = node;
 	iter->next = next;
 }
@@ -1762,7 +1776,7 @@ void cds_lfht_first(struct cds_lfht *ht, struct cds_lfht_iter *iter)
 	 * Get next after first bucket node. The first bucket node is the
 	 * first node of the linked list.
 	 */
-	iter->next = bucket_at(ht, 0)->next;
+	iter->next = uatomic_load(&bucket_at(ht, 0)->next, CMM_CONSUME);
 	cds_lfht_next(ht, iter);
 }
 
@@ -1772,7 +1786,7 @@ void cds_lfht_add(struct cds_lfht *ht, unsigned long hash,
 	unsigned long size;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	_cds_lfht_add(ht, hash, NULL, NULL, size, node, NULL, 0);
 	ht_count_add(ht, size, hash);
 }
@@ -1787,7 +1801,7 @@ struct cds_lfht_node *cds_lfht_add_unique(struct cds_lfht *ht,
 	struct cds_lfht_iter iter;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	_cds_lfht_add(ht, hash, match, key, size, node, &iter, 0);
 	if (iter.node == node)
 		ht_count_add(ht, size, hash);
@@ -1804,7 +1818,7 @@ struct cds_lfht_node *cds_lfht_add_replace(struct cds_lfht *ht,
 	struct cds_lfht_iter iter;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	for (;;) {
 		_cds_lfht_add(ht, hash, match, key, size, node, &iter, 0);
 		if (iter.node == node) {
@@ -1833,7 +1847,7 @@ int cds_lfht_replace(struct cds_lfht *ht,
 		return -EINVAL;
 	if (caa_unlikely(!match(old_iter->node, key)))
 		return -EINVAL;
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	return _cds_lfht_replace(ht, size, old_iter->node, old_iter->next,
 			new_node);
 }
@@ -1843,7 +1857,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node)
 	unsigned long size;
 	int ret;
 
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	ret = _cds_lfht_del(ht, size, node);
 	if (!ret) {
 		unsigned long hash;
@@ -1957,7 +1971,7 @@ int cds_lfht_destroy(struct cds_lfht *ht, pthread_attr_t **attr)
 		if (!cds_lfht_is_empty(ht))
 			return -EPERM;
 		/* Cancel ongoing resize operations. */
-		_CMM_STORE_SHARED(ht->in_progress_destroy, 1);
+		uatomic_store(&ht->in_progress_destroy, 1, CMM_RELAXED);
 		if (attr) {
 			*attr = ht->caller_resize_attr;
 			ht->caller_resize_attr = NULL;
@@ -2077,19 +2091,22 @@ void _do_cds_lfht_resize(struct cds_lfht *ht)
 	 * Resize table, re-do if the target size has changed under us.
 	 */
 	do {
-		if (CMM_LOAD_SHARED(ht->in_progress_destroy))
+		if (uatomic_load(&ht->in_progress_destroy, CMM_RELAXED))
 			break;
-		ht->resize_initiated = 1;
+
+		uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
+
 		old_size = ht->size;
-		new_size = CMM_LOAD_SHARED(ht->resize_target);
+		new_size = uatomic_load(&ht->resize_target, CMM_RELAXED);
 		if (old_size < new_size)
 			_do_cds_lfht_grow(ht, old_size, new_size);
 		else if (old_size > new_size)
 			_do_cds_lfht_shrink(ht, old_size, new_size);
-		ht->resize_initiated = 0;
+
+		uatomic_store(&ht->resize_initiated, 0, CMM_RELAXED);
 		/* write resize_initiated before read resize_target */
 		cmm_smp_mb();
-	} while (ht->size != CMM_LOAD_SHARED(ht->resize_target));
+	} while (ht->size != uatomic_load(&ht->resize_target, CMM_RELAXED));
 }
 
 static
@@ -2110,7 +2127,12 @@ void resize_target_update_count(struct cds_lfht *ht,
 void cds_lfht_resize(struct cds_lfht *ht, unsigned long new_size)
 {
 	resize_target_update_count(ht, new_size);
-	CMM_STORE_SHARED(ht->resize_initiated, 1);
+
+	/*
+	 * Set the flag as early as possible, even in the contended case.
+	 */
+	uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
+
 	mutex_lock(&ht->resize_mutex);
 	_do_cds_lfht_resize(ht);
 	mutex_unlock(&ht->resize_mutex);
@@ -2136,10 +2158,12 @@ void __cds_lfht_resize_lazy_launch(struct cds_lfht *ht)
 {
 	struct resize_work *work;
 
-	/* Store resize_target before read resize_initiated */
-	cmm_smp_mb();
-	if (!CMM_LOAD_SHARED(ht->resize_initiated)) {
-		if (CMM_LOAD_SHARED(ht->in_progress_destroy)) {
+	/*
+	 * The store to resize_target is ordered before the load of resize_initiated,
+	 * as guaranteed by either cmpxchg or _uatomic_xchg_monotonic_increase.
+	 */
+	if (!uatomic_load(&ht->resize_initiated, CMM_RELAXED)) {
+		if (uatomic_load(&ht->in_progress_destroy, CMM_RELAXED)) {
 			return;
 		}
 		work = malloc(sizeof(*work));
@@ -2150,7 +2174,7 @@ void __cds_lfht_resize_lazy_launch(struct cds_lfht *ht)
 		work->ht = ht;
 		urcu_workqueue_queue_work(cds_lfht_workqueue,
 			&work->work, do_resize_cb);
-		CMM_STORE_SHARED(ht->resize_initiated, 1);
+		uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
 	}
 }
 
diff --git a/src/urcu-bp.c b/src/urcu-bp.c
index 47fad8e..08aaa88 100644
--- a/src/urcu-bp.c
+++ b/src/urcu-bp.c
@@ -36,6 +36,7 @@
 #include <stdbool.h>
 #include <sys/mman.h>
 
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/config.h>
 #include <urcu/arch.h>
@@ -220,7 +221,8 @@ static void smp_mb_master(void)
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_bp_reader *index, *tmp;
@@ -235,7 +237,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 			wait_loops++;
 
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_bp_reader_state(&index->ctr)) {
+			switch (urcu_bp_reader_state(&index->ctr, group)) {
 			case URCU_BP_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -274,6 +276,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 
 void urcu_bp_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	sigset_t newmask, oldmask;
@@ -295,13 +299,14 @@ void urcu_bp_synchronize_rcu(void)
 	 * where new ptr points to. */
 	/* Write new ptr before changing the qparity */
 	smp_mb_master();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
@@ -311,7 +316,8 @@ void urcu_bp_synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp.ctr, rcu_gp.ctr ^ URCU_BP_GP_CTR_PHASE);
+	cmm_annotate_group_mem_release(&release_group, &rcu_gp.ctr);
+	uatomic_store(&rcu_gp.ctr, rcu_gp.ctr ^ URCU_BP_GP_CTR_PHASE, CMM_RELAXED);
 
 	/*
 	 * Must commit qparity update to memory before waiting for other parity
@@ -332,7 +338,7 @@ void urcu_bp_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -344,6 +350,7 @@ void urcu_bp_synchronize_rcu(void)
 	 * freed.
 	 */
 	smp_mb_master();
+	cmm_annotate_group_mb_acquire(&acquire_group);
 out:
 	mutex_unlock(&rcu_registry_lock);
 	mutex_unlock(&rcu_gp_lock);
diff --git a/src/urcu-qsbr.c b/src/urcu-qsbr.c
index 318ab29..fd50e80 100644
--- a/src/urcu-qsbr.c
+++ b/src/urcu-qsbr.c
@@ -34,6 +34,7 @@
 #include <errno.h>
 #include <poll.h>
 
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/wfcqueue.h>
 #include <urcu/map/urcu-qsbr.h>
@@ -156,7 +157,8 @@ static void wait_gp(void)
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_qsbr_reader *index, *tmp;
@@ -183,7 +185,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 			cmm_smp_mb();
 		}
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_qsbr_reader_state(&index->ctr)) {
+			switch (urcu_qsbr_reader_state(&index->ctr, group)) {
 			case URCU_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -208,8 +210,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 		if (cds_list_empty(input_readers)) {
 			if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
 				/* Read reader_gp before write futex */
-				cmm_smp_mb();
-				uatomic_set(&urcu_qsbr_gp.futex, 0);
+				uatomic_store(&urcu_qsbr_gp.futex, 0, CMM_RELEASE);
 			}
 			break;
 		} else {
@@ -238,6 +239,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 #if (CAA_BITS_PER_LONG < 64)
 void urcu_qsbr_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
@@ -258,6 +261,7 @@ void urcu_qsbr_synchronize_rcu(void)
 		urcu_qsbr_thread_offline();
 	else
 		cmm_smp_mb();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Add ourself to gp_waiters queue of threads awaiting to wait
@@ -289,7 +293,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity
@@ -309,7 +313,8 @@ void urcu_qsbr_synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr ^ URCU_QSBR_GP_CTR);
+	cmm_annotate_group_mem_release(&release_group, &urcu_qsbr_gp.ctr);
+	uatomic_store(&urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr ^ URCU_QSBR_GP_CTR, CMM_RELAXED);
 
 	/*
 	 * Must commit urcu_qsbr_gp.ctr update to memory before waiting for
@@ -332,7 +337,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -347,6 +352,8 @@ gp_end:
 	 * Finish waiting for reader threads before letting the old ptr being
 	 * freed.
 	 */
+	cmm_annotate_group_mb_acquire(&acquire_group);
+
 	if (was_online)
 		urcu_qsbr_thread_online();
 	else
@@ -355,6 +362,8 @@ gp_end:
 #else /* !(CAA_BITS_PER_LONG < 64) */
 void urcu_qsbr_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
 	DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
@@ -371,6 +380,7 @@ void urcu_qsbr_synchronize_rcu(void)
 		urcu_qsbr_thread_offline();
 	else
 		cmm_smp_mb();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Add ourself to gp_waiters queue of threads awaiting to wait
@@ -398,7 +408,8 @@ void urcu_qsbr_synchronize_rcu(void)
 		goto out;
 
 	/* Increment current G.P. */
-	CMM_STORE_SHARED(urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr + URCU_QSBR_GP_CTR);
+	cmm_annotate_group_mem_release(&release_group, &urcu_qsbr_gp.ctr);
+	uatomic_store(&urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr + URCU_QSBR_GP_CTR, CMM_RELAXED);
 
 	/*
 	 * Must commit urcu_qsbr_gp.ctr update to memory before waiting for
@@ -421,7 +432,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, NULL, &qsreaders);
+	wait_for_readers(&registry, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -436,6 +447,8 @@ gp_end:
 		urcu_qsbr_thread_online();
 	else
 		cmm_smp_mb();
+
+	cmm_annotate_group_mb_acquire(&acquire_group);
 }
 #endif  /* !(CAA_BITS_PER_LONG < 64) */
 
diff --git a/src/urcu-wait.h b/src/urcu-wait.h
index 4667a13..1ffced4 100644
--- a/src/urcu-wait.h
+++ b/src/urcu-wait.h
@@ -126,9 +126,8 @@ void urcu_wait_node_init(struct urcu_wait_node *node,
 static inline
 void urcu_adaptative_wake_up(struct urcu_wait_node *wait)
 {
-	cmm_smp_mb();
 	urcu_posix_assert(uatomic_read(&wait->state) == URCU_WAIT_WAITING);
-	uatomic_set(&wait->state, URCU_WAIT_WAKEUP);
+	uatomic_store(&wait->state, URCU_WAIT_WAKEUP, CMM_RELEASE);
 	if (!(uatomic_read(&wait->state) & URCU_WAIT_RUNNING)) {
 		if (futex_noasync(&wait->state, FUTEX_WAKE, 1,
 				NULL, NULL, 0) < 0)
@@ -150,11 +149,11 @@ void urcu_adaptative_busy_wait(struct urcu_wait_node *wait)
 	/* Load and test condition before read state */
 	cmm_smp_rmb();
 	for (i = 0; i < URCU_WAIT_ATTEMPTS; i++) {
-		if (uatomic_read(&wait->state) != URCU_WAIT_WAITING)
+		if (uatomic_load(&wait->state, CMM_ACQUIRE) != URCU_WAIT_WAITING)
 			goto skip_futex_wait;
 		caa_cpu_relax();
 	}
-	while (uatomic_read(&wait->state) == URCU_WAIT_WAITING) {
+	while (uatomic_load(&wait->state, CMM_ACQUIRE) == URCU_WAIT_WAITING) {
 		if (!futex_noasync(&wait->state, FUTEX_WAIT, URCU_WAIT_WAITING, NULL, NULL, 0)) {
 			/*
 			 * Prior queued wakeups queued by unrelated code
@@ -189,7 +188,7 @@ skip_futex_wait:
 	 * memory allocated for struct urcu_wait.
 	 */
 	for (i = 0; i < URCU_WAIT_ATTEMPTS; i++) {
-		if (uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN)
+		if (uatomic_load(&wait->state, CMM_RELAXED) & URCU_WAIT_TEARDOWN)
 			break;
 		caa_cpu_relax();
 	}
diff --git a/src/urcu.c b/src/urcu.c
index c60307e..353e9bb 100644
--- a/src/urcu.c
+++ b/src/urcu.c
@@ -38,6 +38,7 @@
 #include <poll.h>
 
 #include <urcu/config.h>
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/arch.h>
 #include <urcu/wfcqueue.h>
@@ -300,7 +301,8 @@ end:
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_reader *index, *tmp;
@@ -323,7 +325,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 		}
 
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_common_reader_state(&rcu_gp, &index->ctr)) {
+			switch (urcu_common_reader_state(&rcu_gp, &index->ctr, group)) {
 			case URCU_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -407,6 +409,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 
 void synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
@@ -421,10 +425,11 @@ void synchronize_rcu(void)
 	 * queue before their insertion into the wait queue.
 	 */
 	if (urcu_wait_add(&gp_waiters, &wait) != 0) {
-		/* Not first in queue: will be awakened by another thread. */
+		/*
+		 * Not first in queue: will be awakened by another thread.
+		 * Implies a memory barrier after grace period.
+		 */
 		urcu_adaptative_busy_wait(&wait);
-		/* Order following memory accesses after grace period. */
-		cmm_smp_mb();
 		return;
 	}
 	/* We won't need to wake ourself up */
@@ -449,13 +454,14 @@ void synchronize_rcu(void)
 	 */
 	/* Write new ptr before changing the qparity */
 	smp_mb_master();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity before
@@ -474,7 +480,8 @@ void synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp.ctr, rcu_gp.ctr ^ URCU_GP_CTR_PHASE);
+	cmm_annotate_group_mem_release(&release_group, &rcu_gp.ctr);
+	uatomic_store(&rcu_gp.ctr, rcu_gp.ctr ^ URCU_GP_CTR_PHASE, CMM_RELAXED);
 
 	/*
 	 * Must commit rcu_gp.ctr update to memory before waiting for quiescent
@@ -497,7 +504,7 @@ void synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -510,6 +517,7 @@ void synchronize_rcu(void)
 	 * iterates on reader threads.
 	 */
 	smp_mb_master();
+	cmm_annotate_group_mb_acquire(&acquire_group);
 out:
 	mutex_unlock(&rcu_registry_lock);
 	mutex_unlock(&rcu_gp_lock);
-- 
2.39.2

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (10 preceding siblings ...)
  2023-05-15 20:17 ` [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
@ 2023-05-16  8:18 ` Dmitry Vyukov via lttng-dev
  2023-05-16 15:47   ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 00/12] " Olivier Dion via lttng-dev
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-16  8:18 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev, Paul E. McKenney

On Mon, 15 May 2023 at 22:18, Olivier Dion <odion@efficios.com> wrote:
>
> This patch set adds support for TSAN in liburcu.

Hi Olivier,

Where can I see the actual changes? Preferably side-by-side diffs with
full context. Are they (or can they be) uploaded somewhere on
github/gerrit?

Are there any remaining open questions?

Is there CI coverage with -fsanitize=thread?


> * Here are the major changes
>
>   - Usage of compiler atomic builtins is added to the uatomic API.  This is
>     required for TSAN to understand atomic memory accesses.  If the compiler
>     supports such builtins, they are used by default.  User can opt-out and use
>     the legacy implementation of the uatomic API by using the
>     `--disable-atomic-builtins' configuration option.
>
>   - The CMM memory model is introduced but yet formalized. It tries to be as
>     close as possible to the C11 memory model while offering primitives such as
>     cmm_smp_wmb(), cmm_smp_rmb() and cmm_mb() that can't be expressed in it.
>     For example, cmm_mb() can be used for ordering memory accesses to MMIO
>     devices, which is out of the scope of the C11 memory model.
>
>   - The CMM annotation layer is a new public API that is highly experimental and
>     not guaranteed to be stable at this stage.  It serves the dual purpose of
>     verifying local (intra-thread) relaxed atomic accesses ordering with a
>     memory barrier and global (inter-thread) relaxed atomic accesses with a
>     shared state.  The second purpose is necessary for TSAN to understand memory
>     accesses ordering since it does not fully support thread fence yet.
>
> * CMM annotation example
>
>   Consider the following pseudo-code of writer side in synchronize_rcu().  An
>   acquire group is defined on the stack of the writer.  Annotations are made
>   onto the group to ensure ordering of relaxed memory accesses in reader_state()
>   before the memory barrier at the end of synchronize_rcu().  It also helps TSAN
>   to understand that the relaxed accesses in reader_state() act like acquire
>   accesses because of the memory barrier in synchronize_rcu().
>
>   In other words, the purpose of this annotation is to convert a group of
>   load-acquire memory operations into load-relaxed memory operations followed by
>   a single memory barrier.  This highly benefits weakly ordered architectures by
>   having a constant number of memory barriers instead of being linearly
>   proportional to the number of loads.  This does not benefit TSO
>   architectures.
>
> ```
> enum urcu_state reader_state(unsigned long *ctr, cmm_annotate_t *acquire_group)
> {
>         unsigned long v;
>
>         v = uatomic_load(ctr, CMM_RELAXED);
>         cmm_annotate_group_mem_acquire(acquire_group, ctr);
>         // ...
> }
>
> void wait_for_readers(..., cmm_annotate_group *acquire_group)
> {
>         // ...
>         switch (reader_state(..., acquire_group)) {
>                 // ...
>         }
>         // ...
> }
>
> void synchronize_rcu()
> {
>         cmm_annotate_define(acquire_group);
>         // ...
>         wait_for_readers(..., &acquire_group);
>         // ...
>         cmm_annotate_group_mb_acquire(&acquire_group);
>         cmm_smp_mb();
> }
> ```
>
> * Known limitation
>
>   The only known limitation is with the urcu-signal flavor.  Indeed, TSAN
>   hijacks calls to sigaction(2) and installs its own signal handler that will
>   deliver the signals to the urcu handler at synchronization points.  This is
>   known to deadlock the urcu-signal flavor in at least one case.  See commit log
>   of `urcu/annotate: Add CMM annotation' for a minimal reproducer outside of
>   liburcu.
>
>   Therefore, we have the intention of deprecating the urcu-signal flavor in the
>   future, starting by disabling it by default.
>
> Olivier Dion (11):
>   configure: Add --disable-atomic-builtins option
>   urcu/uatomic: Use atomic builtins if configured
>   urcu/compiler: Use atomic builtins if configured
>   urcu/arch/generic: Use atomic builtins if configured
>   urcu/system: Use atomic builtins if configured
>   urcu/uatomic: Add CMM memory model
>   urcu-wait: Fix wait state load/store
>   tests: Use uatomic for accessing global states
>   benchmark: Use uatomic for accessing global states
>   tests/unit/test_build: Quiet unused return value
>   urcu/annotate: Add CMM annotation
>
>  README.md                               |  11 ++
>  configure.ac                            |  26 ++++
>  include/Makefile.am                     |   4 +
>  include/urcu/annotate.h                 | 174 ++++++++++++++++++++++++
>  include/urcu/arch/generic.h             |  37 +++++
>  include/urcu/compiler.h                 |  20 ++-
>  include/urcu/static/pointer.h           |  40 ++----
>  include/urcu/static/urcu-bp.h           |  12 +-
>  include/urcu/static/urcu-common.h       |   8 +-
>  include/urcu/static/urcu-mb.h           |  11 +-
>  include/urcu/static/urcu-memb.h         |  26 +++-
>  include/urcu/static/urcu-qsbr.h         |  29 ++--
>  include/urcu/system.h                   |  21 +++
>  include/urcu/uatomic.h                  |  25 +++-
>  include/urcu/uatomic/builtins-generic.h | 124 +++++++++++++++++
>  include/urcu/uatomic/builtins-x86.h     | 124 +++++++++++++++++
>  include/urcu/uatomic/builtins.h         |  83 +++++++++++
>  include/urcu/uatomic/generic.h          | 128 +++++++++++++++++
>  src/rculfhash.c                         |  92 ++++++++-----
>  src/urcu-bp.c                           |  17 ++-
>  src/urcu-pointer.c                      |   9 +-
>  src/urcu-qsbr.c                         |  31 +++--
>  src/urcu-wait.h                         |  15 +-
>  src/urcu.c                              |  24 ++--
>  tests/benchmark/Makefile.am             |  91 +++++++------
>  tests/benchmark/common-states.c         |   1 +
>  tests/benchmark/common-states.h         |  51 +++++++
>  tests/benchmark/test_mutex.c            |  32 +----
>  tests/benchmark/test_perthreadlock.c    |  32 +----
>  tests/benchmark/test_rwlock.c           |  32 +----
>  tests/benchmark/test_urcu.c             |  33 +----
>  tests/benchmark/test_urcu_assign.c      |  33 +----
>  tests/benchmark/test_urcu_bp.c          |  33 +----
>  tests/benchmark/test_urcu_defer.c       |  33 +----
>  tests/benchmark/test_urcu_gc.c          |  34 +----
>  tests/benchmark/test_urcu_hash.c        |   6 +-
>  tests/benchmark/test_urcu_hash.h        |  15 --
>  tests/benchmark/test_urcu_hash_rw.c     |  10 +-
>  tests/benchmark/test_urcu_hash_unique.c |  10 +-
>  tests/benchmark/test_urcu_lfq.c         |  20 +--
>  tests/benchmark/test_urcu_lfs.c         |  20 +--
>  tests/benchmark/test_urcu_lfs_rcu.c     |  20 +--
>  tests/benchmark/test_urcu_qsbr.c        |  33 +----
>  tests/benchmark/test_urcu_qsbr_gc.c     |  34 +----
>  tests/benchmark/test_urcu_wfcq.c        |  22 ++-
>  tests/benchmark/test_urcu_wfq.c         |  20 +--
>  tests/benchmark/test_urcu_wfs.c         |  22 ++-
>  tests/common/api.h                      |  12 +-
>  tests/regression/rcutorture.h           | 102 ++++++++++----
>  tests/unit/test_build.c                 |   8 +-
>  50 files changed, 1227 insertions(+), 623 deletions(-)
>  create mode 100644 include/urcu/annotate.h
>  create mode 100644 include/urcu/uatomic/builtins-generic.h
>  create mode 100644 include/urcu/uatomic/builtins-x86.h
>  create mode 100644 include/urcu/uatomic/builtins.h
>  create mode 100644 tests/benchmark/common-states.c
>  create mode 100644 tests/benchmark/common-states.h
>
> --
> 2.39.2
>


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-16  8:18 ` [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Dmitry Vyukov via lttng-dev
@ 2023-05-16 15:47   ` Olivier Dion via lttng-dev
  2023-05-17 10:21     ` Dmitry Vyukov via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-16 15:47 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: lttng-dev, Paul E. McKenney

On Tue, 16 May 2023, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> On Mon, 15 May 2023 at 22:18, Olivier Dion <odion@efficios.com> wrote:

Hi Dmitry,

> Where can I see the actual changes? Preferably side-by-side diffs with
> full context. Are they (or can they be) uploaded somewhere on
> github/gerrit?

Here's the link to the first patch of the set on our Gerrit
<https://review.lttng.org/c/userspace-rcu/+/9737>.

> Are there any remaining open questions?

On the Gerrit, no.  But I would add this to the known issues:

We have a regression test for forking.  We get the following when
running it:

==23432==ThreadSanitizer: starting new threads after multi-threaded fork is not supported. Dying (set die_after_fork=0 to override)

With TSAN_OPTIONS=die_after_fork=0, here are the results for GCC and
Clang.

* gcc 11.3.0
  ==25266==ThreadSanitizer: dup thread with used id 0x7fd40dafe600

  Looks like this was fixed with the recent merge of TSAN in GCC 13.

* clang 14.0.6

Tests pass but are very slow.  This seems to be because we're calling
exit(3) in the child processes.  Changing it to _exit(2) solves the
issue.  The only thing I can think of is that exit(3) is not
async-signal-safe like _exit(2), yet we make other non-async-signal-safe
calls like malloc(3) and free(3) in the children.

Here are some perf records of that:

With exit(3)
```
  12.35%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd22b
   8.69%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd8fa
   7.50%  test_urcu_fork.  [unknown]                [k] 0xffffffff8a001360
   6.11%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::internal_memset
   5.85%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::ForkChildAfter
   1.96%  test_urcu_fork.  [unknown]                [k] 0xffffffff892f8efd
   1.88%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e2c
   1.29%  test_urcu_fork.  [unknown]                [k] 0xffffffff890a99d1
   0.79%  test_urcu_fork.  [unknown]                [k] 0xffffffff8918c566
   0.75%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::ThreadRegistry::OnFork
   0.71%  test_urcu_fork.  [unknown]                [k] 0xffffffff89349574
   0.64%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ee424f
   0.57%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::MetaMap::AllocBlock
   0.55%  test_urcu_fork.  [unknown]                [k] 0xffffffff89367249
   0.53%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan_read8
   0.51%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e05
```

With _exit(2)
```
  12.26%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd8fa
   9.51%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd22b
   6.78%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::internal_memset
   6.49%  test_urcu_fork.  [unknown]             [k] 0xffffffff8a001360
   4.92%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkChildAfter
   2.92%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8efd
   2.19%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec8e2c
   1.88%  test_urcu_fork.  [unknown]             [k] 0xffffffff890a99d1
   0.83%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::MetaMap::AllocBlock
   0.72%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8f1c
   0.67%  test_urcu_fork.  [unknown]             [k] 0xffffffff8918c566
   0.65%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec16dd
   0.62%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate
   0.61%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write4
   0.60%  test_urcu_fork.  [unknown]             [k] 0xffffffff89367249
   0.59%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ee424f
   0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::ThreadRegistry::OnFork
   0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write8
   0.47%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkBefore
   0.45%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_func_entry
```

> Is there CI coverage with -fsanitize=thread?

Since the urcu-signal flavor deadlocks with TSAN (see reproducer in
commit log), the CI simply times out.  I do however have a job with TSAN
as a matrix axis here <https://ci.lttng.org/job/dev_odion_liburcu/>.


-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation
  2023-05-15 20:17 ` [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
@ 2023-05-16 15:57   ` Olivier Dion via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-16 15:57 UTC (permalink / raw)
  To: lttng-dev; +Cc: Tony Finch

On Tue, 16 May 2023, Tony Finch <dot@dotat.at> wrote:
> The __has_feature pseudo-macro interacts in fun ways with preprocessor
> expression evaluation, because when it is undefined and you write
>
> 	#if defined(__has_feature) && __has_feature(thread_sanitizer)
>
> macro expansion gives you a syntax error
>
> 	#if 0 && 0(thread_sanitizer)
>
> so it must be split into two #if expressions, thus:

TIL.  Thank you for that!

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-16 15:47   ` Olivier Dion via lttng-dev
@ 2023-05-17 10:21     ` Dmitry Vyukov via lttng-dev
  2023-05-17 10:57       ` Dmitry Vyukov via lttng-dev
  2023-05-17 14:44       ` Olivier Dion via lttng-dev
  0 siblings, 2 replies; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-17 10:21 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev, Paul E. McKenney

On Tue, 16 May 2023 at 17:47, Olivier Dion <odion@efficios.com> wrote:
>
> On Tue, 16 May 2023, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> > On Mon, 15 May 2023 at 22:18, Olivier Dion <odion@efficios.com> wrote:
>
> Hi Dmitry,
>
> > Where can I see the actual changes? Preferably side-by-side diffs with
> > full context. Are they (or can they be) uploaded somewhere on
> > github/gerrit?
>
> Here's the link to the first patch of the set on our Gerrit
> <https://review.lttng.org/c/userspace-rcu/+/9737>.
>
> > Are there any remaining open questions?
>
> On the Gerrit no.  But I would add this for the known issues:
>
> We have a regression test for forking.  We get the following when
> running it:
>
> ==23432==ThreadSanitizer: starting new threads after multi-threaded fork is not supported. Dying (set die_after_fork=0 to override)

> With TSAN_OPTIONS=die_after_fork=0, here are the results for GCC and
> Clang.
>
> * gcc 11.3.0
>   ==25266==ThreadSanitizer: dup thread with used id 0x7fd40dafe600
>
>   Looks like this was fixed with the recent merge of TSAN in GCC 13.
>
> * clang 14.0.6

Hi Olivier,

Forking under tsan is a bit tricky. But I see we now take a number of
internal mutexes around fork (but I think still not all of them), so
maybe it's not that bad. If TSAN_OPTIONS=die_after_fork=0 works
reasonably reliably for you, then export it for testing.

Older compilers are missing a number of bug fixes.
So if you can restrict testing to newer compilers only, that's the way to go.


> Tests pass but are very slow.  This seems to be because we're calling
> exit(3) in the children.  Changing it to _exit(2) solves the issue.  The
> only thing I can think of is that exit(3) is not async-signal-safe like
> _exit(2), yet we make other non-async-signal-safe calls like malloc(3) and
> free(3) in the children.

By default tsan sleeps for 1 second at exit, since the exit sequence is a
common source of races. Perhaps it's just these sleeps. Try
TSAN_OPTIONS=atexit_sleep_ms=0 (or maybe =50).
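Putting the two knobs discussed so far together, a test harness could export them once along these lines (a sketch):

```shell
# Keep TSAN alive across multi-threaded fork, and skip the 1 s exit sleep.
export TSAN_OPTIONS="die_after_fork=0 atexit_sleep_ms=0"
echo "$TSAN_OPTIONS"
# prints: die_after_fork=0 atexit_sleep_ms=0
```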


> Here are some perf records of that:
>
> With exit(3)
> ```
>   12.35%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd22b
>    8.69%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd8fa
>    7.50%  test_urcu_fork.  [unknown]                [k] 0xffffffff8a001360
>    6.11%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::internal_memset
>    5.85%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::ForkChildAfter
>    1.96%  test_urcu_fork.  [unknown]                [k] 0xffffffff892f8efd
>    1.88%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e2c
>    1.29%  test_urcu_fork.  [unknown]                [k] 0xffffffff890a99d1
>    0.79%  test_urcu_fork.  [unknown]                [k] 0xffffffff8918c566
>    0.75%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::ThreadRegistry::OnFork
>    0.71%  test_urcu_fork.  [unknown]                [k] 0xffffffff89349574
>    0.64%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ee424f
>    0.57%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::MetaMap::AllocBlock
>    0.55%  test_urcu_fork.  [unknown]                [k] 0xffffffff89367249
>    0.53%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan_read8
>    0.51%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e05
> ```
>
> With _exit(2)
> ```
>   12.26%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd8fa
>    9.51%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd22b
>    6.78%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::internal_memset
>    6.49%  test_urcu_fork.  [unknown]             [k] 0xffffffff8a001360
>    4.92%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkChildAfter
>    2.92%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8efd
>    2.19%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec8e2c
>    1.88%  test_urcu_fork.  [unknown]             [k] 0xffffffff890a99d1
>    0.83%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::MetaMap::AllocBlock
>    0.72%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8f1c
>    0.67%  test_urcu_fork.  [unknown]             [k] 0xffffffff8918c566
>    0.65%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec16dd
>    0.62%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate
>    0.61%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write4
>    0.60%  test_urcu_fork.  [unknown]             [k] 0xffffffff89367249
>    0.59%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ee424f
>    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::ThreadRegistry::OnFork
>    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write8
>    0.47%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkBefore
>    0.45%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_func_entry
> ```
>
> > Is there CI coverage with -fsanitize=thread?
>
> Since the urcu-signal flavor deadlocks with TSAN (see reproducer in the
> commit log), the CI simply times out.  I do however have a job with TSAN
> as a matrix axis here <https://ci.lttng.org/job/dev_odion_liburcu/>.

Sounds good.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-17 10:21     ` Dmitry Vyukov via lttng-dev
@ 2023-05-17 10:57       ` Dmitry Vyukov via lttng-dev
  2023-05-17 14:44         ` Olivier Dion via lttng-dev
  2023-05-17 14:44       ` Olivier Dion via lttng-dev
  1 sibling, 1 reply; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-17 10:57 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev, Paul E. McKenney

I've skimmed through the changes. They look sane. You know what you
are doing. So unless you have any other open issues, I don't have much
to add.

On Wed, 17 May 2023 at 12:21, Dmitry Vyukov <dvyukov@google.com> wrote:
> > Hi Dmitry,
> >
> > > Where can I see the actual changes? Preferably side-by-side diffs with
> > > full context. Are they (or can they be) uploaded somewhere on
> > > github/gerrit?
> >
> > Here's the link to the first patch of the set on our Gerrit
> > <https://review.lttng.org/c/userspace-rcu/+/9737>.
> >
> > > Are there any remaining open questions?
> >
> > On the Gerrit no.  But I would add this for the known issues:
> >
> > We have a regression test for forking.  We get the following when
> > running it:
> >
> > ==23432==ThreadSanitizer: starting new threads after multi-threaded fork is not supported. Dying (set die_after_fork=0 to override)
>
> > With TSAN_OPTIONS=die_after_fork=0, here are the results for GCC and
> > Clang.
> >
> > * gcc 11.3.0
> >   ==25266==ThreadSanitizer: dup thread with used id 0x7fd40dafe600
> >
> >   Looks like this was fixed with the recent merge of TSAN in GCC 13.
> >
> > * clang 14.0.6
>
> Hi Olivier,
>
> Forking under tsan is a bit tricky. But I see we now take a number of
> internal mutexes around  fork (but I think still not all of them), so
> maybe it's not that bad. If  TSAN_OPTIONS=die_after_fork=0 works
> reasonably reliably for you, then export it for testing.
>
> Older compilers are missing a number of bug fixes.
> So if you can restrict testing to newer compilers only, that's the way to go.
>
>
> > Tests pass but are very slow.  This seems to be because we're calling
> > exit(3) in the children.  Changing it to _exit(2) solves the issue.  The
> > only thing I can think of is that exit(3) is not async-signal-safe like
> > _exit(2), yet we make other non-async-signal-safe calls like malloc(3) and
> > free(3) in the children.
>
> By default tsan sleeps for 1 second at exit, since exit sequence is a
> common source of races. Perhaps it's just these sleeps. Try
> TSAN_OPTIONS=atexit_sleep_ms=0 (or maybe =50).
>
>
> > Here are some perf records of that:
> >
> > With exit(3)
> > ```
> >   12.35%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd22b
> >    8.69%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd8fa
> >    7.50%  test_urcu_fork.  [unknown]                [k] 0xffffffff8a001360
> >    6.11%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::internal_memset
> >    5.85%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::ForkChildAfter
> >    1.96%  test_urcu_fork.  [unknown]                [k] 0xffffffff892f8efd
> >    1.88%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e2c
> >    1.29%  test_urcu_fork.  [unknown]                [k] 0xffffffff890a99d1
> >    0.79%  test_urcu_fork.  [unknown]                [k] 0xffffffff8918c566
> >    0.75%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::ThreadRegistry::OnFork
> >    0.71%  test_urcu_fork.  [unknown]                [k] 0xffffffff89349574
> >    0.64%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ee424f
> >    0.57%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::MetaMap::AllocBlock
> >    0.55%  test_urcu_fork.  [unknown]                [k] 0xffffffff89367249
> >    0.53%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan_read8
> >    0.51%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e05
> > ```
> >
> > With _exit(2)
> > ```
> >   12.26%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd8fa
> >    9.51%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd22b
> >    6.78%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::internal_memset
> >    6.49%  test_urcu_fork.  [unknown]             [k] 0xffffffff8a001360
> >    4.92%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkChildAfter
> >    2.92%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8efd
> >    2.19%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec8e2c
> >    1.88%  test_urcu_fork.  [unknown]             [k] 0xffffffff890a99d1
> >    0.83%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::MetaMap::AllocBlock
> >    0.72%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8f1c
> >    0.67%  test_urcu_fork.  [unknown]             [k] 0xffffffff8918c566
> >    0.65%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec16dd
> >    0.62%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate
> >    0.61%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write4
> >    0.60%  test_urcu_fork.  [unknown]             [k] 0xffffffff89367249
> >    0.59%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ee424f
> >    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::ThreadRegistry::OnFork
> >    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write8
> >    0.47%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkBefore
> >    0.45%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_func_entry
> > ```
> >
> > > Is there CI coverage with -fsanitize=thread?
> >
> > Since the urcu-signal flavor deadlocks with TSAN (see reproducer in the
> > commit log), the CI simply times out.  I do however have a job with TSAN
> > as a matrix axis here <https://ci.lttng.org/job/dev_odion_liburcu/>.
>
> Sounds good.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-17 10:21     ` Dmitry Vyukov via lttng-dev
  2023-05-17 10:57       ` Dmitry Vyukov via lttng-dev
@ 2023-05-17 14:44       ` Olivier Dion via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-17 14:44 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: lttng-dev, Paul E. McKenney

On Wed, 17 May 2023, Dmitry Vyukov <dvyukov@google.com> wrote:

> Forking under tsan is a bit tricky. But I see we now take a number of
> internal mutexes around  fork (but I think still not all of them), so
> maybe it's not that bad. If  TSAN_OPTIONS=die_after_fork=0 works
> reasonably reliably for you, then export it for testing.

Works like a charm!  Thanks for that!

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-17 10:57       ` Dmitry Vyukov via lttng-dev
@ 2023-05-17 14:44         ` Olivier Dion via lttng-dev
  2023-05-23 16:05           ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-17 14:44 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: lttng-dev, Paul E. McKenney

On Wed, 17 May 2023, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> I've skimmed through the changes. They look sane. You know what you
> are doing. So unless you have any other open issues, I don't have much
> to add.

Thanks for taking the time!

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-17 14:44         ` Olivier Dion via lttng-dev
@ 2023-05-23 16:05           ` Olivier Dion via lttng-dev
  2023-05-24  8:14             ` Dmitry Vyukov via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-23 16:05 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: lttng-dev, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 209 bytes --]

Hi Dmitry,

We do have a new issue and we think it might be a limitation of TSAN.

Find attached a test program that we believe is correct.  You can
compile it with `gcc -fsanitize=thread test.c -pthread'.


[-- Attachment #2: test.c --]
[-- Type: application/octet-stream, Size: 2035 bytes --]

#include <stdlib.h>
#include <stdint.h>

#include <pthread.h>

#define LOOP 1000

struct node {
	struct node *next;
};

static pthread_barrier_t barrier;
static struct node the_terminal_node = { .next = &the_terminal_node };
static struct node pending_stack = { &the_terminal_node };

extern void __tsan_acquire(void*);
extern void __tsan_release(void*);

extern void __tsan_ignore_thread_begin();
extern void __tsan_ignore_thread_end();

static void *push_worker(void *nil)
{
	(void) nil;

	pthread_barrier_wait(&barrier);

	for (size_t k = 0; k < LOOP; ++k) {
		struct node *new_node, *old_node;

		new_node = calloc(1, sizeof(struct node));
		old_node = __atomic_exchange_n(&pending_stack.next,
					       new_node,
					       __ATOMIC_SEQ_CST);
		/* Works if RELEASE. */
		__atomic_store_n(&new_node->next, old_node, __ATOMIC_RELAXED);

		/* Also works if: */
#if 0
		__tsan_ignore_thread_begin();
		__atomic_store_n(&new_node->next, old_node, __ATOMIC_RELAXED);
		__tsan_ignore_thread_end();
#endif
		/* Why is this not working? */
#if 0
		__tsan_release(&new_node->next);
		__atomic_store_n(&new_node->next, old_node, __ATOMIC_RELAXED);
#endif

	}

	return NULL;
}

static void *pop_worker(void *nil)
{
	(void) nil;

	size_t k = 0;

	pthread_barrier_wait(&barrier);

	while (k < LOOP) {
		struct node *current_stack;
		struct node *next_node;

		current_stack = __atomic_exchange_n(&pending_stack.next,
						    &the_terminal_node,
						    __ATOMIC_SEQ_CST);

		while (current_stack != &the_terminal_node) {

		retry_load:
			next_node = __atomic_load_n(&current_stack->next,
						    __ATOMIC_CONSUME);

			if (!next_node) {
				goto retry_load;
			}

			free(current_stack);
			current_stack = next_node;
			++k;
		}
	}

	return NULL;
}

int main(void)
{
	pthread_t ths[2];

	pthread_barrier_init(&barrier, NULL, 3);

	pthread_create(&ths[0], NULL, push_worker, NULL);
	pthread_create(&ths[1], NULL, pop_worker, NULL);

	pthread_barrier_wait(&barrier);

	pthread_join(ths[0], NULL);
	pthread_join(ths[1], NULL);

	return 0;
}

[-- Attachment #3: Type: text/plain, Size: 821 bytes --]


TSAN flags a race condition between the relaxed atomic store at line 36
and the free at line 81.

We can solve the issue by replacing the relaxed atomic store with an
atomic store release.  However, the preceding atomic exchange with
sequential consistency already acts as an implicit release (in terms of
memory barriers) for the following store, making the release semantics
redundant.

We've found an alternative fix: ignoring the thread during the relaxed
store (line 39).  However, what we would like is to annotate the stored
memory (line 45).

My theory (I don't know the internals of TSAN much) is that TSAN thinks
for some reason that the relaxed atomic store happens at the same epoch
as the free, resulting in a false positive.  If so, m

Thoughts?

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-23 16:05           ` Olivier Dion via lttng-dev
@ 2023-05-24  8:14             ` Dmitry Vyukov via lttng-dev
  2023-05-26  5:33               ` Ondřej Surý via lttng-dev
  2023-05-26 15:15               ` Olivier Dion via lttng-dev
  0 siblings, 2 replies; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-24  8:14 UTC (permalink / raw)
  To: Olivier Dion, tsan-users; +Cc: lttng-dev, Paul E. McKenney

On Tue, 23 May 2023 at 18:05, Olivier Dion <odion@efficios.com> wrote:
>
> Hi Dmitry,
>
> We do have a new issue and we think it might be a limitation from TSAN.
>
> Find attached a test program that we believe is correct.  You can
> compile it with `gcc -fsanitize=thread test.c -pthread'.
>
>
> TSAN flags a race condition between the relaxed atomic store at line 36
> and the free at line 81.
>
> We can solve the issue by replacing the relaxed atomic store with an
> atomic store release.  However, the preceding atomic exchange with
> sequential consistency already acts as an implicit release (in terms of
> memory barriers) for the following store, making the release semantics
> redundant.
>
> We've found an alternative fix: ignoring the thread during the relaxed
> store (line 39).  However, what we would like is to annotate the stored
> memory (line 45).
>
> My theory (I don't know the internals of TSAN much) is that TSAN thinks
> for some reason that the relaxed atomic store happens at the same epoch
> as the free, resulting in a false positive.  If so, m


I don't think this is true in the C/C++ memory model:
"the preceding atomic exchange with sequential consistency already
acts as an implicit release (in terms of memory barriers) for the
following store".

std::atomic_thread_fence does affect all preceding/subsequent
operations, but an atomic memory operation only affects ordering on
that variable; it doesn't also serve as a standalone memory fence.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-24  8:14             ` Dmitry Vyukov via lttng-dev
@ 2023-05-26  5:33               ` Ondřej Surý via lttng-dev
  2023-05-26  6:08                 ` Dmitry Vyukov via lttng-dev
  2023-05-26 15:15               ` Olivier Dion via lttng-dev
  1 sibling, 1 reply; 69+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-05-26  5:33 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Olivier Dion, tsan-users, lttng-dev, Paul E. McKenney, Tony Finch

A little bit related question/problem - some of our system test jobs got significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu into BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.

The liburcu integration in BIND 9 is very light at the moment - we use liburcu for qptrie, and there’s some usage of wfcqueue for queuing jobs (callbacks) between threads.

We were not able to put a finger on it yet, but saying this aloud might help diagnose this.

Ondrej
--
Ondřej Surý <ondrej@sury.org> (He/Him)

> On 24. 5. 2023, at 10:15, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> 
> On Tue, 23 May 2023 at 18:05, Olivier Dion <odion@efficios.com> wrote:
>> 
>> Hi Dmitry,
>> 
>> We do have a new issue and we think it might be a limitation from TSAN.
>> 
>> Find attached a test program that we believe is correct.  You can
>> compile it with `gcc -fsanitize=thread test.c -pthread'.
>> 
>> 
>> TSAN flags a race condition between the relaxed atomic store at line 36
>> and the free at line 81.
>>
>> We can solve the issue by replacing the relaxed atomic store with an
>> atomic store release.  However, the preceding atomic exchange with
>> sequential consistency already acts as an implicit release (in terms of
>> memory barriers) for the following store, making the release semantics
>> redundant.
>>
>> We've found an alternative fix: ignoring the thread during the relaxed
>> store (line 39).  However, what we would like is to annotate the stored
>> memory (line 45).
>>
>> My theory (I don't know the internals of TSAN much) is that TSAN thinks
>> for some reason that the relaxed atomic store happens at the same epoch
>> as the free, resulting in a false positive.  If so, m
> 
> 
> I don't think this is true in the C/C++ memory model:
> "the preceding atomic exchange with sequential consistency already
> acts as an implicit release (in terms of memory barriers) for the
> following store".
>
> std::atomic_thread_fence does affect all preceding/subsequent
> operations, but an atomic memory operation only affects ordering on
> that variable; it doesn't also serve as a standalone memory fence.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-26  5:33               ` Ondřej Surý via lttng-dev
@ 2023-05-26  6:08                 ` Dmitry Vyukov via lttng-dev
  2023-05-26  6:10                   ` Dmitry Vyukov via lttng-dev
  2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
  0 siblings, 2 replies; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-26  6:08 UTC (permalink / raw)
  To: Ondřej Surý
  Cc: Olivier Dion, tsan-users, lttng-dev, Paul E. McKenney, Tony Finch

On Fri, 26 May 2023 at 07:33, Ondřej Surý <ondrej@sury.org> wrote:
>
> A little bit related question/problem - some of our system test jobs got significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.
>
> The liburcu integration in BIND 9 is very light at the moment - we use liburcu for qptrie and there’s some usage of wfcqueue for queuing jobs(callbacks) between threads.
>
> We were not able to put a finger on it yet, but saying this aloud might help diagnose this.

Interesting. Clang 16 got a completely new tsan runtime
implementation, which is generally ~2x faster and consumes less
memory.
However, of course, it's not possible to optimize for all gazillion of
programs in the world. I suspect librcu tests are quite extreme in
terms of synchronization.

If you collect perf profiles for both, I can look for some low hanging fruit.
Or is there a reasonable way for me to reproduce?


> Ondrej
> --
> Ondřej Surý <ondrej@sury.org> (He/Him)
>
> > On 24. 5. 2023, at 10:15, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> >
> > On Tue, 23 May 2023 at 18:05, Olivier Dion <odion@efficios.com> wrote:
> >>
> >> Hi Dmitry,
> >>
> >> We do have a new issue and we think it might be a limitation from TSAN.
> >>
> >> Find attached a test program that we believe is correct.  You can
> >> compile it with `gcc -fsanitize=thread test.c -pthread'.
> >>
> >>
> >> TSAN flags a race condition between the relaxed atomic store at line 36
> >> and the free at line 81.
> >>
> >> We can solve the issue by replacing the relaxed atomic store with an
> >> atomic store release.  However, the preceding atomic exchange with
> >> sequential consistency already acts as an implicit release (in terms of
> >> memory barriers) for the following store, making the release semantics
> >> redundant.
> >>
> >> We've found an alternative fix: ignoring the thread during the relaxed
> >> store (line 39).  However, what we would like is to annotate the stored
> >> memory (line 45).
> >>
> >> My theory (I don't know the internals of TSAN much) is that TSAN thinks
> >> for some reason that the relaxed atomic store happens at the same epoch
> >> as the free, resulting in a false positive.  If so, m
> >
> >
> > I don't think this is true in the C/C++ memory model:
> > "the preceding atomic exchange with sequential consistency already
> > acts as an implicit release (in terms of memory barriers) for the
> > following store".
> >
> > std::atomic_thread_fence does affect all preceding/subsequent
> > operations, but an atomic memory operation only affects ordering on
> > that variable; it doesn't also serve as a standalone memory fence.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-26  6:08                 ` Dmitry Vyukov via lttng-dev
@ 2023-05-26  6:10                   ` Dmitry Vyukov via lttng-dev
  2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-26  6:10 UTC (permalink / raw)
  To: Ondřej Surý
  Cc: Olivier Dion, tsan-users, lttng-dev, Paul E. McKenney, Tony Finch

On Fri, 26 May 2023 at 08:08, Dmitry Vyukov <dvyukov@google.com> wrote:
> > A little bit related question/problem - some of our system test jobs got significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.
> >
> > The liburcu integration in BIND 9 is very light at the moment - we use liburcu for qptrie and there’s some usage of wfcqueue for queuing jobs(callbacks) between threads.
> >
> > We were not able to put a finger on it yet, but saying this aloud might help diagnose this.
>
> Interesting. Clang 16 got a completely new tsan runtime
> implementation, which is generally ~2x faster and consumes less
> memory.
> However, of course, it's not possible to optimize for all gazillion of
> programs in the world. I suspect liburcu tests are quite extreme in
> terms of synchronization.
>
> If you collect perf profiles for both, I can look for some low hanging fruit.
> Or is there a reasonable way for me to reproduce?

How many CPUs do these machines have?

> > Ondrej
> > --
> > Ondřej Surý <ondrej@sury.org> (He/Him)
> >
> > > On 24. 5. 2023, at 10:15, Dmitry Vyukov via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> > >
> > > On Tue, 23 May 2023 at 18:05, Olivier Dion <odion@efficios.com> wrote:
> > >>
> > >> Hi Dmitry,
> > >>
> > >> We do have a new issue and we think it might be a limitation from TSAN.
> > >>
> > >> Find attached a test program that we believe is correct.  You can
> > >> compile it with `gcc -fsanitize=thread test.c -pthread'.
> > >>
> > >>
> > >> TSAN flags a race condition between the relaxed atomic store at line 36
> > >> and the free at line 81.
> > >>
> > >> We can solve the issue by replacing the relaxed atomic store with an
> > >> atomic store release.  However, the preceding atomic exchange with
> > >> sequential consistency already acts as an implicit release (in terms of
> > >> memory barriers) for the following store, making the release semantics
> > >> redundant.
> > >>
> > >> We've found an alternative fix: ignoring the thread during the relaxed
> > >> store (line 39).  However, what we would like is to annotate the stored
> > >> memory (line 45).
> > >>
> > >> My theory (I don't know the internals of TSAN much) is that TSAN thinks
> > >> for some reason that the relaxed atomic store happens at the same epoch
> > >> as the free, resulting in a false positive.  If so, m
> > >
> > >
> > > I don't think this is true in the C/C++ memory model:
> > > "the preceding atomic exchange with sequential consistency already
> > > acts as an implicit release (in terms of memory barriers) for the
> > > following store".
> > >
> > > std::atomic_thread_fence does affect all preceding/subsequent
> > > operations, but an atomic memory operation only affects ordering on
> > > that variable; it doesn't also serve as a standalone memory fence.

* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-26  6:08                 ` Dmitry Vyukov via lttng-dev
  2023-05-26  6:10                   ` Dmitry Vyukov via lttng-dev
@ 2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
  2023-05-26 10:08                     ` Dmitry Vyukov via lttng-dev
  2023-05-26 14:20                     ` Olivier Dion via lttng-dev
  1 sibling, 2 replies; 69+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-05-26 10:06 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Olivier Dion, tsan-users, lttng-dev, Paul E. McKenney, Tony Finch


> On 26. 5. 2023, at 8:08, Dmitry Vyukov <dvyukov@google.com> wrote:
> 
> On Fri, 26 May 2023 at 07:33, Ondřej Surý <ondrej@sury.org> wrote:
>> 
>> A little bit related question/problem - some of our system test jobs got significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.
>> 
>> The liburcu integration in BIND 9 is very light at the moment - we use liburcu for qptrie and there’s some usage of wfcqueue for queuing jobs(callbacks) between threads.
>> 
>> We were not able to put a finger on it yet, but saying this aloud might help diagnose this.
> 
> Interesting. Clang 16 got a completely new tsan runtime
> implementation, which is generally ~2x faster and consumes less
> memory.
> However, of course, it's not possible to optimize for all gazillion of
> > programs in the world. I suspect liburcu tests are quite extreme in
> terms of synchronization.
> 
> If you collect perf profiles for both, I can look for some low hanging fruit.
> Or is there a reasonable way for me to reproduce?

Looks like it all got resolved by using the latest liburcu patch set with full
TSAN support. I hadn't realized that our CI still runs with whatever liburcu
Debian bullseye has. Sorry for the confusion.

And I can no longer reproduce this locally.

FTR here are some numbers from my local system (13th Gen Intel(R) Core(TM) i9-13900K)
running TSAN-enabled patched userspace-rcu[1], libuv 1.45.0 and OpenSSL 3.0.8:

gcc-13 baseline

rpz:            passed in 198.71s (0:03:18)
rpzrecurse:     passed in 310.62s (0:05:10)
rrsetorder:     passed in 200.40s (0:03:20)

gcc-13 TSAN

rpz:            passed in 305.64s (0:05:05)
rpzrecurse:     passed in 427.73s (0:07:07)
rrsetorder:     passed in 260.65s (0:04:20)

clang-16 TSAN

rpz:            passed in 290.05s (0:04:50)
rpzrecurse:     passed in 468.53s (0:07:48)
rrsetorder:     passed in 258.97s (0:04:18)

This looks quite reasonable to me.

1. git fetch https://review.lttng.org/userspace-rcu refs/changes/58/10058/1 && git checkout FETCH_HEAD

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
@ 2023-05-26 10:08                     ` Dmitry Vyukov via lttng-dev
  2023-05-26 14:20                     ` Olivier Dion via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Dmitry Vyukov via lttng-dev @ 2023-05-26 10:08 UTC (permalink / raw)
  To: Ondřej Surý
  Cc: Olivier Dion, tsan-users, lttng-dev, Paul E. McKenney, Tony Finch

On Fri, 26 May 2023 at 12:06, Ondřej Surý <ondrej@sury.org> wrote:
>
>
> > On 26. 5. 2023, at 8:08, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Fri, 26 May 2023 at 07:33, Ondřej Surý <ondrej@sury.org> wrote:
> >>
> >> A little bit related question/problem - some of our system test jobs got significantly slower (3x-5x) with **clang-16** TSAN after we integrated liburcu to BIND 9. The GCC (13.1.1) TSAN doesn’t exhibit this behavior.
> >>
> >> The liburcu integration in BIND 9 is very light at the moment - we use liburcu for qptrie and there’s some usage of wfcqueue for queuing jobs(callbacks) between threads.
> >>
> >> We were not able to put a finger on it yet, but saying this aloud might help diagnose this.
> >
> > Interesting. Clang 16 got a completely new tsan runtime
> > implementation, which is generally ~2x faster and consumes less
> > memory.
> > However, of course, it's not possible to optimize for all gazillion of
> > programs in the world. I suspect liburcu tests are quite extreme in
> > terms of synchronization.
> >
> > If you collect perf profiles for both, I can look for some low hanging fruit.
> > Or is there a reasonable way for me to reproduce?
>
> Looks like it was all resolved by using the latest liburcu patch set with full
> TSAN support. I hadn't realized that our CI still runs with whatever liburcu
> Debian bullseye has. Sorry for the confusion.
>
> And I can no longer reproduce this locally.
>
> FTR here are some numbers from my local system (13th Gen Intel(R) Core(TM) i9-13900K)
> running TSAN-enabled patched userspace-rcu[1], libuv 1.45.0 and OpenSSL 3.0.8:
>
> gcc-13 baseline
>
> rpz:            passed in 198.71s (0:03:18)
> rpzrecurse:     passed in 310.62s (0:05:10)
> rrsetorder:     passed in 200.40s (0:03:20)
>
> gcc-13 TSAN
>
> rpz:            passed in 305.64s (0:05:05)
> rpzrecurse:     passed in 427.73s (0:07:07)
> rrsetorder:     passed in 260.65s (0:04:20)
>
> clang-16 TSAN
>
> rpz:            passed in 290.05s (0:04:50)
> rpzrecurse:     passed in 468.53s (0:07:48)
> rrsetorder:     passed in 258.97s (0:04:18)
>
> This looks quite reasonable to me.
>
> 1. git fetch https://review.lttng.org/userspace-rcu refs/changes/58/10058/1 && git checkout FETCH_HEAD

Yes, looks reasonable.


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
  2023-05-26 10:08                     ` Dmitry Vyukov via lttng-dev
@ 2023-05-26 14:20                     ` Olivier Dion via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-26 14:20 UTC (permalink / raw)
  To: Ondřej Surý, Dmitry Vyukov
  Cc: tsan-users, lttng-dev, Paul E. McKenney, Tony Finch

On Fri, 26 May 2023, Ondřej Surý <ondrej@sury.org> wrote:

> Looks like it was all resolved by using the latest liburcu patch set with full
> TSAN support. I hadn't realized that our CI still runs with whatever
> liburcu

Just pointing out that we've added support for wfcqueue, which seems to be
widely used in BIND.  New patches are on the way to support other
data structures as well in the same vein :-)

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
  2023-05-24  8:14             ` Dmitry Vyukov via lttng-dev
  2023-05-26  5:33               ` Ondřej Surý via lttng-dev
@ 2023-05-26 15:15               ` Olivier Dion via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-05-26 15:15 UTC (permalink / raw)
  To: Dmitry Vyukov, tsan-users; +Cc: lttng-dev, Paul E. McKenney

On Wed, 24 May 2023, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, 23 May 2023 at 18:05, Olivier Dion <odion@efficios.com> wrote:

> I don't think this is true in the C/C++ memory model:
> "the preceding atomic exchange with sequential consistency already
> acts as an implicit release (in term of memory barrier) for the
> following store".
>
> std::atomic_thread_fence does affect all preceding/subsequent
> operations, but an atomic memory operation only affects ordering on
> that variable, it doesn't also serve as a standalone memory fence.

After reading the standard, we concur with you.  We had to revisit the
memory model used by URCU to understand the conflict.  While doing so,
we had to adapt some of the algorithms that were assuming implicit full
memory barriers around operations such as CAS.
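To make the pitfall concrete, here is a minimal sketch (illustrative only, not
liburcu code): in the C11 model, a seq_cst atomic exchange orders operations on
the exchanged variable itself, but does not act as a standalone fence for a
following relaxed store to another variable, so an explicit thread fence is
needed where the algorithm assumed a full barrier.

```c
#include <assert.h>

static int x, y;

static void publish(void)
{
	(void) __atomic_exchange_n(&x, 1, __ATOMIC_SEQ_CST);
	/*
	 * Explicit fence: without it, the relaxed store to 'y' below is
	 * not ordered after the exchange with respect to other threads,
	 * even though the exchange is seq_cst.
	 */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	__atomic_store_n(&y, 1, __ATOMIC_RELAXED);
}
```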

Thanks for the insight!

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


* [lttng-dev] [PATCH v2 00/12] Add support for TSAN to liburcu
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (11 preceding siblings ...)
  2023-05-16  8:18 ` [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Dmitry Vyukov via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 19:04   ` Ondřej Surý via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 01/12] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

This patch set adds support for TSAN in liburcu.
    
* Change since v1

** Adding CMM_SEQ_CST_FENCE memory order to the CMM memory model

   The C11 memory model is incompatible with the memory model used by liburcu,
   since the semantics of the C11 memory model are based on a happens-before
   (acquire/release) relationship between memory accesses, while liburcu is
   based on memory barriers and relaxed memory accesses.

   To circumvent this, a new memory order called CMM_SEQ_CST_FENCE is
   introduced.  It implies CMM_SEQ_CST and additionally emits a thread fence
   after the operation.  Operations that were documented as emitting memory
   barriers before and after the operation are now implemented in terms of this
   new memory order to preserve compatibility.

   However, this memory order is redundant in some cases, so the memory orders
   were changed internally in liburcu to simply use CMM_SEQ_CST where that
   suffices.
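A minimal sketch of what this mapping might look like (an assumed,
simplified rendering of the description above; the helper name is
illustrative, not the actual liburcu implementation):

```c
#include <assert.h>
#include <stdint.h>

/*
 * CMM_SEQ_CST_FENCE behaves like the corresponding C11 seq_cst
 * operation followed by a thread fence, preserving the full memory
 * barrier semantics that the legacy uatomic API documented.
 */
static inline uintptr_t xchg_seq_cst_fence(uintptr_t *addr, uintptr_t v)
{
	uintptr_t old = __atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST);

	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* the extra trailing fence */
	return old;
}
```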

** Adding cmm_emit_legacy_smp_mb() to queue/stack APIs

   The queue/stack APIs document implicit memory barriers before or after
   operations.  These are now emitted only if CONFIG_RCU_EMIT_LEGACY_MB is
   defined in urcu/config.h or manually defined by the user before including
   liburcu.  That way, users can opt in even if the system headers were
   configured without this feature.

   However, users cannot opt out of this feature if it was configured into the
   system.
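A sketch of this opt-in mechanism (the config macro name is taken from the
text above; the emission counter and the enqueue stub are purely
illustrative, not liburcu code):

```c
#include <assert.h>

#define CONFIG_RCU_EMIT_LEGACY_MB	/* user opt-in, before including liburcu */

static int legacy_mb_count;

#ifdef CONFIG_RCU_EMIT_LEGACY_MB
# define cmm_emit_legacy_smp_mb()				\
	do {							\
		__atomic_thread_fence(__ATOMIC_SEQ_CST);	\
		legacy_mb_count++;				\
	} while (0)
#else
# define cmm_emit_legacy_smp_mb()	do { } while (0)
#endif

/*
 * A queue operation that used to document an implicit barrier calls the
 * macro at the point where the barrier was previously emitted.
 */
static void enqueue_sketch(void)
{
	cmm_emit_legacy_smp_mb();
	/* ... actual enqueue would follow ... */
}
```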

* v1

** Here are the major changes

  - Usage of compiler atomic builtins is added to the uatomic API.  This is
    required for TSAN to understand atomic memory accesses.  If the compiler
    supports such builtins, they are used by default.  Users can opt out and use
    the legacy implementation of the uatomic API with the
    `--disable-atomic-builtins' configuration option.

  - The CMM memory model is introduced but not yet formalized.  It tries to be
    as close as possible to the C11 memory model while offering primitives such
    as cmm_smp_wmb(), cmm_smp_rmb() and cmm_mb() that cannot be expressed in it.
    For example, cmm_mb() can be used for ordering memory accesses to MMIO
    devices, which is outside the scope of the C11 memory model.

  - The CMM annotation layer is a new public API that is highly experimental
    and not guaranteed to be stable at this stage.  It serves the dual purpose
    of verifying the ordering of local (intra-thread) relaxed atomic accesses
    with a memory barrier and of global (inter-thread) relaxed atomic accesses
    with a shared state.  The second purpose is necessary for TSAN to
    understand memory access ordering, since it does not fully support thread
    fences yet.

** CMM annotation example

  Consider the following pseudo-code of the writer side in synchronize_rcu().
  An acquire group is defined on the stack of the writer.  Annotations are made
  onto the group to ensure the ordering of relaxed memory accesses in
  reader_state() before the memory barrier at the end of synchronize_rcu().
  It also helps TSAN understand that the relaxed accesses in reader_state() act
  like acquire accesses because of the memory barrier in synchronize_rcu().

  In other words, the purpose of this annotation is to convert a group of
  load-acquire memory operations into load-relaxed memory operations followed
  by a single memory barrier.  This greatly benefits weakly ordered
  architectures by keeping the number of memory barriers constant instead of
  linearly proportional to the number of loads.  It does not benefit TSO
  architectures.
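The pattern above can be sketched as follows (a minimal illustration of the
"relaxed loads plus one fence" idea; the names reader_gp and
all_readers_quiescent are hypothetical, not the liburcu API):

```c
#include <assert.h>

#define NR_READERS 4

static int reader_gp[NR_READERS];	/* hypothetical per-reader state */

/*
 * N relaxed loads followed by a single memory barrier, instead of N
 * load-acquire operations: the barrier cost stays constant on weakly
 * ordered architectures regardless of the number of readers.
 */
static int all_readers_quiescent(void)
{
	int quiescent = 1;
	int i;

	for (i = 0; i < NR_READERS; i++) {
		/* Relaxed: no per-load barrier. */
		if (__atomic_load_n(&reader_gp[i], __ATOMIC_RELAXED) != 0)
			quiescent = 0;
	}
	/* One fence orders the whole group of relaxed loads above. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return quiescent;
}
```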


Olivier Dion (12):
  configure: Add --disable-atomic-builtins option
  urcu/compiler: Use atomic builtins if configured
  urcu/arch/generic: Use atomic builtins if configured
  urcu/system: Use atomic builtins if configured
  urcu/uatomic: Add CMM memory model
  urcu-wait: Fix wait state load/store
  tests: Use uatomic for accessing global states
  benchmark: Use uatomic for accessing global states
  tests/unit/test_build: Quiet unused return value
  urcu/annotate: Add CMM annotation
  Add cmm_emit_legacy_smp_mb()
  tests: Add tests for checking race conditions

 README.md                               |  11 ++
 configure.ac                            |  39 ++++
 doc/uatomic-api.md                      |   3 +-
 include/Makefile.am                     |   3 +
 include/urcu/annotate.h                 | 174 ++++++++++++++++++
 include/urcu/arch.h                     |   6 +
 include/urcu/arch/generic.h             |  37 ++++
 include/urcu/compiler.h                 |  22 ++-
 include/urcu/config.h.in                |   6 +
 include/urcu/static/lfstack.h           |  25 ++-
 include/urcu/static/pointer.h           |  40 ++--
 include/urcu/static/rculfqueue.h        |  14 +-
 include/urcu/static/rculfstack.h        |   8 +-
 include/urcu/static/urcu-bp.h           |  12 +-
 include/urcu/static/urcu-common.h       |   8 +-
 include/urcu/static/urcu-mb.h           |  11 +-
 include/urcu/static/urcu-memb.h         |  26 ++-
 include/urcu/static/urcu-qsbr.h         |  29 ++-
 include/urcu/static/wfcqueue.h          |  68 +++----
 include/urcu/static/wfqueue.h           |   9 +-
 include/urcu/static/wfstack.h           |  24 ++-
 include/urcu/system.h                   |  21 +++
 include/urcu/uatomic.h                  |  63 ++++++-
 include/urcu/uatomic/builtins-generic.h | 170 +++++++++++++++++
 include/urcu/uatomic/builtins.h         |  79 ++++++++
 include/urcu/uatomic/generic.h          | 234 ++++++++++++++++++++++++
 src/rculfhash.c                         |  92 ++++++----
 src/urcu-bp.c                           |  17 +-
 src/urcu-pointer.c                      |   9 +-
 src/urcu-qsbr.c                         |  31 +++-
 src/urcu-wait.h                         |  15 +-
 src/urcu.c                              |  24 ++-
 tests/benchmark/Makefile.am             |  91 ++++-----
 tests/benchmark/common-states.c         |   1 +
 tests/benchmark/common-states.h         |  51 ++++++
 tests/benchmark/test_mutex.c            |  32 +---
 tests/benchmark/test_perthreadlock.c    |  32 +---
 tests/benchmark/test_rwlock.c           |  32 +---
 tests/benchmark/test_urcu.c             |  33 +---
 tests/benchmark/test_urcu_assign.c      |  33 +---
 tests/benchmark/test_urcu_bp.c          |  33 +---
 tests/benchmark/test_urcu_defer.c       |  33 +---
 tests/benchmark/test_urcu_gc.c          |  34 +---
 tests/benchmark/test_urcu_hash.c        |   6 +-
 tests/benchmark/test_urcu_hash.h        |  15 --
 tests/benchmark/test_urcu_hash_rw.c     |  10 +-
 tests/benchmark/test_urcu_hash_unique.c |  10 +-
 tests/benchmark/test_urcu_lfq.c         |  20 +-
 tests/benchmark/test_urcu_lfs.c         |  20 +-
 tests/benchmark/test_urcu_lfs_rcu.c     |  20 +-
 tests/benchmark/test_urcu_qsbr.c        |  33 +---
 tests/benchmark/test_urcu_qsbr_gc.c     |  34 +---
 tests/benchmark/test_urcu_wfcq.c        |  22 +--
 tests/benchmark/test_urcu_wfq.c         |  20 +-
 tests/benchmark/test_urcu_wfs.c         |  22 +--
 tests/common/api.h                      |  12 +-
 tests/regression/rcutorture.h           | 106 +++++++----
 tests/unit/test_build.c                 |   8 +-
 tests/unit/test_lfstack.c               |  90 +++++++++
 tests/unit/test_wfcqueue.c              | 119 ++++++++++++
 tests/unit/test_wfqueue.c               |  91 +++++++++
 tests/unit/test_wfstack.c               |  90 +++++++++
 62 files changed, 1799 insertions(+), 684 deletions(-)
 create mode 100644 include/urcu/annotate.h
 create mode 100644 include/urcu/uatomic/builtins-generic.h
 create mode 100644 include/urcu/uatomic/builtins.h
 create mode 100644 tests/benchmark/common-states.c
 create mode 100644 tests/benchmark/common-states.h
 create mode 100644 tests/unit/test_lfstack.c
 create mode 100644 tests/unit/test_wfcqueue.c
 create mode 100644 tests/unit/test_wfqueue.c
 create mode 100644 tests/unit/test_wfstack.c

-- 
2.40.1



* [lttng-dev] [PATCH v2 01/12] configure: Add --disable-atomic-builtins option
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (12 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 00/12] " Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 02/12] urcu/compiler: Use atomic builtins if configured Olivier Dion via lttng-dev
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

By default, if the toolchain supports atomic builtins, use them for the
uatomic API. This requires that the toolchains used to compile the
library and the user application both support such builtins.

The advantage of using these builtins is that they are synchronization
primitives that are well understood by tools such as TSAN.
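As an illustration of why this matters for TSAN (a contrast sketch, not
liburcu code; the legacy volatile-cast style is an assumption about the old
implementation):

```c
#include <assert.h>

static int state;

/*
 * Legacy style: a volatile access.  TSAN sees a plain memory access
 * here, which it can report as a data race even when the surrounding
 * algorithm is correctly ordered by explicit barriers.
 */
static int load_legacy(void)
{
	return *(volatile int *) &state;
}

/*
 * Builtin style: TSAN models this as an atomic load and can reason
 * about its ordering, avoiding the false positive.
 */
static int load_builtin(void)
{
	return __atomic_load_n(&state, __ATOMIC_RELAXED);
}
```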

Change-Id: Ia8e97112681f744f17816dbc4cbbec805a483331
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 README.md                | 11 +++++++++++
 configure.ac             | 26 ++++++++++++++++++++++++++
 include/urcu/config.h.in |  3 +++
 3 files changed, 40 insertions(+)

diff --git a/README.md b/README.md
index ba5bb08..6ce96c9 100644
--- a/README.md
+++ b/README.md
@@ -429,6 +429,17 @@ still being used to iterate on a hash table.
 This option alters the rculfhash ABI. Make sure to compile both library
 and application with matching configuration.
 
+### Usage of `--disable-atomic-builtins`
+
+By default, the configure script will check if the toolchain supports atomic
+builtins. If so, then the RCU memory model is implemented using the atomic
+builtins of the toolchain.
+
+Building liburcu with `--disable-atomic-builtins` forces the use of the legacy
+internal implementations for atomic accesses.
+
+This option is useful if, for example, the atomic builtins of a given toolchain
+version are known to be broken or inefficient.
 
 Make targets
 ------------
diff --git a/configure.ac b/configure.ac
index 909cf1d..4450a31 100644
--- a/configure.ac
+++ b/configure.ac
@@ -230,6 +230,11 @@ AE_FEATURE([rcu-debug], [Enable internal debugging self-checks. Introduces a per
 AE_FEATURE_DEFAULT_DISABLE
 AE_FEATURE([cds-lfht-iter-debug], [Enable extra debugging checks for lock-free hash table iterator traversal. Alters the rculfhash ABI. Make sure to compile both library and application with matching configuration.])
 
+# toolchain atomic builtins
+# Enabled by default
+AE_FEATURE_DEFAULT_ENABLE
+AE_FEATURE([atomic-builtins], [Disable the usage of toolchain atomic builtins.])
+
 # When given, add -Werror to WARN_CFLAGS and WARN_CXXFLAGS.
 # Disabled by default
 AE_FEATURE_DEFAULT_DISABLE
@@ -259,6 +264,23 @@ AE_IF_FEATURE_ENABLED([cds-lfht-iter-debug], [
   AC_DEFINE([CONFIG_CDS_LFHT_ITER_DEBUG], [1], [Enable extra debugging checks for lock-free hash table iterator traversal. Alters the rculfhash ABI. Make sure to compile both library and application with matching configuration.])
 ])
 
+AE_IF_FEATURE_ENABLED([atomic-builtins], [
+  AC_COMPILE_IFELSE(
+	[AC_LANG_PROGRAM(
+		[[int x, y;]],
+		[[__atomic_store_n(&x, 0, __ATOMIC_RELAXED);
+		  __atomic_load_n(&x, __ATOMIC_RELAXED);
+		  y = __atomic_exchange_n(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+		  __atomic_add_fetch(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_sub_fetch(&x, 1, __ATOMIC_RELAXED);
+		  __atomic_and_fetch(&x, 0x01, __ATOMIC_RELAXED);
+		  __atomic_or_fetch(&x, 0x01, __ATOMIC_RELAXED);
+		  __atomic_thread_fence(__ATOMIC_RELAXED);
+		  __atomic_signal_fence(__ATOMIC_RELAXED);]])],
+	[AC_DEFINE([CONFIG_RCU_USE_ATOMIC_BUILTINS], [1], [Use compiler atomic builtins.])],
+	[AE_FEATURE_DISABLE(atomic-builtins)])
+])
 
 ##                                                                          ##
 ## Set automake variables for optional feature conditionnals in Makefile.am ##
@@ -361,6 +383,10 @@ PPRINT_PROP_BOOL([Internal debugging], $value)
 AE_IS_FEATURE_ENABLED([cds-lfht-iter-debug]) && value=1 || value=0
 PPRINT_PROP_BOOL([Lock-free HT iterator debugging], $value)
 
+# atomic builtins enabled/disabled
+AE_IS_FEATURE_ENABLED([atomic-builtins]) && value=1 || value=0
+PPRINT_PROP_BOOL([Use toolchain atomic builtins], $value)
+
 PPRINT_PROP_BOOL([Multi-flavor support], 1)
 
 report_bindir="`eval eval echo $bindir`"
diff --git a/include/urcu/config.h.in b/include/urcu/config.h.in
index 99d763a..1daaa7e 100644
--- a/include/urcu/config.h.in
+++ b/include/urcu/config.h.in
@@ -19,6 +19,9 @@
    Introduces a performance penalty. */
 #undef CONFIG_RCU_DEBUG
 
+/* Uatomic API uses atomic builtins? */
+#undef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
 /* Expose multi-flavor support */
 #define CONFIG_RCU_HAVE_MULTIFLAVOR 1
 
-- 
2.40.1



* [lttng-dev] [PATCH v2 02/12] urcu/compiler: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (13 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 01/12] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 03/12] urcu/arch/generic: " Olivier Dion via lttng-dev
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Use __atomic_signal_fence(__ATOMIC_SEQ_CST) for cmm_barrier() if
configured to use atomic builtins.

Change-Id: Ib168b50f1e97a8da861b92d6882c56db230ebb2c
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/compiler.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..3604488 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -25,10 +25,16 @@
 # include <type_traits>	/* for std::remove_cv */
 #endif
 
+#include <urcu/config.h>
+
 #define caa_likely(x)	__builtin_expect(!!(x), 1)
 #define caa_unlikely(x)	__builtin_expect(!!(x), 0)
 
-#define	cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+#  define cmm_barrier() __atomic_signal_fence(__ATOMIC_SEQ_CST)
+#else
+#  define cmm_barrier() asm volatile ("" : : : "memory")
+#endif
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.40.1



* [lttng-dev] [PATCH v2 03/12] urcu/arch/generic: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (14 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 02/12] urcu/compiler: Use atomic builtins if configured Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 04/12] urcu/system: " Olivier Dion via lttng-dev
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

If configured to use atomic builtins, implement SMP memory barriers in
terms of atomic builtins if the architecture does not implement its own
version.

Change-Id: Iddc4283606e0fce572e104d2d3f03b5c0d9926fb
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/arch/generic.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index be6e41e..e292c70 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -43,6 +43,14 @@ extern "C" {
  * GCC builtins) as well as cmm_rmb and cmm_wmb (defaulting to cmm_mb).
  */
 
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
+# ifndef cmm_smp_mb
+#  define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
+# endif
+
+#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
+
 #ifndef cmm_mb
 #define cmm_mb()    __sync_synchronize()
 #endif
-- 
2.40.1



* [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (15 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 03/12] urcu/arch/generic: " Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-21 23:23   ` Paul E. McKenney via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

If configured to use atomic builtins, use them for implementing the
CMM_LOAD_SHARED and CMM_STORE_SHARED macros.

Change-Id: I3eaaaaf0d26c47aced6e94b40fd59c7b8baa6272
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/urcu/system.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..f184aad 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -19,9 +19,28 @@
  * all copies or substantial portions of the Software.
  */
 
+#include <urcu/config.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
 
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
+#define CMM_LOAD_SHARED(x)			\
+	__atomic_load_n(&(x), __ATOMIC_RELAXED)
+
+#define _CMM_LOAD_SHARED(x) CMM_LOAD_SHARED(x)
+
+#define CMM_STORE_SHARED(x, v)					\
+	__extension__						\
+	({							\
+		__typeof__(v) _v = (v);				\
+		__atomic_store_n(&(x), _v, __ATOMIC_RELAXED);	\
+		_v;						\
+	})
+
+#define _CMM_STORE_SHARED(x, v) CMM_STORE_SHARED(x, v)
+
+#else
 /*
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
@@ -56,4 +75,6 @@
 		_v = _v;	/* Work around clang "unused result" */	\
 	})
 
+#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
+
 #endif /* _URCU_SYSTEM_H */
-- 
2.40.1



* [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (16 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 04/12] urcu/system: " Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-21 23:28   ` Paul E. McKenney via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 06/12] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Introducing the CMM memory model with the following new primitives:

  - uatomic_load(addr, memory_order)

  - uatomic_store(addr, value, memory_order)
  - uatomic_and_mo(addr, mask, memory_order)
  - uatomic_or_mo(addr, mask, memory_order)
  - uatomic_add_mo(addr, value, memory_order)
  - uatomic_sub_mo(addr, value, memory_order)
  - uatomic_inc_mo(addr, memory_order)
  - uatomic_dec_mo(addr, memory_order)

  - uatomic_add_return_mo(addr, value, memory_order)
  - uatomic_sub_return_mo(addr, value, memory_order)

  - uatomic_xchg_mo(addr, value, memory_order)

  - uatomic_cmpxchg_mo(addr, old, new,
                       memory_order_success,
                       memory_order_failure)

The CMM memory model reflects the C11 memory model with an additional
CMM_SEQ_CST_FENCE memory order. The memory order can be selected through
the enum cmm_memorder.

* With Atomic Builtins

If configured with atomic builtins, the correspondence between the CMM
memory model and the C11 memory model is one-to-one, with the exception
of the CMM_SEQ_CST_FENCE memory order, which implies the memory order
CMM_SEQ_CST plus a thread fence after the operation.

* Without Atomic Builtins

However, if not configured with atomic builtins, the following stipulates
the memory model.

For load operations with uatomic_load(), the memory orders CMM_RELAXED,
CMM_CONSUME, CMM_ACQUIRE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
allowed. A barrier may be inserted before and after the load from memory
depending on the memory order:

  - CMM_RELAXED: No barrier
  - CMM_CONSUME: Memory barrier after read
  - CMM_ACQUIRE: Memory barrier after read
  - CMM_SEQ_CST: Memory barriers before and after read
  - CMM_SEQ_CST_FENCE: Memory barriers before and after read

For store operations with uatomic_store(), the memory orders
CMM_RELAXED, CMM_RELEASE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
allowed. A barrier may be inserted before and after the store to memory
depending on the memory order:

  - CMM_RELAXED: No barrier
  - CMM_RELEASE: Memory barrier before operation
  - CMM_SEQ_CST: Memory barriers before and after operation
  - CMM_SEQ_CST_FENCE: Memory barriers before and after operation

For load/store operations with uatomic_and_mo(), uatomic_or_mo(),
uatomic_add_mo(), uatomic_sub_mo(), uatomic_inc_mo(), uatomic_dec_mo(),
uatomic_add_return_mo() and uatomic_sub_return_mo(), all memory orders
are allowed. A barrier may be inserted before and after the operation
depending on the memory order:

  - CMM_RELAXED: No barrier
  - CMM_ACQUIRE: Memory barrier after operation
  - CMM_CONSUME: Memory barrier after operation
  - CMM_RELEASE: Memory barrier before operation
  - CMM_ACQ_REL: Memory barriers before and after operation
  - CMM_SEQ_CST: Memory barriers before and after operation
  - CMM_SEQ_CST_FENCE: Memory barriers before and after operation

For the exchange operation uatomic_xchg_mo(), any memory order is
valid. A barrier may be inserted before and after the exchange to memory
depending on the memory order:

  - CMM_RELAXED: No barrier
  - CMM_ACQUIRE: Memory barrier after operation
  - CMM_CONSUME: Memory barrier after operation
  - CMM_RELEASE: Memory barrier before operation
  - CMM_ACQ_REL: Memory barriers before and after operation
  - CMM_SEQ_CST: Memory barriers before and after operation
  - CMM_SEQ_CST_FENCE: Memory barriers before and after operation

For the compare exchange operation uatomic_cmpxchg_mo(), the success
memory order can be anything, while the failure memory order cannot be
CMM_RELEASE or CMM_ACQ_REL and cannot be stronger than the success
memory order. A barrier may be inserted before and after the store to
memory depending on the memory orders:

 Success memory order:

  - CMM_RELAXED: No barrier
  - CMM_ACQUIRE: Memory barrier after operation
  - CMM_CONSUME: Memory barrier after operation
  - CMM_RELEASE: Memory barrier before operation
  - CMM_ACQ_REL: Memory barriers before and after operation
  - CMM_SEQ_CST: Memory barriers before and after operation
  - CMM_SEQ_CST_FENCE: Memory barriers before and after operation

  Barriers after the operation are only emitted if the compare exchange
  succeeds.

 Failure memory order:
  - CMM_RELAXED: No barrier
  - CMM_ACQUIRE: Memory barrier after operation
  - CMM_CONSUME: Memory barrier after operation
  - CMM_SEQ_CST: Memory barriers before and after operation
  - CMM_SEQ_CST_FENCE: Memory barriers before and after operation

  Barriers after the operation are only emitted if the compare exchange
  fails.  Barriers before the operation are never emitted for this
  memory order.
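The barrier-emission rules above can be sketched for a load as follows (the
memory order names come from this message; the helper name and the
volatile-access fallback are illustrative assumptions, not the exact liburcu
implementation):

```c
#include <assert.h>

enum cmm_memorder {
	CMM_RELAXED,
	CMM_CONSUME,
	CMM_ACQUIRE,
	CMM_RELEASE,
	CMM_ACQ_REL,
	CMM_SEQ_CST,
	CMM_SEQ_CST_FENCE,
};

#define cmm_smp_mb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)

/*
 * Allowed orders for a load: CMM_RELAXED, CMM_CONSUME, CMM_ACQUIRE,
 * CMM_SEQ_CST, CMM_SEQ_CST_FENCE.  Per the tables above: no barrier for
 * relaxed, a barrier after the read for consume/acquire, and barriers
 * both before and after the read for seq_cst variants.
 */
static unsigned long uatomic_load_sketch(unsigned long *addr,
					 enum cmm_memorder mo)
{
	unsigned long v;

	if (mo == CMM_SEQ_CST || mo == CMM_SEQ_CST_FENCE)
		cmm_smp_mb();			/* barrier before read */
	v = *(volatile unsigned long *) addr;
	if (mo != CMM_RELAXED)
		cmm_smp_mb();			/* barrier after read */
	return v;
}
```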

Change-Id: I213ba19c84e82a63083f00143a3142ffbdab1d52
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 doc/uatomic-api.md                      |   3 +-
 include/Makefile.am                     |   2 +
 include/urcu/static/pointer.h           |  40 ++--
 include/urcu/uatomic.h                  |  63 ++++++-
 include/urcu/uatomic/builtins-generic.h | 170 +++++++++++++++++
 include/urcu/uatomic/builtins.h         |  79 ++++++++
 include/urcu/uatomic/generic.h          | 234 ++++++++++++++++++++++++
 src/urcu-pointer.c                      |   9 +-
 8 files changed, 565 insertions(+), 35 deletions(-)
 create mode 100644 include/urcu/uatomic/builtins-generic.h
 create mode 100644 include/urcu/uatomic/builtins.h

diff --git a/doc/uatomic-api.md b/doc/uatomic-api.md
index 0962399..7341ee8 100644
--- a/doc/uatomic-api.md
+++ b/doc/uatomic-api.md
@@ -52,7 +52,8 @@ An atomic read-modify-write operation that performs this
 sequence of operations atomically: check if `addr` contains `old`.
 If true, then replace the content of `addr` by `new`. Return the
 value previously contained by `addr`. This function implies a full
-memory barrier before and after the atomic operation.
+memory barrier before and after the atomic operation. The second memory
+barrier is only emitted if the operation succeeds.
 
 
 ```c
diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..b20e56d 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -63,6 +63,8 @@ nobase_include_HEADERS = \
 	urcu/uatomic/alpha.h \
 	urcu/uatomic_arch.h \
 	urcu/uatomic/arm.h \
+	urcu/uatomic/builtins.h \
+	urcu/uatomic/builtins-generic.h \
 	urcu/uatomic/gcc.h \
 	urcu/uatomic/generic.h \
 	urcu/uatomic.h \
diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
index 9e46a57..9da8657 100644
--- a/include/urcu/static/pointer.h
+++ b/include/urcu/static/pointer.h
@@ -96,23 +96,8 @@ extern "C" {
  * -Wincompatible-pointer-types errors.  Using the statement expression
  * makes it an rvalue and gets rid of the const-ness.
  */
-#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(__extension__ ({				\
-					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
-					_________p0;					\
-				})) _________p1;					\
-				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
-				(_________p1);						\
-			})
-#else
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(p) _________p1 = CMM_LOAD_SHARED(p);		\
-				cmm_smp_read_barrier_depends();				\
-				(_________p1);						\
-			})
-#endif
-
+# define _rcu_dereference(p)			\
+	uatomic_load(&(p), CMM_CONSUME)
 /**
  * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
  * is as expected by "old". If succeeds, returns the previous pointer to the
@@ -131,8 +116,9 @@ extern "C" {
 	({								\
 		__typeof__(*p) _________pold = (old);			\
 		__typeof__(*p) _________pnew = (_new);			\
-		uatomic_cmpxchg(p, _________pold, _________pnew);	\
-	})
+		uatomic_cmpxchg_mo(p, _________pold, _________pnew,	\
+				   CMM_SEQ_CST, CMM_SEQ_CST);		\
+	})
 
 /**
  * _rcu_xchg_pointer - same as rcu_assign_pointer, but returns the previous
@@ -149,17 +135,17 @@ extern "C" {
 	__extension__					\
 	({						\
 		__typeof__(*p) _________pv = (v);	\
-		uatomic_xchg(p, _________pv);		\
+		uatomic_xchg_mo(p, _________pv,		\
+				CMM_SEQ_CST);		\
 	})
 
 
-#define _rcu_set_pointer(p, v)				\
-	do {						\
-		__typeof__(*p) _________pv = (v);	\
-		if (!__builtin_constant_p(v) || 	\
-		    ((v) != NULL))			\
-			cmm_wmb();				\
-		uatomic_set(p, _________pv);		\
+#define _rcu_set_pointer(p, v)						\
+	do {								\
+		__typeof__(*p) _________pv = (v);			\
+		uatomic_store(p, _________pv,				\
+			__builtin_constant_p(v) && (v) == NULL ?	\
+			CMM_RELAXED : CMM_RELEASE);			\
 	} while (0)
 
 /**
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..be857e1 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -21,9 +21,70 @@
 #ifndef _URCU_UATOMIC_H
 #define _URCU_UATOMIC_H
 
+#include <assert.h>
+
 #include <urcu/arch.h>
+#include <urcu/config.h>
 
-#if defined(URCU_ARCH_X86)
+enum cmm_memorder {
+	CMM_RELAXED = 0,
+	CMM_CONSUME = 1,
+	CMM_ACQUIRE = 2,
+	CMM_RELEASE = 3,
+	CMM_ACQ_REL = 4,
+	CMM_SEQ_CST = 5,
+	CMM_SEQ_CST_FENCE = 6,
+};
+
+#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
+
+/*
+ * Make sure the CMM memory orders match their C11 counterparts.
+ */
+# ifdef static_assert
+static_assert(CMM_RELAXED == __ATOMIC_RELAXED, "");
+static_assert(CMM_CONSUME == __ATOMIC_CONSUME, "");
+static_assert(CMM_ACQUIRE == __ATOMIC_ACQUIRE, "");
+static_assert(CMM_RELEASE == __ATOMIC_RELEASE, "");
+static_assert(CMM_ACQ_REL == __ATOMIC_ACQ_REL, "");
+static_assert(CMM_SEQ_CST == __ATOMIC_SEQ_CST, "");
+# endif
+
+/*
+ * This is not part of the public API. It is used internally to implement the
+ * CMM_SEQ_CST_FENCE memory order.
+ *
+ * NOTE: Using a switch here instead of an if statement avoids a
+ * -Wduplicated-cond warning when the memory order is conditionally determined.
+ */
+static inline void cmm_seq_cst_fence_after_atomic(enum cmm_memorder mo)
+{
+	switch (mo) {
+	case CMM_SEQ_CST_FENCE:
+		cmm_smp_mb();
+		break;
+	default:
+		break;
+	}
+}
+
+#endif
+
+/*
+ * This is not part of the public API. It is used internally to convert from the
+ * CMM memory model to the C11 memory model.
+ */
+static inline int cmm_to_c11(int mo)
+{
+	if (mo == CMM_SEQ_CST_FENCE) {
+		return CMM_SEQ_CST;
+	}
+	return mo;
+}
+
+#if defined(CONFIG_RCU_USE_ATOMIC_BUILTINS)
+#include <urcu/uatomic/builtins.h>
+#elif defined(URCU_ARCH_X86)
 #include <urcu/uatomic/x86.h>
 #elif defined(URCU_ARCH_PPC)
 #include <urcu/uatomic/ppc.h>
diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
new file mode 100644
index 0000000..673e888
--- /dev/null
+++ b/include/urcu/uatomic/builtins-generic.h
@@ -0,0 +1,170 @@
+/*
+ * urcu/uatomic/builtins-generic.h
+ *
+ * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
+#define _URCU_UATOMIC_BUILTINS_GENERIC_H
+
+#include <urcu/system.h>
+
+#define uatomic_store(addr, v, mo)				\
+	__extension__						\
+	({							\
+		__atomic_store_n(addr, v, cmm_to_c11(mo));	\
+		cmm_seq_cst_fence_after_atomic(mo);		\
+	})
+
+#define uatomic_set(addr, v)			\
+	uatomic_store(addr, v, CMM_RELAXED)
+
+#define uatomic_load(addr, mo)						\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _value = __atomic_load_n(addr,	\
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+									\
+		_value;							\
+	})
+
+#define uatomic_read(addr)			\
+	uatomic_load(addr, CMM_RELAXED)
+
+#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
+									\
+		if (__atomic_compare_exchange_n(addr, &_old, new, 0,	\
+							cmm_to_c11(mos), \
+							cmm_to_c11(mof))) { \
+			cmm_seq_cst_fence_after_atomic(mos);		\
+		} else {						\
+			cmm_seq_cst_fence_after_atomic(mof);		\
+		}							\
+		_old;							\
+	})
+
+#define uatomic_cmpxchg(addr, old, new)					\
+	uatomic_cmpxchg_mo(addr, old, new, CMM_SEQ_CST_FENCE, CMM_RELAXED)
+
+#define uatomic_xchg_mo(addr, v, mo)					\
+	__extension__							\
+	({								\
+		__typeof__((*addr)) _old = __atomic_exchange_n(addr, v,	\
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+		_old;							\
+	})
+
+#define uatomic_xchg(addr, v)						\
+	uatomic_xchg_mo(addr, v, CMM_SEQ_CST_FENCE)
+
+#define uatomic_add_return_mo(addr, v, mo)				\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = __atomic_add_fetch(addr, v,	\
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+		_old;							\
+	})
+
+#define uatomic_add_return(addr, v)					\
+	uatomic_add_return_mo(addr, v, CMM_SEQ_CST_FENCE)
+
+#define uatomic_sub_return_mo(addr, v, mo)				\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = __atomic_sub_fetch(addr, v,	\
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+		_old;							\
+	})
+
+#define uatomic_sub_return(addr, v)					\
+	uatomic_sub_return_mo(addr, v, CMM_SEQ_CST_FENCE)
+
+#define uatomic_and_mo(addr, mask, mo)					\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = __atomic_and_fetch(addr, mask, \
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+		_old;							\
+	})
+
+#define uatomic_and(addr, mask)				\
+	(void) uatomic_and_mo(addr, mask, CMM_SEQ_CST)
+
+#define uatomic_or_mo(addr, mask, mo)					\
+	__extension__							\
+	({								\
+		__typeof__(*(addr)) _old = __atomic_or_fetch(addr, mask, \
+							cmm_to_c11(mo)); \
+		cmm_seq_cst_fence_after_atomic(mo);			\
+		_old;							\
+	})
+
+
+#define uatomic_or(addr, mask)				\
+	(void) uatomic_or_mo(addr, mask, CMM_RELAXED)
+
+#define uatomic_add_mo(addr, v, mo)			\
+	(void) uatomic_add_return_mo(addr, v, mo)
+
+#define uatomic_add(addr, v)				\
+	(void) uatomic_add_mo(addr, v, CMM_RELAXED)
+
+#define uatomic_sub_mo(addr, v, mo)			\
+	(void) uatomic_sub_return_mo(addr, v, mo)
+
+#define uatomic_sub(addr, v)				\
+	(void) uatomic_sub_mo(addr, v, CMM_RELAXED)
+
+#define uatomic_inc_mo(addr, mo)		\
+	(void) uatomic_add_mo(addr, 1, mo)
+
+#define uatomic_inc(addr)				\
+	(void) uatomic_inc_mo(addr, CMM_RELAXED)
+
+#define uatomic_dec_mo(addr, mo)		\
+	(void) uatomic_sub_mo(addr, 1, mo)
+
+#define uatomic_dec(addr)				\
+	(void) uatomic_dec_mo(addr, CMM_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_and()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_or() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_or()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_add() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_add()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_sub()  cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb()
+
+#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb()
+#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb()
+
+#endif /* _URCU_UATOMIC_BUILTINS_GENERIC_H */
diff --git a/include/urcu/uatomic/builtins.h b/include/urcu/uatomic/builtins.h
new file mode 100644
index 0000000..82e98f8
--- /dev/null
+++ b/include/urcu/uatomic/builtins.h
@@ -0,0 +1,79 @@
+/*
+ * urcu/uatomic/builtins.h
+ *
+ * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_H
+#define _URCU_UATOMIC_BUILTINS_H
+
+#include <urcu/arch.h>
+
+#if defined(__has_builtin)
+# if !__has_builtin(__atomic_store_n)
+#  error "Toolchain does not support __atomic_store_n."
+# endif
+# if !__has_builtin(__atomic_load_n)
+#  error "Toolchain does not support __atomic_load_n."
+# endif
+# if !__has_builtin(__atomic_exchange_n)
+#  error "Toolchain does not support __atomic_exchange_n."
+# endif
+# if !__has_builtin(__atomic_compare_exchange_n)
+#  error "Toolchain does not support __atomic_compare_exchange_n."
+# endif
+# if !__has_builtin(__atomic_add_fetch)
+#  error "Toolchain does not support __atomic_add_fetch."
+# endif
+# if !__has_builtin(__atomic_sub_fetch)
+#  error "Toolchain does not support __atomic_sub_fetch."
+# endif
+# if !__has_builtin(__atomic_or_fetch)
+#  error "Toolchain does not support __atomic_or_fetch."
+# endif
+# if !__has_builtin(__atomic_thread_fence)
+#  error "Toolchain does not support __atomic_thread_fence."
+# endif
+# if !__has_builtin(__atomic_signal_fence)
+#  error "Toolchain does not support __atomic_signal_fence."
+# endif
+#elif defined(__GNUC__)
+# define GCC_VERSION (__GNUC__       * 10000 + \
+		       __GNUC_MINOR__ * 100   + \
+		       __GNUC_PATCHLEVEL__)
+# if  GCC_VERSION < 40700
+#  error "GCC version is too old. Version must be 4.7 or greater"
+# endif
+# undef  GCC_VERSION
+#else
+# error "Toolchain is not supported."
+#endif
+
+#if defined(__clang__) /* Clang also defines __GNUC__: test it first. */
+# define UATOMIC_HAS_ATOMIC_BYTE  __CLANG_ATOMIC_CHAR_LOCK_FREE
+# define UATOMIC_HAS_ATOMIC_SHORT __CLANG_ATOMIC_SHORT_LOCK_FREE
+#elif defined(__GNUC__)
+# define UATOMIC_HAS_ATOMIC_BYTE  __GCC_ATOMIC_CHAR_LOCK_FREE
+# define UATOMIC_HAS_ATOMIC_SHORT __GCC_ATOMIC_SHORT_LOCK_FREE
+#else
+/* #  define UATOMIC_HAS_ATOMIC_BYTE  */
+/* #  define UATOMIC_HAS_ATOMIC_SHORT */
+#endif
+
+#include <urcu/uatomic/builtins-generic.h>
+
+#endif	/* _URCU_UATOMIC_BUILTINS_H */
diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
index e31a19b..6b9c153 100644
--- a/include/urcu/uatomic/generic.h
+++ b/include/urcu/uatomic/generic.h
@@ -33,10 +33,244 @@ extern "C" {
 #define uatomic_set(addr, v)	((void) CMM_STORE_SHARED(*(addr), (v)))
 #endif
 
+extern void abort(void);
+
+#define uatomic_load_store_return_op(op, addr, v, mo)			\
+	__extension__							\
+	({								\
+									\
+		switch (mo) {						\
+		case CMM_ACQUIRE:					\
+		case CMM_CONSUME:					\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_RELEASE:					\
+		case CMM_ACQ_REL:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		__typeof__((*addr)) _value = op(addr, v);		\
+									\
+		switch (mo) {						\
+		case CMM_CONSUME:					\
+			cmm_smp_read_barrier_depends();			\
+			break;						\
+		case CMM_ACQUIRE:					\
+		case CMM_ACQ_REL:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		case CMM_RELAXED:					\
+		case CMM_RELEASE:					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+		_value;							\
+	})
+
+#define uatomic_load_store_op(op, addr, v, mo)				\
+	({								\
+		switch (mo) {						\
+		case CMM_ACQUIRE:					\
+		case CMM_CONSUME:					\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_RELEASE:					\
+		case CMM_ACQ_REL:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		op(addr, v);						\
+									\
+		switch (mo) {						\
+		case CMM_CONSUME:					\
+			cmm_smp_read_barrier_depends();			\
+			break;						\
+		case CMM_ACQUIRE:					\
+		case CMM_ACQ_REL:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		case CMM_RELAXED:					\
+		case CMM_RELEASE:					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+	})
+
+#define uatomic_store(addr, v, mo)			\
+	({						\
+		switch (mo) {				\
+		case CMM_RELAXED:			\
+			break;				\
+		case CMM_RELEASE:			\
+		case CMM_SEQ_CST:			\
+		case CMM_SEQ_CST_FENCE:			\
+			cmm_smp_mb();			\
+			break;				\
+		default:				\
+			abort();			\
+		}					\
+							\
+		uatomic_set(addr, v);			\
+							\
+		switch (mo) {				\
+		case CMM_RELAXED:			\
+		case CMM_RELEASE:			\
+			break;				\
+		case CMM_SEQ_CST:			\
+		case CMM_SEQ_CST_FENCE:			\
+			cmm_smp_mb();			\
+			break;				\
+		default:				\
+			abort();			\
+		}					\
+	})
+
+#define uatomic_and_mo(addr, v, mo)				\
+	uatomic_load_store_op(uatomic_and, addr, v, mo)
+
+#define uatomic_or_mo(addr, v, mo)				\
+	uatomic_load_store_op(uatomic_or, addr, v, mo)
+
+#define uatomic_add_mo(addr, v, mo)				\
+	uatomic_load_store_op(uatomic_add, addr, v, mo)
+
+#define uatomic_sub_mo(addr, v, mo)				\
+	uatomic_load_store_op(uatomic_sub, addr, v, mo)
+
+#define uatomic_inc_mo(addr, mo)				\
+	uatomic_load_store_op(uatomic_add, addr, 1, mo)
+
+#define uatomic_dec_mo(addr, mo)				\
+	uatomic_load_store_op(uatomic_add, addr, -1, mo)
+/*
+ * NOTE: We cannot just do switch (_value == (old) ? mos : mof) otherwise the
+ * compiler emits a -Wduplicated-cond warning.
+ */
+#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
+	__extension__							\
+	({								\
+		switch (mos) {						\
+		case CMM_ACQUIRE:					\
+		case CMM_CONSUME:					\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_RELEASE:					\
+		case CMM_ACQ_REL:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		__typeof__(*(addr)) _value = uatomic_cmpxchg(addr, old,	\
+							new);		\
+									\
+		if (_value == (old)) {					\
+			switch (mos) {					\
+			case CMM_CONSUME:				\
+				cmm_smp_read_barrier_depends();		\
+				break;					\
+			case CMM_ACQUIRE:				\
+			case CMM_ACQ_REL:				\
+			case CMM_SEQ_CST:				\
+			case CMM_SEQ_CST_FENCE:				\
+				cmm_smp_mb();				\
+				break;					\
+			case CMM_RELAXED:				\
+			case CMM_RELEASE:				\
+				break;					\
+			default:					\
+				abort();				\
+			}						\
+		} else {						\
+			switch (mof) {					\
+			case CMM_CONSUME:				\
+				cmm_smp_read_barrier_depends();		\
+				break;					\
+			case CMM_ACQUIRE:				\
+			case CMM_ACQ_REL:				\
+			case CMM_SEQ_CST:				\
+			case CMM_SEQ_CST_FENCE:				\
+				cmm_smp_mb();				\
+				break;					\
+			case CMM_RELAXED:				\
+			case CMM_RELEASE:				\
+				break;					\
+			default:					\
+				abort();				\
+			}						\
+		}							\
+		_value;							\
+	})
+
+#define uatomic_xchg_mo(addr, v, mo)				\
+	uatomic_load_store_return_op(uatomic_xchg, addr, v, mo)
+
+#define uatomic_add_return_mo(addr, v, mo)				\
+	uatomic_load_store_return_op(uatomic_add_return, addr, v, mo)
+
+#define uatomic_sub_return_mo(addr, v, mo)				\
+	uatomic_load_store_return_op(uatomic_sub_return, addr, v, mo)
+
+
 #ifndef uatomic_read
 #define uatomic_read(addr)	CMM_LOAD_SHARED(*(addr))
 #endif
 
+#define uatomic_load(addr, mo)						\
+	__extension__							\
+	({								\
+		switch (mo) {						\
+		case CMM_ACQUIRE:					\
+		case CMM_CONSUME:					\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		__typeof__(*(addr)) _rcu_value = uatomic_read(addr);	\
+									\
+		switch (mo) {						\
+		case CMM_RELAXED:					\
+			break;						\
+		case CMM_CONSUME:					\
+			cmm_smp_read_barrier_depends();			\
+			break;						\
+		case CMM_ACQUIRE:					\
+		case CMM_SEQ_CST:					\
+		case CMM_SEQ_CST_FENCE:					\
+			cmm_smp_mb();					\
+			break;						\
+		default:						\
+			abort();					\
+		}							\
+									\
+		_rcu_value;						\
+	})
+
 #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR
 #ifdef ILLEGAL_INSTR
 static inline __attribute__((always_inline))
diff --git a/src/urcu-pointer.c b/src/urcu-pointer.c
index d0854ac..cea8aeb 100644
--- a/src/urcu-pointer.c
+++ b/src/urcu-pointer.c
@@ -39,19 +39,16 @@ void *rcu_dereference_sym(void *p)
 
 void *rcu_set_pointer_sym(void **p, void *v)
 {
-	cmm_wmb();
-	uatomic_set(p, v);
+	uatomic_store(p, v, CMM_RELEASE);
 	return v;
 }
 
 void *rcu_xchg_pointer_sym(void **p, void *v)
 {
-	cmm_wmb();
-	return uatomic_xchg(p, v);
+	return uatomic_xchg_mo(p, v, CMM_SEQ_CST);
 }
 
 void *rcu_cmpxchg_pointer_sym(void **p, void *old, void *_new)
 {
-	cmm_wmb();
-	return uatomic_cmpxchg(p, old, _new);
+	return uatomic_cmpxchg_mo(p, old, _new, CMM_SEQ_CST, CMM_SEQ_CST);
 }
-- 
2.40.1

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH v2 06/12] urcu-wait: Fix wait state load/store
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (17 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

The state of a wait node must be accessed atomically. Also, busy-waiting
on the load until the teardown state is seen must have CMM_ACQUIRE
semantics, while storing the teardown state must have CMM_RELEASE
semantics.

Change-Id: I9cd9cf4cd9ab2081551d7f33c0b1c23c3cf3942f
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 src/urcu-wait.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/urcu-wait.h b/src/urcu-wait.h
index ef5f7ed..4667a13 100644
--- a/src/urcu-wait.h
+++ b/src/urcu-wait.h
@@ -135,7 +135,7 @@ void urcu_adaptative_wake_up(struct urcu_wait_node *wait)
 			urcu_die(errno);
 	}
 	/* Allow teardown of struct urcu_wait memory. */
-	uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);
+	uatomic_or_mo(&wait->state, URCU_WAIT_TEARDOWN, CMM_RELEASE);
 }
 
 /*
@@ -193,7 +193,7 @@ skip_futex_wait:
 			break;
 		caa_cpu_relax();
 	}
-	while (!(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN))
+	while (!(uatomic_load(&wait->state, CMM_ACQUIRE) & URCU_WAIT_TEARDOWN))
 		poll(NULL, 0, 10);
 	urcu_posix_assert(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN);
 }
@@ -209,7 +209,7 @@ void urcu_wake_all_waiters(struct urcu_waiters *waiters)
 			caa_container_of(iter, struct urcu_wait_node, node);
 
 		/* Don't wake already running threads */
-		if (wait_node->state & URCU_WAIT_RUNNING)
+		if (uatomic_load(&wait_node->state, CMM_RELAXED) & URCU_WAIT_RUNNING)
 			continue;
 		urcu_adaptative_wake_up(wait_node);
 	}
-- 
2.40.1

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (18 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 06/12] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-21 23:37   ` Paul E. McKenney via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 08/12] benchmark: " Olivier Dion via lttng-dev
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Global state accesses were previously ordered with explicit memory
barriers. Use the uatomic API with the CMM memory model instead, so that
TSAN does not warn about non-atomic concurrent accesses.

Also, the thread id map mutex must be unlocked after setting the newly
created thread's id in the map. Otherwise, the new thread could observe
an unset id.

Change-Id: I1ecdc387b3f510621cbc116ad3b95c676f5d659a
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/common/api.h            |  12 ++--
 tests/regression/rcutorture.h | 106 +++++++++++++++++++++++-----------
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/tests/common/api.h b/tests/common/api.h
index a260463..9d22b0f 100644
--- a/tests/common/api.h
+++ b/tests/common/api.h
@@ -26,6 +26,7 @@
 
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
+#include <urcu/uatomic.h>
 
 /*
  * Machine parameters.
@@ -135,7 +136,7 @@ static int __smp_thread_id(void)
 	thread_id_t tid = pthread_self();
 
 	for (i = 0; i < NR_THREADS; i++) {
-		if (__thread_id_map[i] == tid) {
+		if (uatomic_read(&__thread_id_map[i]) == tid) {
 			long v = i + 1;  /* must be non-NULL. */
 
 			if (pthread_setspecific(thread_id_key, (void *)v) != 0) {
@@ -184,12 +185,13 @@ static thread_id_t create_thread(void *(*func)(void *), void *arg)
 		exit(-1);
 	}
 	__thread_id_map[i] = __THREAD_ID_MAP_WAITING;
-	spin_unlock(&__thread_id_map_mutex);
+
 	if (pthread_create(&tid, NULL, func, arg) != 0) {
 		perror("create_thread:pthread_create");
 		exit(-1);
 	}
-	__thread_id_map[i] = tid;
+	uatomic_set(&__thread_id_map[i], tid);
+	spin_unlock(&__thread_id_map_mutex);
 	return tid;
 }
 
@@ -199,7 +201,7 @@ static void *wait_thread(thread_id_t tid)
 	void *vp;
 
 	for (i = 0; i < NR_THREADS; i++) {
-		if (__thread_id_map[i] == tid)
+		if (uatomic_read(&__thread_id_map[i]) == tid)
 			break;
 	}
 	if (i >= NR_THREADS){
@@ -211,7 +213,7 @@ static void *wait_thread(thread_id_t tid)
 		perror("wait_thread:pthread_join");
 		exit(-1);
 	}
-	__thread_id_map[i] = __THREAD_ID_MAP_EMPTY;
+	uatomic_set(&__thread_id_map[i], __THREAD_ID_MAP_EMPTY);
 	return vp;
 }
 
diff --git a/tests/regression/rcutorture.h b/tests/regression/rcutorture.h
index bc394f9..5835b8f 100644
--- a/tests/regression/rcutorture.h
+++ b/tests/regression/rcutorture.h
@@ -44,6 +44,14 @@
  * data.  A correct RCU implementation will have all but the first two
  * numbers non-zero.
  *
+ * rcu_stress_count: Histogram of "ages" of structures seen by readers.  If any
+ * entries past the first two are non-zero, RCU is broken. The age of a newly
+ * allocated structure is zero, it becomes one when removed from reader
+ * visibility, and is incremented once per grace period subsequently -- and is
+ * freed after passing through (RCU_STRESS_PIPE_LEN-2) grace periods.  Since
+ * this test only has one true writer (there are fake writers), only buckets at
+ * indexes 0 and 1 should be non-zero.
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -68,6 +76,8 @@
 #include <stdlib.h>
 #include "tap.h"
 
+#include <urcu/uatomic.h>
+
 #define NR_TESTS	1
 
 DEFINE_PER_THREAD(long long, n_reads_pt);
@@ -145,10 +155,10 @@ void *rcu_read_perf_test(void *arg)
 	run_on(me);
 	uatomic_inc(&nthreadsrunning);
 	put_thread_offline();
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
 	put_thread_online();
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		for (i = 0; i < RCU_READ_RUN; i++) {
 			rcu_read_lock();
 			/* rcu_read_lock_nest(); */
@@ -180,9 +190,9 @@ void *rcu_update_perf_test(void *arg __attribute__((unused)))
 		}
 	}
 	uatomic_inc(&nthreadsrunning);
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		synchronize_rcu();
 		n_updates_local++;
 	}
@@ -211,15 +221,11 @@ int perftestrun(int nthreads, int nreaders, int nupdaters)
 	int t;
 	int duration = 1;
 
-	cmm_smp_mb();
 	while (uatomic_read(&nthreadsrunning) < nthreads)
 		(void) poll(NULL, 0, 1);
-	goflag = GOFLAG_RUN;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_RUN);
 	sleep(duration);
-	cmm_smp_mb();
-	goflag = GOFLAG_STOP;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_STOP);
 	wait_all_threads();
 	for_each_thread(t) {
 		n_reads += per_thread(n_reads_pt, t);
@@ -300,6 +306,13 @@ struct rcu_stress rcu_stress_array[RCU_STRESS_PIPE_LEN] = { { 0, 0 } };
 struct rcu_stress *rcu_stress_current;
 int rcu_stress_idx = 0;
 
+/*
+ * How many time a reader has seen something that should not be visible. It is
+ * an error if this value is different than zero at the end of the stress test.
+ *
+ * Here, the something that should not be visibile is an old pipe that has been
+ * freed (mbtest = 0).
+ */
 int n_mberror = 0;
 DEFINE_PER_THREAD(long long [RCU_STRESS_PIPE_LEN + 1], rcu_stress_count);
 
@@ -315,19 +328,25 @@ void *rcu_read_stress_test(void *arg __attribute__((unused)))
 
 	rcu_register_thread();
 	put_thread_offline();
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
 	put_thread_online();
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		rcu_read_lock();
 		p = rcu_dereference(rcu_stress_current);
 		if (p->mbtest == 0)
-			n_mberror++;
+			uatomic_inc_mo(&n_mberror, CMM_RELAXED);
 		rcu_read_lock_nest();
+		/*
+		 * The value of garbage is not important. This is
+		 * essentially a busy loop. The atomic operation -- while not
+		 * important here -- helps tools such as TSAN to not flag this
+		 * as a race condition.
+		 */
 		for (i = 0; i < 100; i++)
-			garbage++;
+			uatomic_inc(&garbage);
 		rcu_read_unlock_nest();
-		pc = p->pipe_count;
+		pc = uatomic_read(&p->pipe_count);
 		rcu_read_unlock();
 		if ((pc > RCU_STRESS_PIPE_LEN) || (pc < 0))
 			pc = RCU_STRESS_PIPE_LEN;
@@ -397,26 +416,47 @@ static
 void *rcu_update_stress_test(void *arg __attribute__((unused)))
 {
 	int i;
-	struct rcu_stress *p;
+	struct rcu_stress *p, *old_p;
 	struct rcu_head rh;
 	enum writer_state writer_state = WRITER_STATE_SYNC_RCU;
 
-	while (goflag == GOFLAG_INIT)
+	rcu_register_thread();
+
+	put_thread_offline();
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+
+	put_thread_online();
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		i = rcu_stress_idx + 1;
 		if (i >= RCU_STRESS_PIPE_LEN)
 			i = 0;
+		/*
+		 * Get old pipe that we free after a synchronize_rcu().
+		 */
+		rcu_read_lock();
+		old_p = rcu_dereference(rcu_stress_current);
+		rcu_read_unlock();
+
+		/*
+		 * Allocate a new pipe.
+		 */
 		p = &rcu_stress_array[i];
-		p->mbtest = 0;
-		cmm_smp_mb();
 		p->pipe_count = 0;
 		p->mbtest = 1;
+
 		rcu_assign_pointer(rcu_stress_current, p);
 		rcu_stress_idx = i;
+
+		/*
+		 * Increment every pipe except the freshly allocated one. A
+		 * reader should only see either the old pipe or the new
+		 * pipe. This is reflected in the rcu_stress_count histogram.
+		 */
 		for (i = 0; i < RCU_STRESS_PIPE_LEN; i++)
 			if (i != rcu_stress_idx)
-				rcu_stress_array[i].pipe_count++;
+				uatomic_inc(&rcu_stress_array[i].pipe_count);
+
 		switch (writer_state) {
 		case WRITER_STATE_SYNC_RCU:
 			synchronize_rcu();
@@ -432,9 +472,7 @@ void *rcu_update_stress_test(void *arg __attribute__((unused)))
 					strerror(errno));
 				abort();
 			}
-			rcu_register_thread();
 			call_rcu(&rh, rcu_update_stress_test_rcu);
-			rcu_unregister_thread();
 			/*
 			 * Our MacOS X test machine with the following
 			 * config:
@@ -470,18 +508,24 @@ void *rcu_update_stress_test(void *arg __attribute__((unused)))
 		{
 			struct urcu_gp_poll_state poll_state;
 
-			rcu_register_thread();
 			poll_state = start_poll_synchronize_rcu();
-			rcu_unregister_thread();
 			while (!poll_state_synchronize_rcu(poll_state))
 				(void) poll(NULL, 0, 1);	/* Wait for 1ms */
 			break;
 		}
 		}
+		/*
+		 * No readers should see that old pipe now. Setting mbtest to 0
+		 * to mark it as "freed".
+		 */
+		old_p->mbtest = 0;
 		n_updates++;
 		advance_writer_state(&writer_state);
 	}
 
+	put_thread_offline();
+	rcu_unregister_thread();
+
 	return NULL;
 }
 
@@ -497,9 +541,9 @@ void *rcu_fake_update_stress_test(void *arg __attribute__((unused)))
 			set_thread_call_rcu_data(crdp);
 		}
 	}
-	while (goflag == GOFLAG_INIT)
+	while (uatomic_read(&goflag) == GOFLAG_INIT)
 		(void) poll(NULL, 0, 1);
-	while (goflag == GOFLAG_RUN) {
+	while (uatomic_read(&goflag) == GOFLAG_RUN) {
 		synchronize_rcu();
 		(void) poll(NULL, 0, 1);
 	}
@@ -535,13 +579,9 @@ int stresstest(int nreaders)
 	create_thread(rcu_update_stress_test, NULL);
 	for (i = 0; i < 5; i++)
 		create_thread(rcu_fake_update_stress_test, NULL);
-	cmm_smp_mb();
-	goflag = GOFLAG_RUN;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_RUN);
 	sleep(10);
-	cmm_smp_mb();
-	goflag = GOFLAG_STOP;
-	cmm_smp_mb();
+	uatomic_set(&goflag, GOFLAG_STOP);
 	wait_all_threads();
 	for_each_thread(t)
 		n_reads += per_thread(n_reads_pt, t);
-- 
2.40.1

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH v2 08/12] benchmark: Use uatomic for accessing global states
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (19 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-21 23:38   ` Paul E. McKenney via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 09/12] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Accesses to global states were previously ordered with explicit memory
barriers. Use the uatomic API with the CMM memory model instead, so that
TSAN can understand the ordering imposed by the synchronization flags.

Change-Id: I1bf5702c5ac470f308c478effe39e424a3158060
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/benchmark/Makefile.am             | 91 +++++++++++++------------
 tests/benchmark/common-states.c         |  1 +
 tests/benchmark/common-states.h         | 51 ++++++++++++++
 tests/benchmark/test_mutex.c            | 32 +--------
 tests/benchmark/test_perthreadlock.c    | 32 +--------
 tests/benchmark/test_rwlock.c           | 32 +--------
 tests/benchmark/test_urcu.c             | 33 +--------
 tests/benchmark/test_urcu_assign.c      | 33 +--------
 tests/benchmark/test_urcu_bp.c          | 33 +--------
 tests/benchmark/test_urcu_defer.c       | 33 +--------
 tests/benchmark/test_urcu_gc.c          | 34 ++-------
 tests/benchmark/test_urcu_hash.c        |  6 +-
 tests/benchmark/test_urcu_hash.h        | 15 ----
 tests/benchmark/test_urcu_hash_rw.c     | 10 +--
 tests/benchmark/test_urcu_hash_unique.c | 10 +--
 tests/benchmark/test_urcu_lfq.c         | 20 ++----
 tests/benchmark/test_urcu_lfs.c         | 20 ++----
 tests/benchmark/test_urcu_lfs_rcu.c     | 20 ++----
 tests/benchmark/test_urcu_qsbr.c        | 33 +--------
 tests/benchmark/test_urcu_qsbr_gc.c     | 34 ++-------
 tests/benchmark/test_urcu_wfcq.c        | 22 +++---
 tests/benchmark/test_urcu_wfq.c         | 20 ++----
 tests/benchmark/test_urcu_wfs.c         | 22 +++---
 23 files changed, 177 insertions(+), 460 deletions(-)
 create mode 100644 tests/benchmark/common-states.c
 create mode 100644 tests/benchmark/common-states.h

diff --git a/tests/benchmark/Makefile.am b/tests/benchmark/Makefile.am
index c53e025..a7f91c2 100644
--- a/tests/benchmark/Makefile.am
+++ b/tests/benchmark/Makefile.am
@@ -1,4 +1,5 @@
 AM_CPPFLAGS += -I$(top_srcdir)/src -I$(top_srcdir)/tests/common
+AM_CPPFLAGS += -include $(top_srcdir)/tests/benchmark/common-states.h
 
 TEST_EXTENSIONS = .tap
 TAP_LOG_DRIVER_FLAGS = --merge --comments
@@ -7,6 +8,8 @@ TAP_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' \
 	URCU_TESTS_BUILDDIR='$(abs_top_builddir)/tests' \
 	$(SHELL) $(top_srcdir)/tests/utils/tap-driver.sh
 
+noinst_HEADERS = common-states.h
+
 SCRIPT_LIST = \
 	runpaul-phase1.sh \
 	runpaul-phase2.sh \
@@ -61,163 +64,163 @@ URCU_CDS_LIB=$(top_builddir)/src/liburcu-cds.la
 
 DEBUG_YIELD_LIB=$(builddir)/../common/libdebug-yield.la
 
-test_urcu_SOURCES = test_urcu.c
+test_urcu_SOURCES = test_urcu.c common-states.c
 test_urcu_LDADD = $(URCU_LIB)
 
-test_urcu_dynamic_link_SOURCES = test_urcu.c
+test_urcu_dynamic_link_SOURCES = test_urcu.c common-states.c
 test_urcu_dynamic_link_LDADD = $(URCU_LIB)
 test_urcu_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_timing_SOURCES = test_urcu_timing.c
+test_urcu_timing_SOURCES = test_urcu_timing.c common-states.c
 test_urcu_timing_LDADD = $(URCU_LIB)
 
-test_urcu_yield_SOURCES = test_urcu.c
+test_urcu_yield_SOURCES = test_urcu.c common-states.c
 test_urcu_yield_LDADD = $(URCU_LIB) $(DEBUG_YIELD_LIB)
 test_urcu_yield_CFLAGS = -DDEBUG_YIELD $(AM_CFLAGS)
 
 
-test_urcu_qsbr_SOURCES = test_urcu_qsbr.c
+test_urcu_qsbr_SOURCES = test_urcu_qsbr.c common-states.c
 test_urcu_qsbr_LDADD = $(URCU_QSBR_LIB)
 
-test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c
+test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c common-states.c
 test_urcu_qsbr_timing_LDADD = $(URCU_QSBR_LIB)
 
 
-test_urcu_mb_SOURCES = test_urcu.c
+test_urcu_mb_SOURCES = test_urcu.c common-states.c
 test_urcu_mb_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_CFLAGS = -DRCU_MB $(AM_CFLAGS)
 
 
-test_urcu_signal_SOURCES = test_urcu.c
+test_urcu_signal_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_signal_dynamic_link_SOURCES = test_urcu.c
+test_urcu_signal_dynamic_link_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_dynamic_link_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_dynamic_link_CFLAGS = -DRCU_SIGNAL -DDYNAMIC_LINK_TEST \
 					$(AM_CFLAGS)
 
-test_urcu_signal_timing_SOURCES = test_urcu_timing.c
+test_urcu_signal_timing_SOURCES = test_urcu_timing.c common-states.c
 test_urcu_signal_timing_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_timing_CFLAGS= -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_signal_yield_SOURCES = test_urcu.c
+test_urcu_signal_yield_SOURCES = test_urcu.c common-states.c
 test_urcu_signal_yield_LDADD = $(URCU_SIGNAL_LIB) $(DEBUG_YIELD_LIB)
 test_urcu_signal_yield_CFLAGS = -DRCU_SIGNAL -DDEBUG_YIELD $(AM_CFLAGS)
 
-test_rwlock_timing_SOURCES = test_rwlock_timing.c
+test_rwlock_timing_SOURCES = test_rwlock_timing.c common-states.c
 test_rwlock_timing_LDADD = $(URCU_SIGNAL_LIB)
 
-test_rwlock_SOURCES = test_rwlock.c
+test_rwlock_SOURCES = test_rwlock.c common-states.c
 test_rwlock_LDADD = $(URCU_SIGNAL_LIB)
 
-test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c
+test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c common-states.c
 test_perthreadlock_timing_LDADD = $(URCU_SIGNAL_LIB)
 
-test_perthreadlock_SOURCES = test_perthreadlock.c
+test_perthreadlock_SOURCES = test_perthreadlock.c common-states.c
 test_perthreadlock_LDADD = $(URCU_SIGNAL_LIB)
 
-test_mutex_SOURCES = test_mutex.c
+test_mutex_SOURCES = test_mutex.c common-states.c
 
-test_looplen_SOURCES = test_looplen.c
+test_looplen_SOURCES = test_looplen.c common-states.c
 
-test_urcu_gc_SOURCES = test_urcu_gc.c
+test_urcu_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_gc_LDADD = $(URCU_LIB)
 
-test_urcu_signal_gc_SOURCES = test_urcu_gc.c
+test_urcu_signal_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_signal_gc_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_gc_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
 
-test_urcu_mb_gc_SOURCES = test_urcu_gc.c
+test_urcu_mb_gc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_mb_gc_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_gc_CFLAGS = -DRCU_MB $(AM_CFLAGS)
 
-test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c
+test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c common-states.c
 test_urcu_qsbr_gc_LDADD = $(URCU_QSBR_LIB)
 
-test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c
+test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c common-states.c
 test_urcu_qsbr_lgc_LDADD = $(URCU_QSBR_LIB)
 test_urcu_qsbr_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_lgc_SOURCES = test_urcu_gc.c
+test_urcu_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_lgc_LDADD = $(URCU_LIB)
 test_urcu_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_signal_lgc_SOURCES = test_urcu_gc.c
+test_urcu_signal_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_signal_lgc_LDADD = $(URCU_SIGNAL_LIB)
 test_urcu_signal_lgc_CFLAGS = -DRCU_SIGNAL -DTEST_LOCAL_GC $(AM_CFLAGS)
 
-test_urcu_mb_lgc_SOURCES = test_urcu_gc.c
+test_urcu_mb_lgc_SOURCES = test_urcu_gc.c common-states.c
 test_urcu_mb_lgc_LDADD = $(URCU_MB_LIB)
 test_urcu_mb_lgc_CFLAGS = -DTEST_LOCAL_GC -DRCU_MB $(AM_CFLAGS)
 
-test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c
+test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c common-states.c
 test_urcu_qsbr_dynamic_link_LDADD = $(URCU_QSBR_LIB)
 test_urcu_qsbr_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_defer_SOURCES = test_urcu_defer.c
+test_urcu_defer_SOURCES = test_urcu_defer.c common-states.c
 test_urcu_defer_LDADD = $(URCU_LIB)
 
 test_cycles_per_loop_SOURCES = test_cycles_per_loop.c
 
-test_urcu_assign_SOURCES = test_urcu_assign.c
+test_urcu_assign_SOURCES = test_urcu_assign.c common-states.c
 test_urcu_assign_LDADD = $(URCU_LIB)
 
-test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c
+test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c common-states.c
 test_urcu_assign_dynamic_link_LDADD = $(URCU_LIB)
 test_urcu_assign_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_bp_SOURCES = test_urcu_bp.c
+test_urcu_bp_SOURCES = test_urcu_bp.c common-states.c
 test_urcu_bp_LDADD = $(URCU_BP_LIB)
 
-test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c
+test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c common-states.c
 test_urcu_bp_dynamic_link_LDADD = $(URCU_BP_LIB)
 test_urcu_bp_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 
-test_urcu_lfq_SOURCES = test_urcu_lfq.c
+test_urcu_lfq_SOURCES = test_urcu_lfq.c common-states.c
 test_urcu_lfq_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c
+test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c common-states.c
 test_urcu_lfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfq_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_wfq_SOURCES = test_urcu_wfq.c
+test_urcu_wfq_SOURCES = test_urcu_wfq.c common-states.c
 test_urcu_wfq_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c
+test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c common-states.c
 test_urcu_wfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfq_dynlink_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfcq_SOURCES = test_urcu_wfcq.c
+test_urcu_wfcq_SOURCES = test_urcu_wfcq.c common-states.c
 test_urcu_wfcq_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c
+test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c common-states.c
 test_urcu_wfcq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfcq_dynlink_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_lfs_SOURCES = test_urcu_lfs.c
+test_urcu_lfs_SOURCES = test_urcu_lfs.c common-states.c
 test_urcu_lfs_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c
+test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c common-states.c
 test_urcu_lfs_rcu_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c
+test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c common-states.c
 test_urcu_lfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfs_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c
+test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c common-states.c
 test_urcu_lfs_rcu_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_lfs_rcu_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
 
-test_urcu_wfs_SOURCES = test_urcu_wfs.c
+test_urcu_wfs_SOURCES = test_urcu_wfs.c common-states.c
 test_urcu_wfs_LDADD = $(URCU_COMMON_LIB)
 
-test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c
+test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c common-states.c
 test_urcu_wfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
 test_urcu_wfs_dynlink_LDADD = $(URCU_COMMON_LIB)
 
 test_urcu_hash_SOURCES = test_urcu_hash.c test_urcu_hash.h \
-		test_urcu_hash_rw.c test_urcu_hash_unique.c
+		test_urcu_hash_rw.c test_urcu_hash_unique.c common-states.c
 test_urcu_hash_CFLAGS = -DRCU_QSBR $(AM_CFLAGS)
 test_urcu_hash_LDADD = $(URCU_QSBR_LIB) $(URCU_COMMON_LIB) $(URCU_CDS_LIB)
 
diff --git a/tests/benchmark/common-states.c b/tests/benchmark/common-states.c
new file mode 100644
index 0000000..6e70351
--- /dev/null
+++ b/tests/benchmark/common-states.c
@@ -0,0 +1 @@
+volatile int _test_go = 0, _test_stop = 0;
diff --git a/tests/benchmark/common-states.h b/tests/benchmark/common-states.h
new file mode 100644
index 0000000..dfbbfe5
--- /dev/null
+++ b/tests/benchmark/common-states.h
@@ -0,0 +1,51 @@
+/* Common states for benchmarks. */
+
+#include <unistd.h>
+
+#include <urcu/uatomic.h>
+
+extern volatile int _test_go, _test_stop;
+
+static inline void complete_sleep(unsigned int seconds)
+{
+	while (seconds != 0) {
+		seconds = sleep(seconds);
+	}
+}
+
+static inline void begin_test(void)
+{
+	uatomic_store(&_test_go, 1, CMM_RELEASE);
+}
+
+static inline void end_test(void)
+{
+	uatomic_store(&_test_stop, 1, CMM_RELAXED);
+}
+
+static inline void test_for(unsigned int duration)
+{
+	begin_test();
+	complete_sleep(duration);
+	end_test();
+}
+
+static inline void wait_until_go(void)
+{
+	while (!uatomic_load(&_test_go, CMM_ACQUIRE))
+	{
+	}
+}
+
+/*
+ * returns 0 if test should end.
+ */
+static inline int test_duration_write(void)
+{
+	return !uatomic_load(&_test_stop, CMM_RELAXED);
+}
+
+static inline int test_duration_read(void)
+{
+	return !uatomic_load(&_test_stop, CMM_RELAXED);
+}
diff --git a/tests/benchmark/test_mutex.c b/tests/benchmark/test_mutex.c
index 55f7c38..145139c 100644
--- a/tests/benchmark/test_mutex.c
+++ b/tests/benchmark/test_mutex.c
@@ -49,8 +49,6 @@ struct test_array {
 
 static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -111,19 +109,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -147,9 +132,7 @@ void *thr_reader(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int v;
@@ -182,10 +165,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		pthread_mutex_lock(&lock);
@@ -325,13 +305,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_perthreadlock.c b/tests/benchmark/test_perthreadlock.c
index 47a512c..bf468eb 100644
--- a/tests/benchmark/test_perthreadlock.c
+++ b/tests/benchmark/test_perthreadlock.c
@@ -53,8 +53,6 @@ struct per_thread_lock {
 
 static struct per_thread_lock *per_thread_lock;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -117,19 +115,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -175,9 +160,7 @@ void *thr_reader(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int v;
@@ -211,10 +194,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		for (tidx = 0; tidx < (long)nr_readers; tidx++) {
@@ -359,13 +339,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_rwlock.c b/tests/benchmark/test_rwlock.c
index 6908ea4..f5099e8 100644
--- a/tests/benchmark/test_rwlock.c
+++ b/tests/benchmark/test_rwlock.c
@@ -53,8 +53,6 @@ struct test_array {
  */
 pthread_rwlock_t lock;
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static volatile struct test_array test_array = { 8 };
@@ -116,19 +114,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -147,9 +132,7 @@ void *thr_reader(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
+	wait_until_go();
 
 	for (;;) {
 		int a, ret;
@@ -194,10 +177,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		int ret;
@@ -355,13 +335,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu.c b/tests/benchmark/test_urcu.c
index ea849fa..b89513b 100644
--- a/tests/benchmark/test_urcu.c
+++ b/tests/benchmark/test_urcu.c
@@ -44,8 +44,6 @@
 #endif
 #include <urcu.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -107,19 +105,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -142,10 +127,7 @@ void *thr_reader(void *_count)
 	rcu_register_thread();
 	urcu_posix_assert(!rcu_read_ongoing());
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -186,10 +168,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -337,13 +316,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_assign.c b/tests/benchmark/test_urcu_assign.c
index 88889a8..e83b05e 100644
--- a/tests/benchmark/test_urcu_assign.c
+++ b/tests/benchmark/test_urcu_assign.c
@@ -48,8 +48,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -111,19 +109,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -201,10 +186,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -240,10 +222,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_copy_mutex_lock();
@@ -394,13 +373,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_bp.c b/tests/benchmark/test_urcu_bp.c
index 6f8c59d..c3b00f1 100644
--- a/tests/benchmark/test_urcu_bp.c
+++ b/tests/benchmark/test_urcu_bp.c
@@ -44,8 +44,6 @@
 #endif
 #include <urcu-bp.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -107,19 +105,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -142,10 +127,7 @@ void *thr_reader(void *_count)
 	rcu_register_thread();
 	urcu_posix_assert(!rcu_read_ongoing());
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -182,10 +164,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -332,13 +311,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_defer.c b/tests/benchmark/test_urcu_defer.c
index e948ebf..c501f60 100644
--- a/tests/benchmark/test_urcu_defer.c
+++ b/tests/benchmark/test_urcu_defer.c
@@ -49,8 +49,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -112,19 +110,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -149,10 +134,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -203,10 +185,7 @@ void *thr_writer(void *data)
 		exit(-1);
 	}
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(*new));
@@ -359,13 +338,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_gc.c b/tests/benchmark/test_urcu_gc.c
index f14f728..1cbee44 100644
--- a/tests/benchmark/test_urcu_gc.c
+++ b/tests/benchmark/test_urcu_gc.c
@@ -33,6 +33,7 @@
 #include <urcu/arch.h>
 #include <urcu/assert.h>
 #include <urcu/tls-compat.h>
+#include <urcu/uatomic.h>
 #include "thread-id.h"
 #include "../common/debug-yield.h"
 
@@ -48,8 +49,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -120,19 +119,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -157,10 +143,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -231,10 +214,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 #ifndef TEST_LOCAL_GC
@@ -399,13 +379,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_hash.c b/tests/benchmark/test_urcu_hash.c
index 3574b4c..1a3087e 100644
--- a/tests/benchmark/test_urcu_hash.c
+++ b/tests/benchmark/test_urcu_hash.c
@@ -96,8 +96,6 @@ DEFINE_URCU_TLS(unsigned long, lookup_ok);
 
 struct cds_lfht *test_ht;
 
-volatile int test_go, test_stop;
-
 unsigned long wdelay;
 
 unsigned long duration;
@@ -649,14 +647,14 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	remain = duration;
 	do {
 		remain = sleep(remain);
 	} while (remain > 0);
 
-	test_stop = 1;
+	end_test();
 
 end_pthread_join:
 	for (i_thr = 0; i_thr < nr_readers_created; i_thr++) {
diff --git a/tests/benchmark/test_urcu_hash.h b/tests/benchmark/test_urcu_hash.h
index 47b2ae3..73a0a6d 100644
--- a/tests/benchmark/test_urcu_hash.h
+++ b/tests/benchmark/test_urcu_hash.h
@@ -125,8 +125,6 @@ cds_lfht_iter_get_test_node(struct cds_lfht_iter *iter)
 	return to_test_node(cds_lfht_iter_get_node(iter));
 }
 
-extern volatile int test_go, test_stop;
-
 extern unsigned long wdelay;
 
 extern unsigned long duration;
@@ -174,19 +172,6 @@ extern pthread_mutex_t affinity_mutex;
 
 void set_affinity(void);
 
-/*
- * returns 0 if test should end.
- */
-static inline int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static inline int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 extern DECLARE_URCU_TLS(unsigned long long, nr_writes);
 extern DECLARE_URCU_TLS(unsigned long long, nr_reads);
 
diff --git a/tests/benchmark/test_urcu_hash_rw.c b/tests/benchmark/test_urcu_hash_rw.c
index 862a6f0..087e869 100644
--- a/tests/benchmark/test_urcu_hash_rw.c
+++ b/tests/benchmark/test_urcu_hash_rw.c
@@ -73,10 +73,7 @@ void *test_hash_rw_thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -133,10 +130,7 @@ void *test_hash_rw_thr_writer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfht_node *ret_node = NULL;
diff --git a/tests/benchmark/test_urcu_hash_unique.c b/tests/benchmark/test_urcu_hash_unique.c
index de7c427..90c0e19 100644
--- a/tests/benchmark/test_urcu_hash_unique.c
+++ b/tests/benchmark/test_urcu_hash_unique.c
@@ -71,10 +71,7 @@ void *test_hash_unique_thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct lfht_test_node *node;
@@ -136,10 +133,7 @@ void *test_hash_unique_thr_writer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		/*
diff --git a/tests/benchmark/test_urcu_lfq.c b/tests/benchmark/test_urcu_lfq.c
index 490e8b0..50c4211 100644
--- a/tests/benchmark/test_urcu_lfq.c
+++ b/tests/benchmark/test_urcu_lfq.c
@@ -47,8 +47,6 @@
 #include <urcu.h>
 #include <urcu/cds.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -110,12 +108,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -146,10 +144,7 @@ void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -202,10 +197,7 @@ void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfq_node_rcu *qnode;
@@ -375,7 +367,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -385,7 +377,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_lfs.c b/tests/benchmark/test_urcu_lfs.c
index 52239e0..48b2b23 100644
--- a/tests/benchmark/test_urcu_lfs.c
+++ b/tests/benchmark/test_urcu_lfs.c
@@ -59,8 +59,6 @@ enum test_sync {
 
 static enum test_sync test_sync;
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -124,12 +122,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -159,10 +157,7 @@ static void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -261,10 +256,7 @@ static void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	urcu_posix_assert(test_pop || test_pop_all);
 
@@ -459,7 +451,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -469,7 +461,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_lfs_rcu.c b/tests/benchmark/test_urcu_lfs_rcu.c
index 7975faf..ae3dff4 100644
--- a/tests/benchmark/test_urcu_lfs_rcu.c
+++ b/tests/benchmark/test_urcu_lfs_rcu.c
@@ -51,8 +51,6 @@
 
 #include <urcu/cds.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -114,12 +112,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -150,10 +148,7 @@ void *thr_enqueuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct test *node = malloc(sizeof(*node));
@@ -205,10 +200,7 @@ void *thr_dequeuer(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_lfs_node_rcu *snode;
@@ -377,7 +369,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -387,7 +379,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_qsbr.c b/tests/benchmark/test_urcu_qsbr.c
index 1ea369c..295e9db 100644
--- a/tests/benchmark/test_urcu_qsbr.c
+++ b/tests/benchmark/test_urcu_qsbr.c
@@ -44,8 +44,6 @@
 #endif
 #include "urcu-qsbr.h"
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static int *test_rcu_pointer;
@@ -106,19 +104,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -145,10 +130,7 @@ void *thr_reader(void *_count)
 	urcu_posix_assert(!rcu_read_ongoing());
 	rcu_thread_online();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		rcu_read_lock();
@@ -192,10 +174,7 @@ void *thr_writer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		new = malloc(sizeof(int));
@@ -343,13 +322,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_qsbr_gc.c b/tests/benchmark/test_urcu_qsbr_gc.c
index 8877a82..163405d 100644
--- a/tests/benchmark/test_urcu_qsbr_gc.c
+++ b/tests/benchmark/test_urcu_qsbr_gc.c
@@ -33,6 +33,7 @@
 #include <urcu/arch.h>
 #include <urcu/assert.h>
 #include <urcu/tls-compat.h>
+#include <urcu/uatomic.h>
 #include "thread-id.h"
 #include "../common/debug-yield.h"
 
@@ -46,8 +47,6 @@ struct test_array {
 	int a;
 };
 
-static volatile int test_go, test_stop;
-
 static unsigned long wdelay;
 
 static struct test_array *test_rcu_pointer;
@@ -118,19 +117,6 @@ static void set_affinity(void)
 #endif /* HAVE_SCHED_SETAFFINITY */
 }
 
-/*
- * returns 0 if test should end.
- */
-static int test_duration_write(void)
-{
-	return !test_stop;
-}
-
-static int test_duration_read(void)
-{
-	return !test_stop;
-}
-
 static DEFINE_URCU_TLS(unsigned long long, nr_writes);
 static DEFINE_URCU_TLS(unsigned long long, nr_reads);
 
@@ -154,10 +140,7 @@ void *thr_reader(void *_count)
 
 	rcu_register_thread();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		_rcu_read_lock();
@@ -231,10 +214,7 @@ void *thr_writer(void *data)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 #ifndef TEST_LOCAL_GC
@@ -399,13 +379,7 @@ int main(int argc, char **argv)
 			exit(1);
 	}
 
-	cmm_smp_mb();
-
-	test_go = 1;
-
-	sleep(duration);
-
-	test_stop = 1;
+	test_for(duration);
 
 	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
 		err = pthread_join(tid_reader[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfcq.c b/tests/benchmark/test_urcu_wfcq.c
index 2c6e0fd..542a13a 100644
--- a/tests/benchmark/test_urcu_wfcq.c
+++ b/tests/benchmark/test_urcu_wfcq.c
@@ -56,7 +56,7 @@ static enum test_sync test_sync;
 
 static int test_force_sync;
 
-static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
+static volatile int test_stop_enqueue, test_stop_dequeue;
 
 static unsigned long rduration;
 
@@ -122,12 +122,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop_dequeue;
+	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop_enqueue;
+	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -155,10 +155,7 @@ static void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfcq_node *node = malloc(sizeof(*node));
@@ -266,10 +263,7 @@ static void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		if (test_dequeue && test_splice) {
@@ -482,7 +476,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -492,7 +486,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_enqueue = 1;
+	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
 
 	if (test_wait_empty) {
 		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
@@ -503,7 +497,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_dequeue = 1;
+	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfq.c b/tests/benchmark/test_urcu_wfq.c
index 8381160..2d8de87 100644
--- a/tests/benchmark/test_urcu_wfq.c
+++ b/tests/benchmark/test_urcu_wfq.c
@@ -51,8 +51,6 @@
 #include <urcu.h>
 #include <urcu/wfqueue.h>
 
-static volatile int test_go, test_stop;
-
 static unsigned long rduration;
 
 static unsigned long duration;
@@ -114,12 +112,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop;
+	return test_duration_read();
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop;
+	return test_duration_write();
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -143,10 +141,7 @@ void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfq_node *node = malloc(sizeof(*node));
@@ -185,10 +180,7 @@ void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfq_node *node = cds_wfq_dequeue_blocking(&q);
@@ -343,7 +335,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -353,7 +345,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop = 1;
+	end_test();
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
diff --git a/tests/benchmark/test_urcu_wfs.c b/tests/benchmark/test_urcu_wfs.c
index c285feb..d1a4afc 100644
--- a/tests/benchmark/test_urcu_wfs.c
+++ b/tests/benchmark/test_urcu_wfs.c
@@ -59,7 +59,7 @@ static enum test_sync test_sync;
 
 static int test_force_sync;
 
-static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
+static volatile int test_stop_enqueue, test_stop_dequeue;
 
 static unsigned long rduration;
 
@@ -125,12 +125,12 @@ static void set_affinity(void)
  */
 static int test_duration_dequeue(void)
 {
-	return !test_stop_dequeue;
+	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
 }
 
 static int test_duration_enqueue(void)
 {
-	return !test_stop_enqueue;
+	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
 }
 
 static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
@@ -157,10 +157,7 @@ static void *thr_enqueuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	for (;;) {
 		struct cds_wfs_node *node = malloc(sizeof(*node));
@@ -250,10 +247,7 @@ static void *thr_dequeuer(void *_count)
 
 	set_affinity();
 
-	while (!test_go)
-	{
-	}
-	cmm_smp_mb();
+	wait_until_go();
 
 	urcu_posix_assert(test_pop || test_pop_all);
 
@@ -469,7 +463,7 @@ int main(int argc, char **argv)
 
 	cmm_smp_mb();
 
-	test_go = 1;
+	begin_test();
 
 	for (i_thr = 0; i_thr < duration; i_thr++) {
 		sleep(1);
@@ -479,7 +473,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_enqueue = 1;
+	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
 
 	if (test_wait_empty) {
 		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
@@ -490,7 +484,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test_stop_dequeue = 1;
+	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
 
 	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
 		err = pthread_join(tid_enqueuer[i_thr], &tret);
-- 
2.40.1

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH v2 09/12] tests/unit/test_build: Quiet unused return value
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (20 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 08/12] benchmark: " Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 10/12] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Change-Id: Ie5a18e0ccc4b1b5ee85c5bd140561cc2ff9e2fbc
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/unit/test_build.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test_build.c b/tests/unit/test_build.c
index f6b667c..702c1ef 100644
--- a/tests/unit/test_build.c
+++ b/tests/unit/test_build.c
@@ -129,10 +129,10 @@ void test_build_rcu_dereference(void)
 	static struct a_clear_struct *clear = NULL;
 	static struct a_clear_struct *const clear_const = NULL;
 
-	rcu_dereference(opaque);
-	rcu_dereference(opaque_const);
-	rcu_dereference(clear);
-	rcu_dereference(clear_const);
+	(void) rcu_dereference(opaque);
+	(void) rcu_dereference(opaque_const);
+	(void) rcu_dereference(clear);
+	(void) rcu_dereference(clear_const);
 }
 
 int main(void)
-- 
2.40.1


* [lttng-dev] [PATCH v2 10/12] urcu/annotate: Add CMM annotation
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (21 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 09/12] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 11/12] Add cmm_emit_legacy_smp_mb() Olivier Dion via lttng-dev
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 12/12] tests: Add tests for checking race conditions Olivier Dion via lttng-dev
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

The CMM annotation layer is highly experimental and not meant to be
used by users for now, even though it is exposed in the public API,
since some parts of the liburcu public API require these annotations.

The main primitive is cmm_annotate_t, which denotes a group of memory
operations associated with a memory barrier. A group follows a state
machine, starting in the `CMM_ANNOTATE_VOID' state. The following are
the only valid transitions:

  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_MB (acquire & release MB)
  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_LOAD (acquire memory)
  CMM_ANNOTATE_LOAD -> CMM_ANNOTATE_MB (acquire MB)

The macro `cmm_annotate_define(name)' can be used to create an
annotation object on the stack. The rest of the `cmm_annotate_*' macros
can be used to change the state of the group after validating that the
transition is allowed. Some of these macros also inject TSAN annotations
to help TSAN understand the flow of events in the program, since it does
not currently support thread fences.

Sometimes, a single memory access does not need to be associated with a
group. In that case, the acquire/release macro variants without the
`group' infix can be used to annotate memory accesses.

Note that TSAN cannot be used on the liburcu-signal flavor. This is
because TSAN hijacks calls to sigaction(3) and installs its own handler,
which delivers the signal to the application at a synchronization
point.

Thus, using TSAN on the signal flavor is undefined behavior. However,
there is at least one known failure mode: a deadlock between readers
that want to unregister themselves by locking the `rcu_registry_lock'
and a synchronize RCU on the writer side, which has already locked that
mutex and holds it until all the registered readers execute a memory
barrier in a signal handler defined by liburcu-signal. TSAN will not
call the registered handler while waiting on the mutex. Therefore, the
writer spins infinitely on pthread_kill(3p) because the reader simply
never completes the handshake.

See the deadlock minimal reproducer below.

Deadlock reproducer:
```
#include <poll.h>
#include <signal.h>

#include <pthread.h>

#define SIGURCU SIGUSR1

static pthread_mutex_t rcu_registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int need_mb = 0;

static void *reader_side(void *nil)
{
	(void) nil;

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_mutex_unlock(&rcu_registry_lock);

	return NULL;
}

static void writer_side(pthread_t reader)
{
	__atomic_store_n(&need_mb, 1, __ATOMIC_RELEASE);
	while (__atomic_load_n(&need_mb, __ATOMIC_ACQUIRE)) {
		pthread_kill(reader, SIGURCU);
		(void) poll(NULL, 0, 1);
	}
	pthread_mutex_unlock(&rcu_registry_lock);

	pthread_join(reader, NULL);
}

static void sigrcu_handler(int signo, siginfo_t *siginfo, void *context)
{
	(void) signo;
	(void) siginfo;
	(void) context;

	__atomic_store_n(&need_mb, 0, __ATOMIC_SEQ_CST);
}

static void install_signal(void)
{
	struct sigaction act;

	act.sa_sigaction = sigrcu_handler;
	act.sa_flags     = SA_SIGINFO | SA_RESTART;

	sigemptyset(&act.sa_mask);

	(void) sigaction(SIGURCU, &act, NULL);
}

int main(void)
{
	pthread_t th;

	install_signal();

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_create(&th, NULL, reader_side, NULL);

	writer_side(th);

	return 0;
}
```

Change-Id: I9c234bb311cc0f82ea9dbefdf4fee07047ab93f9
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 include/Makefile.am               |   1 +
 include/urcu/annotate.h           | 174 ++++++++++++++++++++++++++++++
 include/urcu/arch/generic.h       |  33 +++++-
 include/urcu/compiler.h           |  14 +++
 include/urcu/static/urcu-bp.h     |  12 ++-
 include/urcu/static/urcu-common.h |   8 +-
 include/urcu/static/urcu-mb.h     |  11 +-
 include/urcu/static/urcu-memb.h   |  26 +++--
 include/urcu/static/urcu-qsbr.h   |  29 +++--
 src/rculfhash.c                   |  92 ++++++++++------
 src/urcu-bp.c                     |  17 ++-
 src/urcu-qsbr.c                   |  31 ++++--
 src/urcu-wait.h                   |   9 +-
 src/urcu.c                        |  24 +++--
 14 files changed, 392 insertions(+), 89 deletions(-)
 create mode 100644 include/urcu/annotate.h

diff --git a/include/Makefile.am b/include/Makefile.am
index b20e56d..24151c1 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -1,4 +1,5 @@
 nobase_include_HEADERS = \
+	urcu/annotate.h \
 	urcu/arch/aarch64.h \
 	urcu/arch/alpha.h \
 	urcu/arch/arm.h \
diff --git a/include/urcu/annotate.h b/include/urcu/annotate.h
new file mode 100644
index 0000000..37e7f03
--- /dev/null
+++ b/include/urcu/annotate.h
@@ -0,0 +1,174 @@
+/*
+ * urcu/annotate.h
+ *
+ * Userspace RCU - annotation header.
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/*
+ * WARNING!
+ *
+ * This API is highly experimental. There are zero guarantees of stability
+ * between releases.
+ *
+ * You have been warned.
+ */
+#ifndef _URCU_ANNOTATE_H
+#define _URCU_ANNOTATE_H
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <urcu/compiler.h>
+
+enum cmm_annotate {
+	CMM_ANNOTATE_VOID,
+	CMM_ANNOTATE_LOAD,
+	CMM_ANNOTATE_STORE,
+	CMM_ANNOTATE_MB,
+};
+
+typedef enum cmm_annotate cmm_annotate_t __attribute__((unused));
+
+#define cmm_annotate_define(name)		\
+	cmm_annotate_t name = CMM_ANNOTATE_VOID
+
+#ifdef CMM_SANITIZE_THREAD
+
+# ifdef __cplusplus
+extern "C" {
+# endif
+extern void __tsan_acquire(void *);
+extern void __tsan_release(void *);
+# ifdef __cplusplus
+}
+# endif
+
+# define cmm_annotate_die(msg)						\
+	do {								\
+		fprintf(stderr,						\
+			"(" __FILE__ ":%s@%u) Annotation ERROR: %s\n",	\
+			__func__, __LINE__, msg);			\
+		abort();						\
+	} while (0)
+
+/* Only used for typechecking in macros. */
+static inline cmm_annotate_t cmm_annotate_dereference(cmm_annotate_t *group)
+{
+	return *group;
+}
+
+# define cmm_annotate_group_mb_acquire(group)				\
+	do {								\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			break;						\
+		case CMM_ANNOTATE_LOAD:					\
+			break;						\
+		case CMM_ANNOTATE_STORE:				\
+			cmm_annotate_die("store for acquire group");	\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"redundant mb for acquire group"	\
+					);				\
+			break;						\
+		}							\
+		*(group) = CMM_ANNOTATE_MB;				\
+	} while (0)
+
+# define cmm_annotate_group_mb_release(group)				\
+	do {								\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			break;						\
+		case CMM_ANNOTATE_LOAD:					\
+			cmm_annotate_die("load before release group");	\
+			break;						\
+		case CMM_ANNOTATE_STORE:				\
+			cmm_annotate_die(				\
+				"store before release group"		\
+					);				\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"redundant mb of release group"		\
+					);				\
+			break;						\
+		}							\
+		*(group) = CMM_ANNOTATE_MB;				\
+	} while (0)
+
+# define cmm_annotate_group_mem_acquire(group, mem)			\
+	do {								\
+		__tsan_acquire((void*)(mem));				\
+		switch (cmm_annotate_dereference(group)) {		\
+		case CMM_ANNOTATE_VOID:					\
+			*(group) = CMM_ANNOTATE_LOAD;			\
+			break;						\
+		case CMM_ANNOTATE_MB:					\
+			cmm_annotate_die(				\
+				"load after mb for acquire group"	\
+					);				\
+			break;						\
+		default:						\
+			break;						\
+		}							\
+	} while (0)
+
+# define cmm_annotate_group_mem_release(group, mem)		\
+	do {							\
+		__tsan_release((void*)(mem));			\
+		switch (cmm_annotate_dereference(group)) {	\
+		case CMM_ANNOTATE_MB:				\
+			break;					\
+		default:					\
+			cmm_annotate_die(			\
+				"missing mb for release group"	\
+					);			\
+		}						\
+	} while (0)
+
+# define cmm_annotate_mem_acquire(mem)		\
+	__tsan_acquire((void*)(mem))
+
+# define cmm_annotate_mem_release(mem)		\
+	__tsan_release((void*)(mem))
+#else
+
+# define cmm_annotate_group_mb_acquire(group)	\
+	(void) (group)
+
+# define cmm_annotate_group_mb_release(group)	\
+	(void) (group)
+
+# define cmm_annotate_group_mem_acquire(group, mem)	\
+	(void) (group)
+
+# define cmm_annotate_group_mem_release(group, mem)	\
+	(void) (group)
+
+# define cmm_annotate_mem_acquire(mem)		\
+	do { } while (0)
+
+# define cmm_annotate_mem_release(mem)		\
+	do { } while (0)
+
+#endif  /* CMM_SANITIZE_THREAD */
+
+#endif	/* _URCU_ANNOTATE_H */
diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index e292c70..65dedf2 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -45,9 +45,38 @@ extern "C" {
 
 #ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
 
+# ifdef CMM_SANITIZE_THREAD
+/*
+ * This makes TSAN quiet about unsupported thread fence.
+ */
+static inline void _cmm_thread_fence_wrapper(void)
+{
+#   if defined(__clang__)
+#    pragma clang diagnostic push
+#    pragma clang diagnostic ignored "-Wpragmas"
+#    pragma clang diagnostic ignored "-Wunknown-warning-option"
+#    pragma clang diagnostic ignored "-Wtsan"
+#   elif defined(__GNUC__)
+#    pragma GCC diagnostic push
+#    pragma GCC diagnostic ignored "-Wpragmas"
+#    pragma GCC diagnostic ignored "-Wtsan"
+#   endif
+	__atomic_thread_fence(__ATOMIC_SEQ_CST);
+#   if defined(__clang__)
+#    pragma clang diagnostic pop
+#   elif defined(__GNUC__)
+#    pragma GCC diagnostic pop
+#   endif
+}
+# endif	 /* CMM_SANITIZE_THREAD */
+
 # ifndef cmm_smp_mb
-#  define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
-# endif
+#  ifdef CMM_SANITIZE_THREAD
+#   define cmm_smp_mb() _cmm_thread_fence_wrapper()
+#  else
+#   define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
+#  endif /* CMM_SANITIZE_THREAD */
+# endif /* !cmm_smp_mb */
 
 #endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
 
diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 3604488..b0c7850 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -129,4 +129,18 @@
 				+ __GNUC_PATCHLEVEL__)
 #endif
 
+/*
+ * Allow user to manually define CMM_SANITIZE_THREAD if their toolchain is not
+ * supported by this check.
+ */
+#ifndef CMM_SANITIZE_THREAD
+# if defined(__GNUC__) && defined(__SANITIZE_THREAD__)
+#  define CMM_SANITIZE_THREAD
+# elif defined(__clang__) && defined(__has_feature)
+#  if __has_feature(thread_sanitizer)
+#   define CMM_SANITIZE_THREAD
+#  endif
+# endif
+#endif	/* !CMM_SANITIZE_THREAD */
+
 #endif /* _URCU_COMPILER_H */
diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h
index 8ba3830..3e14ef7 100644
--- a/include/urcu/static/urcu-bp.h
+++ b/include/urcu/static/urcu-bp.h
@@ -33,6 +33,7 @@
 #include <pthread.h>
 #include <unistd.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
@@ -117,7 +118,8 @@ static inline void urcu_bp_smp_mb_slave(void)
 		cmm_smp_mb();
 }
 
-static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
+static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
@@ -127,7 +129,9 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
 	 * Make sure both tests below are done on the same version of *value
 	 * to insure consistency.
 	 */
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!(v & URCU_BP_GP_CTR_NEST_MASK))
 		return URCU_BP_READER_INACTIVE;
 	if (!((v ^ urcu_bp_gp.ctr) & URCU_BP_GP_CTR_PHASE))
@@ -181,12 +185,14 @@ static inline void _urcu_bp_read_lock(void)
 static inline void _urcu_bp_read_unlock(void)
 {
 	unsigned long tmp;
+	unsigned long *ctr = &URCU_TLS(urcu_bp_reader)->ctr;
 
 	tmp = URCU_TLS(urcu_bp_reader)->ctr;
 	urcu_assert_debug(tmp & URCU_BP_GP_CTR_NEST_MASK);
 	/* Finish using rcu before decrementing the pointer. */
 	urcu_bp_smp_mb_slave();
-	_CMM_STORE_SHARED(URCU_TLS(urcu_bp_reader)->ctr, tmp - URCU_BP_GP_COUNT);
+	cmm_annotate_mem_release(ctr);
+	uatomic_store(ctr, tmp - URCU_BP_GP_COUNT, CMM_RELAXED);
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
 }
 
diff --git a/include/urcu/static/urcu-common.h b/include/urcu/static/urcu-common.h
index 60ea8b8..32cb834 100644
--- a/include/urcu/static/urcu-common.h
+++ b/include/urcu/static/urcu-common.h
@@ -34,6 +34,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
@@ -105,7 +106,8 @@ static inline void urcu_common_wake_up_gp(struct urcu_gp *gp)
 }
 
 static inline enum urcu_state urcu_common_reader_state(struct urcu_gp *gp,
-		unsigned long *ctr)
+						unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
@@ -113,7 +115,9 @@ static inline enum urcu_state urcu_common_reader_state(struct urcu_gp *gp,
 	 * Make sure both tests below are done on the same version of *value
 	 * to insure consistency.
 	 */
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!(v & URCU_GP_CTR_NEST_MASK))
 		return URCU_READER_INACTIVE;
 	if (!((v ^ gp->ctr) & URCU_GP_CTR_PHASE))
diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h
index b97e42a..5bf7933 100644
--- a/include/urcu/static/urcu-mb.h
+++ b/include/urcu/static/urcu-mb.h
@@ -108,13 +108,14 @@ static inline void _urcu_mb_read_lock(void)
  */
 static inline void _urcu_mb_read_unlock_update_and_wakeup(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_mb_reader).ctr;
+
 	if (caa_likely((tmp & URCU_GP_CTR_NEST_MASK) == URCU_GP_COUNT)) {
-		cmm_smp_mb();
-		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp - URCU_GP_COUNT);
-		cmm_smp_mb();
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_SEQ_CST);
 		urcu_common_wake_up_gp(&urcu_mb_gp);
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp - URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h
index c8d102f..8191ccc 100644
--- a/include/urcu/static/urcu-memb.h
+++ b/include/urcu/static/urcu-memb.h
@@ -34,6 +34,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/config.h>
 #include <urcu/compiler.h>
@@ -93,11 +94,20 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader);
  */
 static inline void _urcu_memb_read_lock_update(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_memb_reader).ctr;
+
 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, _CMM_LOAD_SHARED(urcu_memb_gp.ctr));
+		unsigned long *pgctr = &urcu_memb_gp.ctr;
+		unsigned long gctr = uatomic_load(pgctr, CMM_RELAXED);
+
+		/* Paired with following mb slave. */
+		cmm_annotate_mem_acquire(pgctr);
+		uatomic_store(ctr, gctr, CMM_RELAXED);
+
 		urcu_memb_smp_mb_slave();
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp + URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp + URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
@@ -131,13 +141,17 @@ static inline void _urcu_memb_read_lock(void)
  */
 static inline void _urcu_memb_read_unlock_update_and_wakeup(unsigned long tmp)
 {
+	unsigned long *ctr = &URCU_TLS(urcu_memb_reader).ctr;
+
 	if (caa_likely((tmp & URCU_GP_CTR_NEST_MASK) == URCU_GP_COUNT)) {
 		urcu_memb_smp_mb_slave();
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp - URCU_GP_COUNT);
+		cmm_annotate_mem_release(ctr);
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
 		urcu_memb_smp_mb_slave();
 		urcu_common_wake_up_gp(&urcu_memb_gp);
-	} else
-		_CMM_STORE_SHARED(URCU_TLS(urcu_memb_reader).ctr, tmp - URCU_GP_COUNT);
+	} else {
+		uatomic_store(ctr, tmp - URCU_GP_COUNT, CMM_RELAXED);
+	}
 }
 
 /*
diff --git a/include/urcu/static/urcu-qsbr.h b/include/urcu/static/urcu-qsbr.h
index b878877..864cbcf 100644
--- a/include/urcu/static/urcu-qsbr.h
+++ b/include/urcu/static/urcu-qsbr.h
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <stdint.h>
 
+#include <urcu/annotate.h>
 #include <urcu/debug.h>
 #include <urcu/compiler.h>
 #include <urcu/arch.h>
@@ -96,11 +97,14 @@ static inline void urcu_qsbr_wake_up_gp(void)
 	}
 }
 
-static inline enum urcu_state urcu_qsbr_reader_state(unsigned long *ctr)
+static inline enum urcu_state urcu_qsbr_reader_state(unsigned long *ctr,
+						cmm_annotate_t *group)
 {
 	unsigned long v;
 
-	v = CMM_LOAD_SHARED(*ctr);
+	v = uatomic_load(ctr, CMM_RELAXED);
+	cmm_annotate_group_mem_acquire(group, ctr);
+
 	if (!v)
 		return URCU_READER_INACTIVE;
 	if (v == urcu_qsbr_gp.ctr)
@@ -155,9 +159,9 @@ static inline int _urcu_qsbr_read_ongoing(void)
  */
 static inline void _urcu_qsbr_quiescent_state_update_and_wakeup(unsigned long gp_ctr)
 {
-	cmm_smp_mb();
-	_CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, gp_ctr);
-	cmm_smp_mb();	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
+	uatomic_store(&URCU_TLS(urcu_qsbr_reader).ctr, gp_ctr, CMM_SEQ_CST);
+
+	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
 	urcu_qsbr_wake_up_gp();
 	cmm_smp_mb();
 }
@@ -179,7 +183,8 @@ static inline void _urcu_qsbr_quiescent_state(void)
 	unsigned long gp_ctr;
 
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
-	if ((gp_ctr = CMM_LOAD_SHARED(urcu_qsbr_gp.ctr)) == URCU_TLS(urcu_qsbr_reader).ctr)
+	gp_ctr = uatomic_load(&urcu_qsbr_gp.ctr, CMM_RELAXED);
+	if (gp_ctr == URCU_TLS(urcu_qsbr_reader).ctr)
 		return;
 	_urcu_qsbr_quiescent_state_update_and_wakeup(gp_ctr);
 }
@@ -195,9 +200,8 @@ static inline void _urcu_qsbr_quiescent_state(void)
 static inline void _urcu_qsbr_thread_offline(void)
 {
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
-	cmm_smp_mb();
-	CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, 0);
-	cmm_smp_mb();	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
+	uatomic_store(&URCU_TLS(urcu_qsbr_reader).ctr, 0, CMM_SEQ_CST);
+	/* write URCU_TLS(urcu_qsbr_reader).ctr before read futex */
 	urcu_qsbr_wake_up_gp();
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
 }
@@ -212,9 +216,14 @@ static inline void _urcu_qsbr_thread_offline(void)
  */
 static inline void _urcu_qsbr_thread_online(void)
 {
+	unsigned long *pctr = &URCU_TLS(urcu_qsbr_reader).ctr;
+	unsigned long ctr;
+
 	urcu_assert_debug(URCU_TLS(urcu_qsbr_reader).registered);
 	cmm_barrier();	/* Ensure the compiler does not reorder us with mutex */
-	_CMM_STORE_SHARED(URCU_TLS(urcu_qsbr_reader).ctr, CMM_LOAD_SHARED(urcu_qsbr_gp.ctr));
+	ctr = uatomic_load(&urcu_qsbr_gp.ctr, CMM_RELAXED);
+	cmm_annotate_mem_acquire(&urcu_qsbr_gp.ctr);
+	uatomic_store(pctr, ctr, CMM_RELAXED);
 	cmm_smp_mb();
 }
 
diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..cdc2aee 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -623,9 +623,7 @@ static void mutex_lock(pthread_mutex_t *mutex)
 		if (ret != EBUSY && ret != EINTR)
 			urcu_die(ret);
 		if (CMM_LOAD_SHARED(URCU_TLS(rcu_reader).need_mb)) {
-			cmm_smp_mb();
-			_CMM_STORE_SHARED(URCU_TLS(rcu_reader).need_mb, 0);
-			cmm_smp_mb();
+			uatomic_store(&URCU_TLS(rcu_reader).need_mb, 0, CMM_SEQ_CST);
 		}
 		(void) poll(NULL, 0, 10);
 	}
@@ -883,8 +881,10 @@ unsigned long _uatomic_xchg_monotonic_increase(unsigned long *ptr,
 	old1 = uatomic_read(ptr);
 	do {
 		old2 = old1;
-		if (old2 >= v)
+		if (old2 >= v) {
+			cmm_smp_mb();
 			return old2;
+		}
 	} while ((old1 = uatomic_cmpxchg(ptr, old2, v)) != old2);
 	return old2;
 }
@@ -1190,15 +1190,17 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	/*
 	 * The del operation semantic guarantees a full memory barrier
 	 * before the uatomic_or atomic commit of the deletion flag.
-	 */
-	cmm_smp_mb__before_uatomic_or();
-	/*
+	 *
 	 * We set the REMOVED_FLAG unconditionally. Note that there may
 	 * be more than one concurrent thread setting this flag.
 	 * Knowing which wins the race will be known after the garbage
 	 * collection phase, stay tuned!
+	 *
+	 * NOTE: The cast is needed because Clang requires the address argument
+	 * of an atomic operation to be a pointer to an integer type.
 	 */
-	uatomic_or(&node->next, REMOVED_FLAG);
+	uatomic_or_mo((uintptr_t*) &node->next, REMOVED_FLAG, CMM_RELEASE);
+
 	/* We performed the (logical) deletion. */
 
 	/*
@@ -1223,7 +1225,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	 * was already set).
 	 */
 	if (!is_removal_owner(uatomic_xchg(&node->next,
-			flag_removal_owner(node->next))))
+			flag_removal_owner(uatomic_load(&node->next, CMM_RELAXED)))))
 		return 0;
 	else
 		return -ENOENT;
@@ -1389,9 +1391,10 @@ void init_table(struct cds_lfht *ht,
 
 		/*
 		 * Update table size.
+		 *
+		 * Populate data before RCU size.
 		 */
-		cmm_smp_wmb();	/* populate data before RCU size */
-		CMM_STORE_SHARED(ht->size, 1UL << i);
+		uatomic_store(&ht->size, 1UL << i, CMM_RELEASE);
 
 		dbg_printf("init new size: %lu\n", 1UL << i);
 		if (CMM_LOAD_SHARED(ht->in_progress_destroy))
@@ -1440,8 +1443,12 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
 		urcu_posix_assert(j >= size && j < (size << 1));
 		dbg_printf("remove entry: order %lu index %lu hash %lu\n",
 			   i, j, j);
-		/* Set the REMOVED_FLAG to freeze the ->next for gc */
-		uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+		/* Set the REMOVED_FLAG to freeze the ->next for gc.
+		 *
+		 * NOTE: The cast is needed because Clang requires the address
+		 * argument of an atomic operation to be a pointer to an integer type.
+		 */
+		uatomic_or((uintptr_t*) &fini_bucket->next, REMOVED_FLAG);
 		_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
 	}
 	ht->flavor->read_unlock();
@@ -1667,7 +1674,14 @@ void cds_lfht_lookup(struct cds_lfht *ht, unsigned long hash,
 
 	reverse_hash = bit_reverse_ulong(hash);
 
-	size = rcu_dereference(ht->size);
+	/*
+	 * Use load acquire instead of rcu_dereference because there is no
+	 * dependency between the table size and the dereference of the bucket
+	 * content.
+	 *
+	 * This acquire is paired with the store release in init_table().
+	 */
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	bucket = lookup_bucket(ht, size, hash);
 	/* We can always skip the bucket node initially */
 	node = rcu_dereference(bucket->next);
@@ -1726,7 +1740,7 @@ void cds_lfht_next_duplicate(struct cds_lfht *ht __attribute__((unused)),
 		}
 		node = clear_flag(next);
 	}
-	urcu_posix_assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next)));
+	urcu_posix_assert(!node || !is_bucket(uatomic_load(&node->next, CMM_RELAXED)));
 	iter->node = node;
 	iter->next = next;
 }
@@ -1750,7 +1764,7 @@ void cds_lfht_next(struct cds_lfht *ht __attribute__((unused)),
 		}
 		node = clear_flag(next);
 	}
-	urcu_posix_assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next)));
+	urcu_posix_assert(!node || !is_bucket(uatomic_load(&node->next, CMM_RELAXED)));
 	iter->node = node;
 	iter->next = next;
 }
@@ -1762,7 +1776,7 @@ void cds_lfht_first(struct cds_lfht *ht, struct cds_lfht_iter *iter)
 	 * Get next after first bucket node. The first bucket node is the
 	 * first node of the linked list.
 	 */
-	iter->next = bucket_at(ht, 0)->next;
+	iter->next = uatomic_load(&bucket_at(ht, 0)->next, CMM_CONSUME);
 	cds_lfht_next(ht, iter);
 }
 
@@ -1772,7 +1786,7 @@ void cds_lfht_add(struct cds_lfht *ht, unsigned long hash,
 	unsigned long size;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	_cds_lfht_add(ht, hash, NULL, NULL, size, node, NULL, 0);
 	ht_count_add(ht, size, hash);
 }
@@ -1787,7 +1801,7 @@ struct cds_lfht_node *cds_lfht_add_unique(struct cds_lfht *ht,
 	struct cds_lfht_iter iter;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	_cds_lfht_add(ht, hash, match, key, size, node, &iter, 0);
 	if (iter.node == node)
 		ht_count_add(ht, size, hash);
@@ -1804,7 +1818,7 @@ struct cds_lfht_node *cds_lfht_add_replace(struct cds_lfht *ht,
 	struct cds_lfht_iter iter;
 
 	node->reverse_hash = bit_reverse_ulong(hash);
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	for (;;) {
 		_cds_lfht_add(ht, hash, match, key, size, node, &iter, 0);
 		if (iter.node == node) {
@@ -1833,7 +1847,7 @@ int cds_lfht_replace(struct cds_lfht *ht,
 		return -EINVAL;
 	if (caa_unlikely(!match(old_iter->node, key)))
 		return -EINVAL;
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	return _cds_lfht_replace(ht, size, old_iter->node, old_iter->next,
 			new_node);
 }
@@ -1843,7 +1857,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node)
 	unsigned long size;
 	int ret;
 
-	size = rcu_dereference(ht->size);
+	size = uatomic_load(&ht->size, CMM_ACQUIRE);
 	ret = _cds_lfht_del(ht, size, node);
 	if (!ret) {
 		unsigned long hash;
@@ -1957,7 +1971,7 @@ int cds_lfht_destroy(struct cds_lfht *ht, pthread_attr_t **attr)
 		if (!cds_lfht_is_empty(ht))
 			return -EPERM;
 		/* Cancel ongoing resize operations. */
-		_CMM_STORE_SHARED(ht->in_progress_destroy, 1);
+		uatomic_store(&ht->in_progress_destroy, 1, CMM_RELAXED);
 		if (attr) {
 			*attr = ht->caller_resize_attr;
 			ht->caller_resize_attr = NULL;
@@ -2077,19 +2091,22 @@ void _do_cds_lfht_resize(struct cds_lfht *ht)
 	 * Resize table, re-do if the target size has changed under us.
 	 */
 	do {
-		if (CMM_LOAD_SHARED(ht->in_progress_destroy))
+		if (uatomic_load(&ht->in_progress_destroy, CMM_RELAXED))
 			break;
-		ht->resize_initiated = 1;
+
+		uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
+
 		old_size = ht->size;
-		new_size = CMM_LOAD_SHARED(ht->resize_target);
+		new_size = uatomic_load(&ht->resize_target, CMM_RELAXED);
 		if (old_size < new_size)
 			_do_cds_lfht_grow(ht, old_size, new_size);
 		else if (old_size > new_size)
 			_do_cds_lfht_shrink(ht, old_size, new_size);
-		ht->resize_initiated = 0;
+
+		uatomic_store(&ht->resize_initiated, 0, CMM_RELAXED);
 		/* write resize_initiated before read resize_target */
 		cmm_smp_mb();
-	} while (ht->size != CMM_LOAD_SHARED(ht->resize_target));
+	} while (ht->size != uatomic_load(&ht->resize_target, CMM_RELAXED));
 }
 
 static
@@ -2110,7 +2127,12 @@ void resize_target_update_count(struct cds_lfht *ht,
 void cds_lfht_resize(struct cds_lfht *ht, unsigned long new_size)
 {
 	resize_target_update_count(ht, new_size);
-	CMM_STORE_SHARED(ht->resize_initiated, 1);
+
+	/*
+	 * Set the flag as early as possible, even in the contended case.
+	 */
+	uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
+
 	mutex_lock(&ht->resize_mutex);
 	_do_cds_lfht_resize(ht);
 	mutex_unlock(&ht->resize_mutex);
@@ -2136,10 +2158,12 @@ void __cds_lfht_resize_lazy_launch(struct cds_lfht *ht)
 {
 	struct resize_work *work;
 
-	/* Store resize_target before read resize_initiated */
-	cmm_smp_mb();
-	if (!CMM_LOAD_SHARED(ht->resize_initiated)) {
-		if (CMM_LOAD_SHARED(ht->in_progress_destroy)) {
+	/*
+	 * The store to resize_target is ordered before the load of resize_initiated,
+	 * as guaranteed by either cmpxchg or _uatomic_xchg_monotonic_increase.
+	 */
+	if (!uatomic_load(&ht->resize_initiated, CMM_RELAXED)) {
+		if (uatomic_load(&ht->in_progress_destroy, CMM_RELAXED)) {
 			return;
 		}
 		work = malloc(sizeof(*work));
@@ -2150,7 +2174,7 @@ void __cds_lfht_resize_lazy_launch(struct cds_lfht *ht)
 		work->ht = ht;
 		urcu_workqueue_queue_work(cds_lfht_workqueue,
 			&work->work, do_resize_cb);
-		CMM_STORE_SHARED(ht->resize_initiated, 1);
+		uatomic_store(&ht->resize_initiated, 1, CMM_RELAXED);
 	}
 }
 
diff --git a/src/urcu-bp.c b/src/urcu-bp.c
index 47fad8e..08aaa88 100644
--- a/src/urcu-bp.c
+++ b/src/urcu-bp.c
@@ -36,6 +36,7 @@
 #include <stdbool.h>
 #include <sys/mman.h>
 
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/config.h>
 #include <urcu/arch.h>
@@ -220,7 +221,8 @@ static void smp_mb_master(void)
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_bp_reader *index, *tmp;
@@ -235,7 +237,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 			wait_loops++;
 
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_bp_reader_state(&index->ctr)) {
+			switch (urcu_bp_reader_state(&index->ctr, group)) {
 			case URCU_BP_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -274,6 +276,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 
 void urcu_bp_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	sigset_t newmask, oldmask;
@@ -295,13 +299,14 @@ void urcu_bp_synchronize_rcu(void)
 	 * where new ptr points to. */
 	/* Write new ptr before changing the qparity */
 	smp_mb_master();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
@@ -311,7 +316,8 @@ void urcu_bp_synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp.ctr, rcu_gp.ctr ^ URCU_BP_GP_CTR_PHASE);
+	cmm_annotate_group_mem_release(&release_group, &rcu_gp.ctr);
+	uatomic_store(&rcu_gp.ctr, rcu_gp.ctr ^ URCU_BP_GP_CTR_PHASE, CMM_RELAXED);
 
 	/*
 	 * Must commit qparity update to memory before waiting for other parity
@@ -332,7 +338,7 @@ void urcu_bp_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -344,6 +350,7 @@ void urcu_bp_synchronize_rcu(void)
 	 * freed.
 	 */
 	smp_mb_master();
+	cmm_annotate_group_mb_acquire(&acquire_group);
 out:
 	mutex_unlock(&rcu_registry_lock);
 	mutex_unlock(&rcu_gp_lock);
diff --git a/src/urcu-qsbr.c b/src/urcu-qsbr.c
index 318ab29..fd50e80 100644
--- a/src/urcu-qsbr.c
+++ b/src/urcu-qsbr.c
@@ -34,6 +34,7 @@
 #include <errno.h>
 #include <poll.h>
 
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/wfcqueue.h>
 #include <urcu/map/urcu-qsbr.h>
@@ -156,7 +157,8 @@ static void wait_gp(void)
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_qsbr_reader *index, *tmp;
@@ -183,7 +185,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 			cmm_smp_mb();
 		}
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_qsbr_reader_state(&index->ctr)) {
+			switch (urcu_qsbr_reader_state(&index->ctr, group)) {
 			case URCU_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -208,8 +210,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 		if (cds_list_empty(input_readers)) {
 			if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
 				/* Read reader_gp before write futex */
-				cmm_smp_mb();
-				uatomic_set(&urcu_qsbr_gp.futex, 0);
+				uatomic_store(&urcu_qsbr_gp.futex, 0, CMM_RELEASE);
 			}
 			break;
 		} else {
@@ -238,6 +239,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 #if (CAA_BITS_PER_LONG < 64)
 void urcu_qsbr_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
@@ -258,6 +261,7 @@ void urcu_qsbr_synchronize_rcu(void)
 		urcu_qsbr_thread_offline();
 	else
 		cmm_smp_mb();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Add ourself to gp_waiters queue of threads awaiting to wait
@@ -289,7 +293,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity
@@ -309,7 +313,8 @@ void urcu_qsbr_synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr ^ URCU_QSBR_GP_CTR);
+	cmm_annotate_group_mem_release(&release_group, &urcu_qsbr_gp.ctr);
+	uatomic_store(&urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr ^ URCU_QSBR_GP_CTR, CMM_RELAXED);
 
 	/*
 	 * Must commit urcu_qsbr_gp.ctr update to memory before waiting for
@@ -332,7 +337,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -347,6 +352,8 @@ gp_end:
 	 * Finish waiting for reader threads before letting the old ptr being
 	 * freed.
 	 */
+	cmm_annotate_group_mb_acquire(&acquire_group);
+
 	if (was_online)
 		urcu_qsbr_thread_online();
 	else
@@ -355,6 +362,8 @@ gp_end:
 #else /* !(CAA_BITS_PER_LONG < 64) */
 void urcu_qsbr_synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
 	DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
@@ -371,6 +380,7 @@ void urcu_qsbr_synchronize_rcu(void)
 		urcu_qsbr_thread_offline();
 	else
 		cmm_smp_mb();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Add ourself to gp_waiters queue of threads awaiting to wait
@@ -398,7 +408,8 @@ void urcu_qsbr_synchronize_rcu(void)
 		goto out;
 
 	/* Increment current G.P. */
-	CMM_STORE_SHARED(urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr + URCU_QSBR_GP_CTR);
+	cmm_annotate_group_mem_release(&release_group, &urcu_qsbr_gp.ctr);
+	uatomic_store(&urcu_qsbr_gp.ctr, urcu_qsbr_gp.ctr + URCU_QSBR_GP_CTR, CMM_RELAXED);
 
 	/*
 	 * Must commit urcu_qsbr_gp.ctr update to memory before waiting for
@@ -421,7 +432,7 @@ void urcu_qsbr_synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, NULL, &qsreaders);
+	wait_for_readers(&registry, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -436,6 +447,8 @@ gp_end:
 		urcu_qsbr_thread_online();
 	else
 		cmm_smp_mb();
+
+	cmm_annotate_group_mb_acquire(&acquire_group);
 }
 #endif  /* !(CAA_BITS_PER_LONG < 64) */
 
diff --git a/src/urcu-wait.h b/src/urcu-wait.h
index 4667a13..1ffced4 100644
--- a/src/urcu-wait.h
+++ b/src/urcu-wait.h
@@ -126,9 +126,8 @@ void urcu_wait_node_init(struct urcu_wait_node *node,
 static inline
 void urcu_adaptative_wake_up(struct urcu_wait_node *wait)
 {
-	cmm_smp_mb();
 	urcu_posix_assert(uatomic_read(&wait->state) == URCU_WAIT_WAITING);
-	uatomic_set(&wait->state, URCU_WAIT_WAKEUP);
+	uatomic_store(&wait->state, URCU_WAIT_WAKEUP, CMM_RELEASE);
 	if (!(uatomic_read(&wait->state) & URCU_WAIT_RUNNING)) {
 		if (futex_noasync(&wait->state, FUTEX_WAKE, 1,
 				NULL, NULL, 0) < 0)
@@ -150,11 +149,11 @@ void urcu_adaptative_busy_wait(struct urcu_wait_node *wait)
 	/* Load and test condition before read state */
 	cmm_smp_rmb();
 	for (i = 0; i < URCU_WAIT_ATTEMPTS; i++) {
-		if (uatomic_read(&wait->state) != URCU_WAIT_WAITING)
+		if (uatomic_load(&wait->state, CMM_ACQUIRE) != URCU_WAIT_WAITING)
 			goto skip_futex_wait;
 		caa_cpu_relax();
 	}
-	while (uatomic_read(&wait->state) == URCU_WAIT_WAITING) {
+	while (uatomic_load(&wait->state, CMM_ACQUIRE) == URCU_WAIT_WAITING) {
 		if (!futex_noasync(&wait->state, FUTEX_WAIT, URCU_WAIT_WAITING, NULL, NULL, 0)) {
 			/*
 			 * Prior queued wakeups queued by unrelated code
@@ -189,7 +188,7 @@ skip_futex_wait:
 	 * memory allocated for struct urcu_wait.
 	 */
 	for (i = 0; i < URCU_WAIT_ATTEMPTS; i++) {
-		if (uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN)
+		if (uatomic_load(&wait->state, CMM_RELAXED) & URCU_WAIT_TEARDOWN)
 			break;
 		caa_cpu_relax();
 	}
diff --git a/src/urcu.c b/src/urcu.c
index c60307e..353e9bb 100644
--- a/src/urcu.c
+++ b/src/urcu.c
@@ -38,6 +38,7 @@
 #include <poll.h>
 
 #include <urcu/config.h>
+#include <urcu/annotate.h>
 #include <urcu/assert.h>
 #include <urcu/arch.h>
 #include <urcu/wfcqueue.h>
@@ -300,7 +301,8 @@ end:
  */
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
-			struct cds_list_head *qsreaders)
+			struct cds_list_head *qsreaders,
+			cmm_annotate_t *group)
 {
 	unsigned int wait_loops = 0;
 	struct urcu_reader *index, *tmp;
@@ -323,7 +325,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 		}
 
 		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
-			switch (urcu_common_reader_state(&rcu_gp, &index->ctr)) {
+			switch (urcu_common_reader_state(&rcu_gp, &index->ctr, group)) {
 			case URCU_READER_ACTIVE_CURRENT:
 				if (cur_snap_readers) {
 					cds_list_move(&index->node,
@@ -407,6 +409,8 @@ static void wait_for_readers(struct cds_list_head *input_readers,
 
 void synchronize_rcu(void)
 {
+	cmm_annotate_define(acquire_group);
+	cmm_annotate_define(release_group);
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
@@ -421,10 +425,11 @@ void synchronize_rcu(void)
 	 * queue before their insertion into the wait queue.
 	 */
 	if (urcu_wait_add(&gp_waiters, &wait) != 0) {
-		/* Not first in queue: will be awakened by another thread. */
+		/*
+		 * Not first in queue: will be awakened by another thread.
+		 * Implies a memory barrier after grace period.
+		 */
 		urcu_adaptative_busy_wait(&wait);
-		/* Order following memory accesses after grace period. */
-		cmm_smp_mb();
 		return;
 	}
 	/* We won't need to wake ourself up */
@@ -449,13 +454,14 @@ void synchronize_rcu(void)
 	 */
 	/* Write new ptr before changing the qparity */
 	smp_mb_master();
+	cmm_annotate_group_mb_release(&release_group);
 
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders, &acquire_group);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity before
@@ -474,7 +480,8 @@ void synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp.ctr, rcu_gp.ctr ^ URCU_GP_CTR_PHASE);
+	cmm_annotate_group_mem_release(&release_group, &rcu_gp.ctr);
+	uatomic_store(&rcu_gp.ctr, rcu_gp.ctr ^ URCU_GP_CTR_PHASE, CMM_RELAXED);
 
 	/*
 	 * Must commit rcu_gp.ctr update to memory before waiting for quiescent
@@ -497,7 +504,7 @@ void synchronize_rcu(void)
 	 * wait_for_readers() can release and grab again rcu_registry_lock
 	 * internally.
 	 */
-	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders, &acquire_group);
 
 	/*
 	 * Put quiescent reader list back into registry.
@@ -510,6 +517,7 @@ void synchronize_rcu(void)
 	 * iterates on reader threads.
 	 */
 	smp_mb_master();
+	cmm_annotate_group_mb_acquire(&acquire_group);
 out:
 	mutex_unlock(&rcu_registry_lock);
 	mutex_unlock(&rcu_gp_lock);
-- 
2.40.1

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* [lttng-dev] [PATCH v2 11/12] Add cmm_emit_legacy_smp_mb()
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

Some public APIs document implicit memory barriers on their operations.
These barriers were coherent with the memory model in use at the time.
However, with the migration to a memory model closer to the C11 memory
model, the atomic operations no longer strictly emit those memory
barriers.

This patch therefore introduces the `--disable-legacy-mb' configuration
option. By default, liburcu is configured to emit these legacy memory
barriers, keeping backward compatibility at the expense of slower
performance. Users can opt out by disabling the legacy memory barriers.

This option is publicly exported in the system configuration header file
and can be overridden manually on a per compilation unit basis by
defining `CONFIG_RCU_EMIT_LEGACY_MB' before including any liburcu files.

Using this macro requires rewriting the atomic operations in terms
of the CMM memory model. This is done for the queue and stack APIs.

Change-Id: Ia5ce3b3d8cd1955556ce96fa4408a63aa098a1a6
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 configure.ac                     | 13 ++++++
 include/urcu/arch.h              |  6 +++
 include/urcu/config.h.in         |  3 ++
 include/urcu/static/lfstack.h    | 25 ++++++++----
 include/urcu/static/rculfqueue.h | 14 ++++---
 include/urcu/static/rculfstack.h |  8 +++-
 include/urcu/static/wfcqueue.h   | 68 +++++++++++++++++---------------
 include/urcu/static/wfqueue.h    |  9 +++--
 include/urcu/static/wfstack.h    | 24 +++++++----
 9 files changed, 114 insertions(+), 56 deletions(-)

diff --git a/configure.ac b/configure.ac
index 4450a31..ca51b5b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -235,6 +235,11 @@ AE_FEATURE([cds-lfht-iter-debug], [Enable extra debugging checks for lock-free h
 AE_FEATURE_DEFAULT_ENABLE
 AE_FEATURE([atomic-builtins], [Disable the usage of toolchain atomic builtins.])
 
+# emit legacy memory barriers
+# Enable by default
+AE_FEATURE_DEFAULT_ENABLE
+AE_FEATURE([legacy-mb], [Disable legacy memory barriers.])
+
 # When given, add -Werror to WARN_CFLAGS and WARN_CXXFLAGS.
 # Disabled by default
 AE_FEATURE_DEFAULT_DISABLE
@@ -282,6 +287,10 @@ AE_IF_FEATURE_ENABLED([atomic-builtins], [
 	[AE_FEATURE_DISABLE(atomic-builtins)])
 ])
 
+AE_IF_FEATURE_ENABLED([legacy-mb], [
+  AC_DEFINE([CONFIG_RCU_EMIT_LEGACY_MB], [1], [Emit legacy memory barriers that were documented in the APIs.])
+])
+
 ##                                                                          ##
 ## Set automake variables for optional feature conditionnals in Makefile.am ##
 ##                                                                          ##
@@ -387,6 +396,10 @@ PPRINT_PROP_BOOL([Lock-free HT iterator debugging], $value)
 AE_IS_FEATURE_ENABLED([atomic-builtins]) && value=1 || value=0
 PPRINT_PROP_BOOL([Use toolchain atomic builtins], $value)
 
+# legacy memory barriers
+AE_IS_FEATURE_ENABLED([legacy-mb]) && value=1 || value=0
+PPRINT_PROP_BOOL([Emit legacy memory barriers], $value)
+
 PPRINT_PROP_BOOL([Multi-flavor support], 1)
 
 report_bindir="`eval eval echo $bindir`"
diff --git a/include/urcu/arch.h b/include/urcu/arch.h
index d3914da..377a0ec 100644
--- a/include/urcu/arch.h
+++ b/include/urcu/arch.h
@@ -171,5 +171,11 @@
 #error "Cannot build: unrecognized architecture, see <urcu/arch.h>."
 #endif
 
+#ifdef CONFIG_RCU_EMIT_LEGACY_MB
+# define cmm_emit_legacy_smp_mb() cmm_smp_mb()
+#else
+# define cmm_emit_legacy_smp_mb() do { } while (0)
+#endif
+
 
 #endif /* _URCU_ARCH_H */
diff --git a/include/urcu/config.h.in b/include/urcu/config.h.in
index 1daaa7e..d2f6c8c 100644
--- a/include/urcu/config.h.in
+++ b/include/urcu/config.h.in
@@ -22,6 +22,9 @@
 /* Uatomic API uses atomic builtins? */
 #undef CONFIG_RCU_USE_ATOMIC_BUILTINS
 
+/* Emit legacy memory barriers? */
+#undef CONFIG_RCU_EMIT_LEGACY_MB
+
 /* Expose multi-flavor support */
 #define CONFIG_RCU_HAVE_MULTIFLAVOR 1
 
diff --git a/include/urcu/static/lfstack.h b/include/urcu/static/lfstack.h
index a05acb4..07604db 100644
--- a/include/urcu/static/lfstack.h
+++ b/include/urcu/static/lfstack.h
@@ -114,7 +114,7 @@ bool ___cds_lfs_empty_head(struct cds_lfs_head *head)
 static inline
 bool _cds_lfs_empty(cds_lfs_stack_ptr_t s)
 {
-	return ___cds_lfs_empty_head(CMM_LOAD_SHARED(s._s->head));
+	return ___cds_lfs_empty_head(uatomic_load(&s._s->head, CMM_RELAXED));
 }
 
 /*
@@ -122,6 +122,8 @@ bool _cds_lfs_empty(cds_lfs_stack_ptr_t s)
  *
  * Does not require any synchronization with other push nor pop.
  *
+ * Operations before push are consistent when observed after associated pop.
+ *
  * Lock-free stack push is not subject to ABA problem, so no need to
  * take the RCU read-side lock. Even if "head" changes between two
  * uatomic_cmpxchg() invocations here (being popped, and then pushed
@@ -167,7 +169,9 @@ bool _cds_lfs_push(cds_lfs_stack_ptr_t u_s,
 		 * uatomic_cmpxchg() implicit memory barrier orders earlier
 		 * stores to node before publication.
 		 */
-		head = uatomic_cmpxchg(&s->head, old_head, new_head);
+		cmm_emit_legacy_smp_mb();
+		head = uatomic_cmpxchg_mo(&s->head, old_head, new_head,
+					CMM_SEQ_CST, CMM_SEQ_CST);
 		if (old_head == head)
 			break;
 	}
@@ -179,6 +183,8 @@ bool _cds_lfs_push(cds_lfs_stack_ptr_t u_s,
  *
  * Returns NULL if stack is empty.
  *
+ * Operations after pop are consistent when observed before associated push.
+ *
  * __cds_lfs_pop needs to be synchronized using one of the following
  * techniques:
  *
@@ -203,7 +209,7 @@ struct cds_lfs_node *___cds_lfs_pop(cds_lfs_stack_ptr_t u_s)
 		struct cds_lfs_head *head, *next_head;
 		struct cds_lfs_node *next;
 
-		head = _CMM_LOAD_SHARED(s->head);
+		head = uatomic_load(&s->head, CMM_CONSUME);
 		if (___cds_lfs_empty_head(head))
 			return NULL;	/* Empty stack */
 
@@ -212,12 +218,14 @@ struct cds_lfs_node *___cds_lfs_pop(cds_lfs_stack_ptr_t u_s)
 		 * memory barrier before uatomic_cmpxchg() in
 		 * cds_lfs_push.
 		 */
-		cmm_smp_read_barrier_depends();
-		next = _CMM_LOAD_SHARED(head->node.next);
+		next = uatomic_load(&head->node.next, CMM_RELAXED);
 		next_head = caa_container_of(next,
 				struct cds_lfs_head, node);
-		if (uatomic_cmpxchg(&s->head, head, next_head) == head)
+		if (uatomic_cmpxchg_mo(&s->head, head, next_head,
+					CMM_SEQ_CST, CMM_SEQ_CST) == head) {
+			cmm_emit_legacy_smp_mb();
 			return &head->node;
+		}
 		/* busy-loop if head changed under us */
 	}
 }
@@ -245,6 +253,7 @@ static inline
 struct cds_lfs_head *___cds_lfs_pop_all(cds_lfs_stack_ptr_t u_s)
 {
 	struct __cds_lfs_stack *s = u_s._s;
+	struct cds_lfs_head *head;
 
 	/*
 	 * Implicit memory barrier after uatomic_xchg() matches implicit
@@ -256,7 +265,9 @@ struct cds_lfs_head *___cds_lfs_pop_all(cds_lfs_stack_ptr_t u_s)
 	 * taking care to order writes to each node prior to the full
 	 * memory barrier after this uatomic_xchg().
 	 */
-	return uatomic_xchg(&s->head, NULL);
+	head = uatomic_xchg_mo(&s->head, NULL, CMM_SEQ_CST);
+	cmm_emit_legacy_smp_mb();
+	return head;
 }
 
 /*
diff --git a/include/urcu/static/rculfqueue.h b/include/urcu/static/rculfqueue.h
index ad73454..25a4ec8 100644
--- a/include/urcu/static/rculfqueue.h
+++ b/include/urcu/static/rculfqueue.h
@@ -148,26 +148,29 @@ void _cds_lfq_enqueue_rcu(struct cds_lfq_queue_rcu *q,
 	 * uatomic_cmpxchg() implicit memory barrier orders earlier stores to
 	 * node before publication.
 	 */
-
 	for (;;) {
 		struct cds_lfq_node_rcu *tail, *next;
 
 		tail = rcu_dereference(q->tail);
-		next = uatomic_cmpxchg(&tail->next, NULL, node);
+		cmm_emit_legacy_smp_mb();
+		next = uatomic_cmpxchg_mo(&tail->next, NULL, node,
+					CMM_SEQ_CST, CMM_SEQ_CST);
 		if (next == NULL) {
 			/*
 			 * Tail was at the end of queue, we successfully
 			 * appended to it. Now move tail (another
 			 * enqueue might beat us to it, that's fine).
 			 */
-			(void) uatomic_cmpxchg(&q->tail, tail, node);
+			(void) uatomic_cmpxchg_mo(&q->tail, tail, node,
+						CMM_SEQ_CST, CMM_SEQ_CST);
 			return;
 		} else {
 			/*
 			 * Failure to append to current tail.
 			 * Help moving tail further and retry.
 			 */
-			(void) uatomic_cmpxchg(&q->tail, tail, next);
+			(void) uatomic_cmpxchg_mo(&q->tail, tail, next,
+						CMM_SEQ_CST, CMM_SEQ_CST);
 			continue;
 		}
 	}
@@ -211,7 +214,8 @@ struct cds_lfq_node_rcu *_cds_lfq_dequeue_rcu(struct cds_lfq_queue_rcu *q)
 			enqueue_dummy(q);
 			next = rcu_dereference(head->next);
 		}
-		if (uatomic_cmpxchg(&q->head, head, next) != head)
+		if (uatomic_cmpxchg_mo(&q->head, head, next,
+					CMM_SEQ_CST, CMM_SEQ_CST) != head)
 			continue;	/* Concurrently pushed. */
 		if (head->dummy) {
 			/* Free dummy after grace period. */
diff --git a/include/urcu/static/rculfstack.h b/include/urcu/static/rculfstack.h
index 54ff377..2befb2a 100644
--- a/include/urcu/static/rculfstack.h
+++ b/include/urcu/static/rculfstack.h
@@ -83,7 +83,9 @@ int _cds_lfs_push_rcu(struct cds_lfs_stack_rcu *s,
 		 * uatomic_cmpxchg() implicit memory barrier orders earlier
 		 * stores to node before publication.
 		 */
-		head = uatomic_cmpxchg(&s->head, old_head, node);
+		cmm_emit_legacy_smp_mb();
+		head = uatomic_cmpxchg_mo(&s->head, old_head, node,
+					CMM_SEQ_CST, CMM_SEQ_CST);
 		if (old_head == head)
 			break;
 	}
@@ -108,7 +110,9 @@ _cds_lfs_pop_rcu(struct cds_lfs_stack_rcu *s)
 		if (head) {
 			struct cds_lfs_node_rcu *next = rcu_dereference(head->next);
 
-			if (uatomic_cmpxchg(&s->head, head, next) == head) {
+			if (uatomic_cmpxchg_mo(&s->head, head, next,
+						CMM_SEQ_CST, CMM_SEQ_CST) == head) {
+				cmm_emit_legacy_smp_mb();
 				return head;
 			} else {
 				/* Concurrent modification. Retry. */
diff --git a/include/urcu/static/wfcqueue.h b/include/urcu/static/wfcqueue.h
index 478e859..043b18a 100644
--- a/include/urcu/static/wfcqueue.h
+++ b/include/urcu/static/wfcqueue.h
@@ -91,6 +91,11 @@ static inline void _cds_wfcq_node_init(struct cds_wfcq_node *node)
 	node->next = NULL;
 }
 
+static inline void _cds_wfcq_node_init_atomic(struct cds_wfcq_node *node)
+{
+	uatomic_store(&node->next, NULL, CMM_RELAXED);
+}
+
 /*
  * cds_wfcq_init: initialize wait-free queue (with lock). Pair with
  * cds_wfcq_destroy().
@@ -153,8 +158,8 @@ static inline bool _cds_wfcq_empty(cds_wfcq_head_ptr_t u_head,
 	 * common case to ensure that dequeuers do not frequently access
 	 * enqueuer's tail->p cache line.
 	 */
-	return CMM_LOAD_SHARED(head->node.next) == NULL
-		&& CMM_LOAD_SHARED(tail->p) == &head->node;
+	return uatomic_load(&head->node.next, CMM_CONSUME) == NULL
+		&& uatomic_load(&tail->p, CMM_CONSUME) == &head->node;
 }
 
 static inline void _cds_wfcq_dequeue_lock(struct cds_wfcq_head *head,
@@ -188,7 +193,7 @@ static inline bool ___cds_wfcq_append(cds_wfcq_head_ptr_t u_head,
 	 * stores to data structure containing node and setting
 	 * node->next to NULL before publication.
 	 */
-	old_tail = uatomic_xchg(&tail->p, new_tail);
+	old_tail = uatomic_xchg_mo(&tail->p, new_tail, CMM_SEQ_CST);
 
 	/*
 	 * Implicit memory barrier after uatomic_xchg() orders store to
@@ -199,7 +204,8 @@ static inline bool ___cds_wfcq_append(cds_wfcq_head_ptr_t u_head,
 	 * store will append "node" to the queue from a dequeuer
 	 * perspective.
 	 */
-	CMM_STORE_SHARED(old_tail->next, new_head);
+	uatomic_store(&old_tail->next, new_head, CMM_RELEASE);
+
 	/*
 	 * Return false if queue was empty prior to adding the node,
 	 * else return true.
@@ -210,8 +216,8 @@ static inline bool ___cds_wfcq_append(cds_wfcq_head_ptr_t u_head,
 /*
  * cds_wfcq_enqueue: enqueue a node into a wait-free queue.
  *
- * Issues a full memory barrier before enqueue. No mutual exclusion is
- * required.
+ * Operations prior to enqueue are consistent with respect to dequeuing or
+ * splicing and iterating.
  *
  * Returns false if the queue was empty prior to adding the node.
  * Returns true otherwise.
@@ -220,6 +226,8 @@ static inline bool _cds_wfcq_enqueue(cds_wfcq_head_ptr_t head,
 		struct cds_wfcq_tail *tail,
 		struct cds_wfcq_node *new_tail)
 {
+	cmm_emit_legacy_smp_mb();
+
 	return ___cds_wfcq_append(head, tail, new_tail, new_tail);
 }
 
@@ -270,8 +278,10 @@ ___cds_wfcq_node_sync_next(struct cds_wfcq_node *node, int blocking)
 
 	/*
 	 * Adaptative busy-looping waiting for enqueuer to complete enqueue.
+	 *
+	 * Load node.next before loading node's content
 	 */
-	while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
+	while ((next = uatomic_load(&node->next, CMM_CONSUME)) == NULL) {
 		if (___cds_wfcq_busy_wait(&attempt, blocking))
 			return CDS_WFCQ_WOULDBLOCK;
 	}
@@ -290,8 +300,7 @@ ___cds_wfcq_first(cds_wfcq_head_ptr_t u_head,
 	if (_cds_wfcq_empty(__cds_wfcq_head_cast(head), tail))
 		return NULL;
 	node = ___cds_wfcq_node_sync_next(&head->node, blocking);
-	/* Load head->node.next before loading node's content */
-	cmm_smp_read_barrier_depends();
+
 	return node;
 }
 
@@ -343,16 +352,15 @@ ___cds_wfcq_next(cds_wfcq_head_ptr_t head __attribute__((unused)),
 	 * out if we reached the end of the queue, we first check
 	 * node->next as a common case to ensure that iteration on nodes
 	 * do not frequently access enqueuer's tail->p cache line.
+	 *
+	 * Load node->next before loading next's content
 	 */
-	if ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
-		/* Load node->next before tail->p */
-		cmm_smp_rmb();
-		if (CMM_LOAD_SHARED(tail->p) == node)
+	if ((next = uatomic_load(&node->next, CMM_CONSUME)) == NULL) {
+		if (uatomic_load(&tail->p, CMM_RELAXED) == node)
 			return NULL;
 		next = ___cds_wfcq_node_sync_next(node, blocking);
 	}
-	/* Load node->next before loading next's content */
-	cmm_smp_read_barrier_depends();
+
 	return next;
 }
 
@@ -414,7 +422,7 @@ ___cds_wfcq_dequeue_with_state(cds_wfcq_head_ptr_t u_head,
 		return CDS_WFCQ_WOULDBLOCK;
 	}
 
-	if ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
+	if ((next = uatomic_load(&node->next, CMM_CONSUME)) == NULL) {
 		/*
 		 * @node is probably the only node in the queue.
 		 * Try to move the tail to &q->head.
@@ -422,17 +430,13 @@ ___cds_wfcq_dequeue_with_state(cds_wfcq_head_ptr_t u_head,
 		 * NULL if the cmpxchg succeeds. Should the
 		 * cmpxchg fail due to a concurrent enqueue, the
 		 * q->head.next will be set to the next node.
-		 * The implicit memory barrier before
-		 * uatomic_cmpxchg() orders load node->next
-		 * before loading q->tail.
-		 * The implicit memory barrier before uatomic_cmpxchg
-		 * orders load q->head.next before loading node's
-		 * content.
 		 */
-		_cds_wfcq_node_init(&head->node);
-		if (uatomic_cmpxchg(&tail->p, node, &head->node) == node) {
+		_cds_wfcq_node_init_atomic(&head->node);
+		if (uatomic_cmpxchg_mo(&tail->p, node, &head->node,
+					CMM_SEQ_CST, CMM_SEQ_CST) == node) {
 			if (state)
 				*state |= CDS_WFCQ_STATE_LAST;
+			cmm_emit_legacy_smp_mb();
 			return node;
 		}
 		next = ___cds_wfcq_node_sync_next(node, blocking);
@@ -442,7 +446,7 @@ ___cds_wfcq_dequeue_with_state(cds_wfcq_head_ptr_t u_head,
 		 * (currently NULL) back to its original value.
 		 */
 		if (!blocking && next == CDS_WFCQ_WOULDBLOCK) {
-			head->node.next = node;
+			uatomic_store(&head->node.next, node, CMM_RELAXED);
 			return CDS_WFCQ_WOULDBLOCK;
 		}
 	}
@@ -450,10 +454,9 @@ ___cds_wfcq_dequeue_with_state(cds_wfcq_head_ptr_t u_head,
 	/*
 	 * Move queue head forward.
 	 */
-	head->node.next = next;
+	uatomic_store(&head->node.next, next, CMM_RELAXED);
+	cmm_emit_legacy_smp_mb();
 
-	/* Load q->head.next before loading node's content */
-	cmm_smp_read_barrier_depends();
 	return node;
 }
 
@@ -515,6 +518,8 @@ ___cds_wfcq_dequeue_nonblocking(cds_wfcq_head_ptr_t head,
 /*
  * __cds_wfcq_splice: enqueue all src_q nodes at the end of dest_q.
  *
+ * Operations after splice are consistent with respect to enqueue.
+ *
  * Dequeue all nodes from src_q.
  * dest_q must be already initialized.
  * Mutual exclusion for src_q should be ensured by the caller as
@@ -548,10 +553,10 @@ ___cds_wfcq_splice(
 		 * uatomic_xchg, as well as tail pointer vs head node
 		 * address.
 		 */
-		head = uatomic_xchg(&src_q_head->node.next, NULL);
+		head = uatomic_xchg_mo(&src_q_head->node.next, NULL, CMM_SEQ_CST);
 		if (head)
 			break;	/* non-empty */
-		if (CMM_LOAD_SHARED(src_q_tail->p) == &src_q_head->node)
+		if (uatomic_load(&src_q_tail->p, CMM_CONSUME) == &src_q_head->node)
 			return CDS_WFCQ_RET_SRC_EMPTY;
 		if (___cds_wfcq_busy_wait(&attempt, blocking))
 			return CDS_WFCQ_RET_WOULDBLOCK;
@@ -563,7 +568,8 @@ ___cds_wfcq_splice(
 	 * concurrent enqueue on src_q, which exchanges the tail before
 	 * updating the previous tail's next pointer.
 	 */
-	tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node);
+	cmm_emit_legacy_smp_mb();
+	tail = uatomic_xchg_mo(&src_q_tail->p, &src_q_head->node, CMM_SEQ_CST);
 
 	/*
 	 * Append the spliced content of src_q into dest_q. Does not
diff --git a/include/urcu/static/wfqueue.h b/include/urcu/static/wfqueue.h
index d04f66f..290fe0a 100644
--- a/include/urcu/static/wfqueue.h
+++ b/include/urcu/static/wfqueue.h
@@ -81,13 +81,14 @@ static inline void _cds_wfq_enqueue(struct cds_wfq_queue *q,
 	 * structure containing node and setting node->next to NULL before
 	 * publication.
 	 */
-	old_tail = uatomic_xchg(&q->tail, &node->next);
+	cmm_emit_legacy_smp_mb();
+	old_tail = uatomic_xchg_mo(&q->tail, &node->next, CMM_SEQ_CST);
 	/*
 	 * At this point, dequeuers see a NULL old_tail->next, which indicates
 	 * that the queue is being appended to. The following store will append
 	 * "node" to the queue from a dequeuer perspective.
 	 */
-	CMM_STORE_SHARED(*old_tail, node);
+	uatomic_store(old_tail, node, CMM_RELEASE);
 }
 
 /*
@@ -102,7 +103,7 @@ ___cds_wfq_node_sync_next(struct cds_wfq_node *node)
 	/*
 	 * Adaptative busy-looping waiting for enqueuer to complete enqueue.
 	 */
-	while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
+	while ((next = uatomic_load(&node->next, CMM_CONSUME)) == NULL) {
 		if (++attempt >= WFQ_ADAPT_ATTEMPTS) {
 			(void) poll(NULL, 0, WFQ_WAIT);	/* Wait for 10ms */
 			attempt = 0;
@@ -129,7 +130,7 @@ ___cds_wfq_dequeue_blocking(struct cds_wfq_queue *q)
 	/*
 	 * Queue is empty if it only contains the dummy node.
 	 */
-	if (q->head == &q->dummy && CMM_LOAD_SHARED(q->tail) == &q->dummy.next)
+	if (q->head == &q->dummy && uatomic_load(&q->tail, CMM_CONSUME) == &q->dummy.next)
 		return NULL;
 	node = q->head;
 
diff --git a/include/urcu/static/wfstack.h b/include/urcu/static/wfstack.h
index 088e6e3..cfaf675 100644
--- a/include/urcu/static/wfstack.h
+++ b/include/urcu/static/wfstack.h
@@ -124,7 +124,7 @@ static inline bool _cds_wfs_empty(cds_wfs_stack_ptr_t u_stack)
 {
 	struct __cds_wfs_stack *s = u_stack._s;
 
-	return ___cds_wfs_end(CMM_LOAD_SHARED(s->head));
+	return ___cds_wfs_end(uatomic_load(&s->head, CMM_RELAXED));
 }
 
 /*
@@ -133,6 +133,8 @@ static inline bool _cds_wfs_empty(cds_wfs_stack_ptr_t u_stack)
  * Issues a full memory barrier before push. No mutual exclusion is
  * required.
  *
+ * Operations before push are consistent when observed after associated pop.
+ *
  * Returns 0 if the stack was empty prior to adding the node.
  * Returns non-zero otherwise.
  */
@@ -148,12 +150,13 @@ int _cds_wfs_push(cds_wfs_stack_ptr_t u_stack, struct cds_wfs_node *node)
 	 * uatomic_xchg() implicit memory barrier orders earlier stores
 	 * to node (setting it to NULL) before publication.
 	 */
-	old_head = uatomic_xchg(&s->head, new_head);
+	cmm_emit_legacy_smp_mb();
+	old_head = uatomic_xchg_mo(&s->head, new_head, CMM_SEQ_CST);
 	/*
 	 * At this point, dequeuers see a NULL node->next, they should
 	 * busy-wait until node->next is set to old_head.
 	 */
-	CMM_STORE_SHARED(node->next, &old_head->node);
+	uatomic_store(&node->next, &old_head->node, CMM_RELEASE);
 	return !___cds_wfs_end(old_head);
 }
 
@@ -169,7 +172,7 @@ ___cds_wfs_node_sync_next(struct cds_wfs_node *node, int blocking)
 	/*
 	 * Adaptative busy-looping waiting for push to complete.
 	 */
-	while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
+	while ((next = uatomic_load(&node->next, CMM_CONSUME)) == NULL) {
 		if (!blocking)
 			return CDS_WFS_WOULDBLOCK;
 		if (++attempt >= CDS_WFS_ADAPT_ATTEMPTS) {
@@ -194,7 +197,7 @@ ___cds_wfs_pop(cds_wfs_stack_ptr_t u_stack, int *state, int blocking)
 	if (state)
 		*state = 0;
 	for (;;) {
-		head = CMM_LOAD_SHARED(s->head);
+		head = uatomic_load(&s->head, CMM_CONSUME);
 		if (___cds_wfs_end(head)) {
 			return NULL;
 		}
@@ -203,9 +206,11 @@ ___cds_wfs_pop(cds_wfs_stack_ptr_t u_stack, int *state, int blocking)
 			return CDS_WFS_WOULDBLOCK;
 		}
 		new_head = caa_container_of(next, struct cds_wfs_head, node);
-		if (uatomic_cmpxchg(&s->head, head, new_head) == head) {
+		if (uatomic_cmpxchg_mo(&s->head, head, new_head,
+					CMM_SEQ_CST, CMM_SEQ_CST) == head) {
 			if (state && ___cds_wfs_end(new_head))
 				*state |= CDS_WFS_STATE_LAST;
+			cmm_emit_legacy_smp_mb();
 			return &head->node;
 		}
 		if (!blocking) {
@@ -220,6 +225,8 @@ ___cds_wfs_pop(cds_wfs_stack_ptr_t u_stack, int *state, int blocking)
  *
  * Returns NULL if stack is empty.
  *
+ * Operations after pop are consistent when observed before associated push.
+ *
  * __cds_wfs_pop_blocking needs to be synchronized using one of the
  * following techniques:
  *
@@ -278,6 +285,8 @@ ___cds_wfs_pop_nonblocking(cds_wfs_stack_ptr_t u_stack)
 /*
  * __cds_wfs_pop_all: pop all nodes from a stack.
  *
+ * Operations after pop are consistent when observed before associated push.
+ *
  * __cds_wfs_pop_all does not require any synchronization with other
  * push, nor with other __cds_wfs_pop_all, but requires synchronization
  * matching the technique used to synchronize __cds_wfs_pop_blocking:
@@ -309,7 +318,8 @@ ___cds_wfs_pop_all(cds_wfs_stack_ptr_t u_stack)
 	 * taking care to order writes to each node prior to the full
 	 * memory barrier after this uatomic_xchg().
 	 */
-	head = uatomic_xchg(&s->head, CDS_WFS_END);
+	head = uatomic_xchg_mo(&s->head, CDS_WFS_END, CMM_SEQ_CST);
+	cmm_emit_legacy_smp_mb();
 	if (___cds_wfs_end(head))
 		return NULL;
 	return head;
-- 
2.40.1
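The recurring transformation in the diffs above — volatile-access macros plus standalone barriers replaced by atomic accesses carrying an explicit memory order — can be sketched with the compiler builtins this series layers uatomic on top of. The macro and symbol names below are illustrative stand-ins, not liburcu's actual definitions:

```c
/* Illustrative stand-ins for the uatomic_load()/uatomic_store() pattern;
 * these are NOT liburcu's actual macro definitions. */
#define ex_load(addr, mo)	__atomic_load_n((addr), (mo))
#define ex_store(addr, v, mo)	__atomic_store_n((addr), (v), (mo))

struct ex_node {
	int payload;
};

static struct ex_node *ex_head;

/* Publisher: the release store orders the earlier payload write before
 * publication, replacing the old "barrier + CMM_STORE_SHARED" idiom. */
static void publish(struct ex_node *n)
{
	n->payload = 42;
	ex_store(&ex_head, n, __ATOMIC_RELEASE);
}

/* Consumer: an acquire load (standing in here for CMM_CONSUME) replaces
 * the "CMM_LOAD_SHARED + cmm_smp_read_barrier_depends" pair. */
static struct ex_node *observe(void)
{
	return ex_load(&ex_head, __ATOMIC_ACQUIRE);
}
```

The point of the migration is that TSAN sees these accesses as atomic operations with a declared ordering, instead of plain loads and stores it would flag as data races.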

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [lttng-dev] [PATCH v2 12/12] tests: Add tests for checking race conditions
  2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
                   ` (23 preceding siblings ...)
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 11/12] Add cmm_emit_legacy_smp_mb() Olivier Dion via lttng-dev
@ 2023-06-07 18:53 ` Olivier Dion via lttng-dev
  24 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 18:53 UTC (permalink / raw)
  To: lttng-dev; +Cc: Olivier Dion, Tony Finch, Paul E. McKenney

These tests do nothing useful except stress testing a single-consumer,
multiple-producer program on various data structures.

These tests are only meaningful when compiling liburcu with TSAN.

Change-Id: If22b27ed0fb95bf890947fc4e75f923edb5ada8f
Signed-off-by: Olivier Dion <odion@efficios.com>
---
 tests/unit/test_lfstack.c  |  90 ++++++++++++++++++++++++++++
 tests/unit/test_wfcqueue.c | 119 +++++++++++++++++++++++++++++++++++++
 tests/unit/test_wfqueue.c  |  91 ++++++++++++++++++++++++++++
 tests/unit/test_wfstack.c  |  90 ++++++++++++++++++++++++++++
 4 files changed, 390 insertions(+)
 create mode 100644 tests/unit/test_lfstack.c
 create mode 100644 tests/unit/test_wfcqueue.c
 create mode 100644 tests/unit/test_wfqueue.c
 create mode 100644 tests/unit/test_wfstack.c

diff --git a/tests/unit/test_lfstack.c b/tests/unit/test_lfstack.c
new file mode 100644
index 0000000..a1f99f0
--- /dev/null
+++ b/tests/unit/test_lfstack.c
@@ -0,0 +1,90 @@
+/*
+ * test_lfstack.c
+ *
+ * Userspace RCU library - test lfstack race conditions
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _LGPL_SOURCE
+
+#include <stdlib.h>
+
+#include <pthread.h>
+
+#include <urcu/lfstack.h>
+
+#include "tap.h"
+
+#define NR_TESTS 1
+#define NR_PRODUCERS 4
+#define LOOP 100
+
+static void async_run(struct cds_lfs_stack *queue)
+{
+	struct cds_lfs_node *node = malloc(sizeof(*node));
+
+	cds_lfs_node_init(node);
+
+	cds_lfs_push(queue, node);
+}
+
+static void *async_loop(void *queue)
+{
+	size_t k = 0;
+
+	while (k < LOOP * NR_PRODUCERS) {
+		free(cds_lfs_pop_blocking(queue));
+		++k;
+	}
+
+	return NULL;
+}
+
+static void *spawn_jobs(void *queue)
+{
+	for (size_t k = 0; k < LOOP; ++k) {
+		async_run(queue);
+	}
+
+	return 0;
+}
+
+int main(void)
+{
+	pthread_t consumer;
+	pthread_t producers[NR_PRODUCERS];
+	struct cds_lfs_stack queue;
+
+	plan_tests(NR_TESTS);
+
+	cds_lfs_init(&queue);
+	pthread_create(&consumer, NULL, async_loop, &queue);
+
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_create(&producers[k], NULL, spawn_jobs, &queue);
+	}
+
+	pthread_join(consumer, NULL);
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_join(producers[k], NULL);
+	}
+
+	ok1("No race conditions");
+
+	return exit_status();
+}
diff --git a/tests/unit/test_wfcqueue.c b/tests/unit/test_wfcqueue.c
new file mode 100644
index 0000000..338aa07
--- /dev/null
+++ b/tests/unit/test_wfcqueue.c
@@ -0,0 +1,119 @@
+/*
+ * test_wfcqueue.c
+ *
+ * Userspace RCU library - test wfcqueue race conditions
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _LGPL_SOURCE
+
+#include <stdlib.h>
+
+#include <poll.h>
+#include <pthread.h>
+
+#include <urcu/wfcqueue.h>
+
+#include "tap.h"
+
+#define NR_TESTS 1
+#define NR_PRODUCERS 4
+#define LOOP 100
+
+struct queue {
+	struct cds_wfcq_head head;
+	struct cds_wfcq_tail tail;
+};
+
+static void async_run(struct queue *queue)
+{
+	struct cds_wfcq_node *node = malloc(sizeof(*node));
+
+	cds_wfcq_node_init(node);
+
+	cds_wfcq_enqueue(&queue->head, &queue->tail, node);
+}
+
+static void do_async_loop(size_t *k, struct queue *queue)
+{
+	struct queue my_queue;
+	enum cds_wfcq_ret state;
+	struct cds_wfcq_node *node, *next;
+
+	cds_wfcq_init(&my_queue.head, &my_queue.tail);
+
+	state = cds_wfcq_splice_blocking(&my_queue.head,
+					&my_queue.tail,
+					&queue->head,
+					&queue->tail);
+
+	if (state == CDS_WFCQ_RET_SRC_EMPTY) {
+		return;
+	}
+
+	__cds_wfcq_for_each_blocking_safe(&my_queue.head,
+					&my_queue.tail,
+					node, next) {
+		free(node);
+		(*k)++;
+	}
+}
+
+static void *async_loop(void *queue)
+{
+	size_t k = 0;
+
+	while (k < LOOP * NR_PRODUCERS) {
+		(void) poll(NULL, 0, 10);
+		do_async_loop(&k, queue);
+	}
+
+	return NULL;
+}
+
+static void *spawn_jobs(void *queue)
+{
+	for (size_t k = 0; k < LOOP; ++k) {
+		async_run(queue);
+	}
+
+	return 0;
+}
+
+int main(void)
+{
+	pthread_t consumer;
+	pthread_t producers[NR_PRODUCERS];
+	struct queue queue;
+
+	plan_tests(NR_TESTS);
+
+	cds_wfcq_init(&queue.head, &queue.tail);
+	pthread_create(&consumer, NULL, async_loop, &queue);
+
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_create(&producers[k], NULL, spawn_jobs, &queue);
+	}
+
+	pthread_join(consumer, NULL);
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_join(producers[k], NULL);
+	}
+
+	ok1("No race conditions");
+
+	return exit_status();
+}
diff --git a/tests/unit/test_wfqueue.c b/tests/unit/test_wfqueue.c
new file mode 100644
index 0000000..57afaba
--- /dev/null
+++ b/tests/unit/test_wfqueue.c
@@ -0,0 +1,91 @@
+/*
+ * test_wfqueue.c
+ *
+ * Userspace RCU library - test wfqueue race conditions
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _LGPL_SOURCE
+
+#include <stdlib.h>
+
+#include <pthread.h>
+
+#define CDS_WFQ_DEPRECATED
+#include <urcu/wfqueue.h>
+
+#include "tap.h"
+
+#define NR_TESTS 1
+#define NR_PRODUCERS 4
+#define LOOP 100
+
+static void async_run(struct cds_wfq_queue *queue)
+{
+	struct cds_wfq_node *node = malloc(sizeof(*node));
+
+	cds_wfq_node_init(node);
+
+	cds_wfq_enqueue(queue, node);
+}
+
+static void *async_loop(void *queue)
+{
+	size_t k = 0;
+
+	while (k < LOOP * NR_PRODUCERS) {
+		free(cds_wfq_dequeue_blocking(queue));
+		++k;
+	}
+
+	return NULL;
+}
+
+static void *spawn_jobs(void *queue)
+{
+	for (size_t k = 0; k < LOOP; ++k) {
+		async_run(queue);
+	}
+
+	return 0;
+}
+
+int main(void)
+{
+	pthread_t consumer;
+	pthread_t producers[NR_PRODUCERS];
+	struct cds_wfq_queue queue;
+
+	plan_tests(NR_TESTS);
+
+	cds_wfq_init(&queue);
+	pthread_create(&consumer, NULL, async_loop, &queue);
+
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_create(&producers[k], NULL, spawn_jobs, &queue);
+	}
+
+	pthread_join(consumer, NULL);
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_join(producers[k], NULL);
+	}
+
+	ok1("No race conditions");
+
+	return exit_status();
+}
diff --git a/tests/unit/test_wfstack.c b/tests/unit/test_wfstack.c
new file mode 100644
index 0000000..578ae92
--- /dev/null
+++ b/tests/unit/test_wfstack.c
@@ -0,0 +1,90 @@
+/*
+ * test_wfstack.c
+ *
+ * Userspace RCU library - test wfstack race conditions
+ *
+ * Copyright 2023 - Olivier Dion <odion@efficios.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _LGPL_SOURCE
+
+#include <stdlib.h>
+
+#include <pthread.h>
+
+#include <urcu/wfstack.h>
+
+#include "tap.h"
+
+#define NR_TESTS 1
+#define NR_PRODUCERS 4
+#define LOOP 100
+
+static void async_run(struct cds_wfs_stack *queue)
+{
+	struct cds_wfs_node *node = malloc(sizeof(*node));
+
+	cds_wfs_node_init(node);
+
+	cds_wfs_push(queue, node);
+}
+
+static void *async_loop(void *queue)
+{
+	size_t k = 0;
+
+	while (k < LOOP * NR_PRODUCERS) {
+		free(cds_wfs_pop_blocking(queue));
+		++k;
+	}
+
+	return NULL;
+}
+
+static void *spawn_jobs(void *queue)
+{
+	for (size_t k = 0; k < LOOP; ++k) {
+		async_run(queue);
+	}
+
+	return 0;
+}
+
+int main(void)
+{
+	pthread_t consumer;
+	pthread_t producers[NR_PRODUCERS];
+	struct cds_wfs_stack queue;
+
+	plan_tests(NR_TESTS);
+
+	cds_wfs_init(&queue);
+	pthread_create(&consumer, NULL, async_loop, &queue);
+
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_create(&producers[k], NULL, spawn_jobs, &queue);
+	}
+
+	pthread_join(consumer, NULL);
+	for (size_t k = 0; k < NR_PRODUCERS; ++k) {
+		pthread_join(producers[k], NULL);
+	}
+
+	ok1("No race conditions");
+
+	return exit_status();
+}
-- 
2.40.1
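Since these tests are only meaningful under TSAN, a typical way to exercise them is to build the tree with the sanitizer enabled. The exact flags below are an assumption for illustration, not taken from the series:

```shell
# Hypothetical invocation; adjust to the project's actual configure options.
CC=clang \
CFLAGS="-fsanitize=thread -O1 -g" \
LDFLAGS="-fsanitize=thread" \
./configure
make -j"$(nproc)"
make check	# TSAN prints any detected races to stderr
```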


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [lttng-dev] [PATCH v2 00/12] Add support for TSAN to liburcu
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 00/12] " Olivier Dion via lttng-dev
@ 2023-06-07 19:04   ` Ondřej Surý via lttng-dev
  2023-06-07 19:20     ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-06-07 19:04 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev, Tony Finch

Olivier,

is this somewhere in Gerrit again, please? It’s much easier for us to pull the changes from git than cherry-pick patches from the mailing list.

Ondřej
--
Ondřej Surý <ondrej@sury.org> (He/Him)

> On 7. 6. 2023, at 20:54, Olivier Dion <odion@efficios.com> wrote:
> 
> This patch set adds support for TSAN in liburcu.
> 
> * Change since v1
> 
> ** Adding CMM_SEQ_CST_FENCE memory order to the CMM memory model
> 
>   The C11 memory model is incompatible with the memory model used by liburcu,
>   since the semantics of the C11 memory model are based on a happens-before
>   (acquire/release) relationship between memory accesses, while liburcu is
>   based on memory barriers and relaxed memory accesses.
> 
>   To circumvent this, a new memory order called CMM_SEQ_CST_FENCE is
>   introduced.  It implies CMM_SEQ_CST ordering while also emitting a thread
>   fence after the operation.  Operations that were documented as emitting
>   memory barriers before and after the operation are now implemented in
>   terms of this new memory order to preserve compatibility.
> 
>   However, this ordering is redundant in some cases, so the memory orders
>   were changed internally in liburcu to simply use CMM_SEQ_CST.
> 
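For illustration, the CMM_SEQ_CST_FENCE semantics described above can be sketched in C11 as a SEQ_CST operation followed by an explicit thread fence. The function name is mine, not liburcu's API:

```c
#include <stdatomic.h>

static _Atomic long word;

/* Sketch of an exchange with CMM_SEQ_CST_FENCE-like semantics: the
 * operation itself is seq_cst, and a seq_cst thread fence follows it,
 * mirroring the legacy "full barrier after the operation" documentation. */
static long exchange_seq_cst_fence(long v)
{
	long old = atomic_exchange_explicit(&word, v, memory_order_seq_cst);

	atomic_thread_fence(memory_order_seq_cst);
	return old;
}
```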
> ** Adding cmm_emit_legacy_smp_mb() to queue/stack APIs
> 
>   The queue/stack APIs document implicit memory barriers before or after
>   operation.  These are now optionally emitted only if
>   CONFIG_RCU_EMIT_LEGACY_MB is defined in urcu/config.h or manually defined by
>   the user before including liburcu.  That way, users can opt-in even if the
>   system headers were configured without this feature.
> 
>   However, users cannot opt out of this feature if it was configured into
>   the system.
> 
> * v1
> 
> ** Here are the major changes
> 
>  - Usage of compiler atomic builtins is added to the uatomic API.  This is
>    required for TSAN to understand atomic memory accesses.  If the compiler
>    supports such builtins, they are used by default.  User can opt-out and use
>    the legacy implementation of the uatomic API by using the
>    `--disable-atomic-builtins' configuration option.
> 
>  - The CMM memory model is introduced but not yet formalized.  It tries to be as
>    close as possible to the C11 memory model while offering primitives such as
>    cmm_smp_wmb(), cmm_smp_rmb() and cmm_mb() that can't be expressed in it.
>    For example, cmm_mb() can be used for ordering memory accesses to MMIO
>    devices, which is out of the scope of the C11 memory model.
> 
>  - The CMM annotation layer is a new public API that is highly experimental and
>    not guaranteed to be stable at this stage.  It serves the dual purpose of
>    verifying local (intra-thread) relaxed atomic accesses ordering with a
>    memory barrier and global (inter-thread) relaxed atomic accesses with a
>    shared state.  The second purpose is necessary for TSAN to understand memory
>    accesses ordering since it does not fully support thread fence yet.
> 
> ** CMM annotation example
> 
>  Consider the following pseudo-code of writer side in synchronize_rcu().  An
>  acquire group is defined on the stack of the writer.  Annotations are made
>  onto the group to ensure ordering of relaxed memory accesses in reader_state()
>  before the memory barrier at the end of synchronize_rcu().  It also helps TSAN
>  to understand that the relaxed accesses in reader_state() act like acquire
>  accesses because of the memory barrier in synchronize_rcu().
> 
>  In other words, the purpose of this annotation is to convert a group of
>  load-acquire memory operations into load-relaxed memory operations followed by
>  a single memory barrier.  This highly benefits weakly ordered architectures by
>  having a constant number of memory barriers instead of being linearly
>  proportional to the number of loads.  This does not benefit TSO
>  architectures.  
> 
> 
> Olivier Dion (12):
>  configure: Add --disable-atomic-builtins option
>  urcu/compiler: Use atomic builtins if configured
>  urcu/arch/generic: Use atomic builtins if configured
>  urcu/system: Use atomic builtins if configured
>  urcu/uatomic: Add CMM memory model
>  urcu-wait: Fix wait state load/store
>  tests: Use uatomic for accessing global states
>  benchmark: Use uatomic for accessing global states
>  tests/unit/test_build: Quiet unused return value
>  urcu/annotate: Add CMM annotation
>  Add cmm_emit_legacy_smp_mb()
>  tests: Add tests for checking race conditions
> 
> README.md                               |  11 ++
> configure.ac                            |  39 ++++
> doc/uatomic-api.md                      |   3 +-
> include/Makefile.am                     |   3 +
> include/urcu/annotate.h                 | 174 ++++++++++++++++++
> include/urcu/arch.h                     |   6 +
> include/urcu/arch/generic.h             |  37 ++++
> include/urcu/compiler.h                 |  22 ++-
> include/urcu/config.h.in                |   6 +
> include/urcu/static/lfstack.h           |  25 ++-
> include/urcu/static/pointer.h           |  40 ++--
> include/urcu/static/rculfqueue.h        |  14 +-
> include/urcu/static/rculfstack.h        |   8 +-
> include/urcu/static/urcu-bp.h           |  12 +-
> include/urcu/static/urcu-common.h       |   8 +-
> include/urcu/static/urcu-mb.h           |  11 +-
> include/urcu/static/urcu-memb.h         |  26 ++-
> include/urcu/static/urcu-qsbr.h         |  29 ++-
> include/urcu/static/wfcqueue.h          |  68 +++----
> include/urcu/static/wfqueue.h           |   9 +-
> include/urcu/static/wfstack.h           |  24 ++-
> include/urcu/system.h                   |  21 +++
> include/urcu/uatomic.h                  |  63 ++++++-
> include/urcu/uatomic/builtins-generic.h | 170 +++++++++++++++++
> include/urcu/uatomic/builtins.h         |  79 ++++++++
> include/urcu/uatomic/generic.h          | 234 ++++++++++++++++++++++++
> src/rculfhash.c                         |  92 ++++++----
> src/urcu-bp.c                           |  17 +-
> src/urcu-pointer.c                      |   9 +-
> src/urcu-qsbr.c                         |  31 +++-
> src/urcu-wait.h                         |  15 +-
> src/urcu.c                              |  24 ++-
> tests/benchmark/Makefile.am             |  91 ++++-----
> tests/benchmark/common-states.c         |   1 +
> tests/benchmark/common-states.h         |  51 ++++++
> tests/benchmark/test_mutex.c            |  32 +---
> tests/benchmark/test_perthreadlock.c    |  32 +---
> tests/benchmark/test_rwlock.c           |  32 +---
> tests/benchmark/test_urcu.c             |  33 +---
> tests/benchmark/test_urcu_assign.c      |  33 +---
> tests/benchmark/test_urcu_bp.c          |  33 +---
> tests/benchmark/test_urcu_defer.c       |  33 +---
> tests/benchmark/test_urcu_gc.c          |  34 +---
> tests/benchmark/test_urcu_hash.c        |   6 +-
> tests/benchmark/test_urcu_hash.h        |  15 --
> tests/benchmark/test_urcu_hash_rw.c     |  10 +-
> tests/benchmark/test_urcu_hash_unique.c |  10 +-
> tests/benchmark/test_urcu_lfq.c         |  20 +-
> tests/benchmark/test_urcu_lfs.c         |  20 +-
> tests/benchmark/test_urcu_lfs_rcu.c     |  20 +-
> tests/benchmark/test_urcu_qsbr.c        |  33 +---
> tests/benchmark/test_urcu_qsbr_gc.c     |  34 +---
> tests/benchmark/test_urcu_wfcq.c        |  22 +--
> tests/benchmark/test_urcu_wfq.c         |  20 +-
> tests/benchmark/test_urcu_wfs.c         |  22 +--
> tests/common/api.h                      |  12 +-
> tests/regression/rcutorture.h           | 106 +++++++----
> tests/unit/test_build.c                 |   8 +-
> tests/unit/test_lfstack.c               |  90 +++++++++
> tests/unit/test_wfcqueue.c              | 119 ++++++++++++
> tests/unit/test_wfqueue.c               |  91 +++++++++
> tests/unit/test_wfstack.c               |  90 +++++++++
> 62 files changed, 1799 insertions(+), 684 deletions(-)
> create mode 100644 include/urcu/annotate.h
> create mode 100644 include/urcu/uatomic/builtins-generic.h
> create mode 100644 include/urcu/uatomic/builtins.h
> create mode 100644 tests/benchmark/common-states.c
> create mode 100644 tests/benchmark/common-states.h
> create mode 100644 tests/unit/test_lfstack.c
> create mode 100644 tests/unit/test_wfcqueue.c
> create mode 100644 tests/unit/test_wfqueue.c
> create mode 100644 tests/unit/test_wfstack.c
> 
> -- 
> 2.40.1
> 
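The acquire-group idea described in the cover letter quoted above — turning a group of load-acquire operations into relaxed loads followed by a single barrier — can be sketched in C11 as follows. The names are illustrative; this is not the urcu/annotate.h API:

```c
#include <stdatomic.h>

#define NR_READERS 4

static _Atomic int reader_gp[NR_READERS];

/* One acquire operation per reader: weakly ordered CPUs may emit a
 * barrier on every iteration, linear in the number of readers. */
static int scan_acquire(void)
{
	int sum = 0;

	for (int i = 0; i < NR_READERS; i++)
		sum += atomic_load_explicit(&reader_gp[i], memory_order_acquire);
	return sum;
}

/* Equivalent ordering with a constant number of barriers: relaxed loads
 * followed by a single acquire fence.  This is the shape the annotation
 * layer helps TSAN reason about, since TSAN does not fully support
 * thread fences. */
static int scan_relaxed_fence(void)
{
	int sum = 0;

	for (int i = 0; i < NR_READERS; i++)
		sum += atomic_load_explicit(&reader_gp[i], memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return sum;
}
```

Per C11, the acquire fence gives the preceding relaxed loads acquire-like ordering with respect to matching release operations, which is why the constant-barrier form benefits weakly ordered architectures while being a no-op difference on TSO.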


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [lttng-dev] [PATCH v2 00/12] Add support for TSAN to liburcu
  2023-06-07 19:04   ` Ondřej Surý via lttng-dev
@ 2023-06-07 19:20     ` Olivier Dion via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-07 19:20 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, Tony Finch

On Wed, 07 Jun 2023, Ondřej Surý <ondrej@sury.org> wrote:
> Olivier,
>
> is this somewhere in Gerrit again, please? It’s much easier for us to
> pull the changes from git than cherry-pick patches from the mailing
> list.

Here https://review.lttng.org/c/userspace-rcu/+/10118/6.

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-05-15 20:17 ` [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured Olivier Dion via lttng-dev
@ 2023-06-21 23:19   ` Paul E. McKenney via lttng-dev
  2023-06-22 15:55     ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:19 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev

On Mon, May 15, 2023 at 04:17:09PM -0400, Olivier Dion wrote:
> Implement uatomic in terms of atomic builtins if configured to do so.
> 
> Change-Id: I5814494c62ee507fd5d381c3ba4ccd0a80c4f4e3
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>
> ---
>  include/Makefile.am                     |  3 +
>  include/urcu/uatomic.h                  |  5 +-
>  include/urcu/uatomic/builtins-generic.h | 85 +++++++++++++++++++++++++
>  include/urcu/uatomic/builtins-x86.h     | 85 +++++++++++++++++++++++++
>  include/urcu/uatomic/builtins.h         | 83 ++++++++++++++++++++++++
>  5 files changed, 260 insertions(+), 1 deletion(-)
>  create mode 100644 include/urcu/uatomic/builtins-generic.h
>  create mode 100644 include/urcu/uatomic/builtins-x86.h
>  create mode 100644 include/urcu/uatomic/builtins.h
> 
> diff --git a/include/Makefile.am b/include/Makefile.am
> index ba1fe60..fac941f 100644
> --- a/include/Makefile.am
> +++ b/include/Makefile.am
> @@ -63,6 +63,9 @@ nobase_include_HEADERS = \
>  	urcu/uatomic/alpha.h \
>  	urcu/uatomic_arch.h \
>  	urcu/uatomic/arm.h \
> +	urcu/uatomic/builtins.h \
> +	urcu/uatomic/builtins-generic.h \
> +	urcu/uatomic/builtins-x86.h \
>  	urcu/uatomic/gcc.h \
>  	urcu/uatomic/generic.h \
>  	urcu/uatomic.h \
> diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
> index 2fb5fd4..6b57c5f 100644
> --- a/include/urcu/uatomic.h
> +++ b/include/urcu/uatomic.h
> @@ -22,8 +22,11 @@
>  #define _URCU_UATOMIC_H
>  
>  #include <urcu/arch.h>
> +#include <urcu/config.h>
>  
> -#if defined(URCU_ARCH_X86)
> +#if defined(CONFIG_RCU_USE_ATOMIC_BUILTINS)
> +#include <urcu/uatomic/builtins.h>
> +#elif defined(URCU_ARCH_X86)
>  #include <urcu/uatomic/x86.h>
>  #elif defined(URCU_ARCH_PPC)
>  #include <urcu/uatomic/ppc.h>
> diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
> new file mode 100644
> index 0000000..8e6a9b5
> --- /dev/null
> +++ b/include/urcu/uatomic/builtins-generic.h
> @@ -0,0 +1,85 @@
> +/*
> + * urcu/uatomic/builtins-generic.h
> + *
> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
> +#define _URCU_UATOMIC_BUILTINS_GENERIC_H
> +
> +#include <urcu/system.h>
> +
> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
> +
> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)

Does this lose the volatile semantics that the old-style definitions
had?

> +
> +#define uatomic_cmpxchg(addr, old, new)					\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
> +		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
> +					    __ATOMIC_SEQ_CST,		\
> +					    __ATOMIC_SEQ_CST);		\
> +		_old;							\
> +	})
> +
> +#define uatomic_xchg(addr, v)				\
> +	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_add_return(addr, v)			\
> +	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_sub_return(addr, v)			\
> +	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_and(addr, mask)					\
> +	(void)__atomic_and_fetch(addr, mask, __ATOMIC_RELAXED)
> +
> +#define uatomic_or(addr, mask)					\
> +	(void)__atomic_or_fetch(addr, mask, __ATOMIC_RELAXED)
> +
> +#define uatomic_add(addr, v)					\
> +	(void)__atomic_add_fetch(addr, v, __ATOMIC_RELAXED)
> +
> +#define uatomic_sub(addr, v)					\
> +	(void)__atomic_sub_fetch(addr, v, __ATOMIC_RELAXED)
> +
> +#define uatomic_inc(addr)					\
> +	(void)__atomic_add_fetch(addr, 1, __ATOMIC_RELAXED)
> +
> +#define uatomic_dec(addr)					\
> +	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_RELAXED)
> +
> +#define cmm_smp_mb__before_uatomic_and() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_and()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_or() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_or()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_add() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_add()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_sub()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb()
> +
> +#endif /* _URCU_UATOMIC_BUILTINS_GENERIC_H */
> diff --git a/include/urcu/uatomic/builtins-x86.h b/include/urcu/uatomic/builtins-x86.h
> new file mode 100644
> index 0000000..a70f922
> --- /dev/null
> +++ b/include/urcu/uatomic/builtins-x86.h
> @@ -0,0 +1,85 @@
> +/*
> + * urcu/uatomic/builtins-x86.h
> + *
> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _URCU_UATOMIC_BUILTINS_X86_H
> +#define _URCU_UATOMIC_BUILTINS_X86_H
> +
> +#include <urcu/system.h>
> +
> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
> +
> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)

And same question here.

							Thanx, Paul

> +
> +#define uatomic_cmpxchg(addr, old, new)					\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
> +		__atomic_compare_exchange_n(addr, &_old, new, 0,	\
> +					    __ATOMIC_SEQ_CST,		\
> +					    __ATOMIC_SEQ_CST);		\
> +		_old;							\
> +	})
> +
> +#define uatomic_xchg(addr, v)				\
> +	__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_add_return(addr, v)			\
> +	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_sub_return(addr, v)			\
> +	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_and(addr, mask)					\
> +	(void)__atomic_and_fetch(addr, mask, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_or(addr, mask)					\
> +	(void)__atomic_or_fetch(addr, mask, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_add(addr, v)					\
> +	(void)__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_sub(addr, v)					\
> +	(void)__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_inc(addr)					\
> +	(void)__atomic_add_fetch(addr, 1, __ATOMIC_SEQ_CST)
> +
> +#define uatomic_dec(addr)					\
> +	(void)__atomic_sub_fetch(addr, 1, __ATOMIC_SEQ_CST)
> +
> +#define cmm_smp_mb__before_uatomic_and() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_and()  do { } while (0)
> +
> +#define cmm_smp_mb__before_uatomic_or() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_or()  do { } while (0)
> +
> +#define cmm_smp_mb__before_uatomic_add() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_add()  do { } while (0)
> +
> +#define cmm_smp_mb__before_uatomic_sub() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_sub()  do { } while (0)
> +
> +#define cmm_smp_mb__before_uatomic_inc() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_inc()  do { } while (0)
> +
> +#define cmm_smp_mb__before_uatomic_dec() do { } while (0)
> +#define cmm_smp_mb__after_uatomic_dec()  do { } while (0)
> +
> +#endif /* _URCU_UATOMIC_BUILTINS_X86_H */
> diff --git a/include/urcu/uatomic/builtins.h b/include/urcu/uatomic/builtins.h
> new file mode 100644
> index 0000000..164201b
> --- /dev/null
> +++ b/include/urcu/uatomic/builtins.h
> @@ -0,0 +1,83 @@
> +/*
> + * urcu/uatomic/builtins.h
> + *
> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _URCU_UATOMIC_BUILTINS_H
> +#define _URCU_UATOMIC_BUILTINS_H
> +
> +#include <urcu/arch.h>
> +
> +#if defined(__has_builtin)
> +#  if !__has_builtin(__atomic_store_n)
> +#    error "Toolchain does not support __atomic_store_n."
> +#  endif
> +#  if !__has_builtin(__atomic_load_n)
> +#    error "Toolchain does not support __atomic_load_n."
> +#  endif
> +#  if !__has_builtin(__atomic_exchange_n)
> +#    error "Toolchain does not support __atomic_exchange_n."
> +#  endif
> +#  if !__has_builtin(__atomic_compare_exchange_n)
> +#    error "Toolchain does not support __atomic_compare_exchange_n."
> +#  endif
> +#  if !__has_builtin(__atomic_add_fetch)
> +#    error "Toolchain does not support __atomic_add_fetch."
> +#  endif
> +#  if !__has_builtin(__atomic_sub_fetch)
> +#    error "Toolchain does not support __atomic_sub_fetch."
> +#  endif
> +#  if !__has_builtin(__atomic_or_fetch)
> +#    error "Toolchain does not support __atomic_or_fetch."
> +#  endif
> +#  if !__has_builtin(__atomic_thread_fence)
> +#    error "Toolchain does not support __atomic_thread_fence."
> +#  endif
> +#  if !__has_builtin(__atomic_signal_fence)
> +#    error "Toolchain does not support __atomic_signal_fence."
> +#  endif
> +#elif defined(__GNUC__)
> +#  define GCC_VERSION (__GNUC__       * 10000 + \
> +		       __GNUC_MINOR__ * 100   + \
> +		       __GNUC_PATCHLEVEL__)
> +#  if  GCC_VERSION < 40700
> +#    error "GCC version is too old. Version must be 4.7 or greater"
> +#  endif
> +#  undef  GCC_VERSION
> +#else
> +#  error "Toolchain is not supported."
> +#endif
> +
> +#if defined(__GNUC__)
> +#  define UATOMIC_HAS_ATOMIC_BYTE  __GCC_ATOMIC_CHAR_LOCK_FREE
> +#  define UATOMIC_HAS_ATOMIC_SHORT __GCC_ATOMIC_SHORT_LOCK_FREE
> +#elif defined(__clang__)
> +#  define UATOMIC_HAS_ATOMIC_BYTE  __CLANG_ATOMIC_CHAR_LOCK_FREE
> +#  define UATOMIC_HAS_ATOMIC_SHORT __CLANG_ATOMIC_SHORT_LOCK_FREE
> +#else
> +/* #  define UATOMIC_HAS_ATOMIC_BYTE  */
> +/* #  define UATOMIC_HAS_ATOMIC_SHORT */
> +#endif
> +
> +#if defined(URCU_ARCH_X86)
> +#  include <urcu/uatomic/builtins-x86.h>
> +#else
> +#  include <urcu/uatomic/builtins-generic.h>
> +#endif
> +
> +#endif	/* _URCU_UATOMIC_BUILTINS_H */
> -- 
> 2.39.2
> 

* Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
  2023-05-15 20:17 ` [lttng-dev] [PATCH 04/11] urcu/arch/generic: " Olivier Dion via lttng-dev
@ 2023-06-21 23:22   ` Paul E. McKenney via lttng-dev
  2023-06-22  0:53     ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:22 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev

On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:
> If configured to use atomic builtins, implement SMP memory barriers in
> terms of atomic builtins if the architecture does not implement its own
> version.
> 
> Change-Id: Iddc4283606e0fce572e104d2d3f03b5c0d9926fb
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>
> ---
>  include/urcu/arch/generic.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
> index be6e41e..e292c70 100644
> --- a/include/urcu/arch/generic.h
> +++ b/include/urcu/arch/generic.h
> @@ -43,6 +43,14 @@ extern "C" {
>   * GCC builtins) as well as cmm_rmb and cmm_wmb (defaulting to cmm_mb).
>   */
>  
> +#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
> +
> +# ifndef cmm_smp_mb
> +#  define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
> +# endif
> +
> +#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
> +
>  #ifndef cmm_mb
>  #define cmm_mb()    __sync_synchronize()

Just out of curiosity, why not also implement cmm_mb() in terms of
__atomic_thread_fence(__ATOMIC_SEQ_CST)?  (Or is that a later patch?)

							Thanx, Paul

>  #endif
> -- 
> 2.39.2
> 

* Re: [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 04/12] urcu/system: " Olivier Dion via lttng-dev
@ 2023-06-21 23:23   ` Paul E. McKenney via lttng-dev
  2023-07-04 14:43     ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:23 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Wed, Jun 07, 2023 at 02:53:51PM -0400, Olivier Dion wrote:
> If configured to use atomic builtins, use them for implementing the
> CMM_LOAD_SHARED and CMM_STORE_SHARED macros.
> 
> Change-Id: I3eaaaaf0d26c47aced6e94b40fd59c7b8baa6272
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>
> ---
>  include/urcu/system.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/include/urcu/system.h b/include/urcu/system.h
> index faae390..f184aad 100644
> --- a/include/urcu/system.h
> +++ b/include/urcu/system.h
> @@ -19,9 +19,28 @@
>   * all copies or substantial portions of the Software.
>   */
>  
> +#include <urcu/config.h>
>  #include <urcu/compiler.h>
>  #include <urcu/arch.h>
>  
> +#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
> +
> +#define CMM_LOAD_SHARED(x)			\
> +	__atomic_load_n(&(x), __ATOMIC_RELAXED)
> +
> +#define _CMM_LOAD_SHARED(x) CMM_LOAD_SHARED(x)
> +
> +#define CMM_STORE_SHARED(x, v)					\
> +	__extension__						\
> +	({							\
> +		__typeof__(v) _v = (v);				\
> +		__atomic_store_n(&(x), _v, __ATOMIC_RELAXED);	\
> +		_v;						\
> +	})
> +
> +#define _CMM_STORE_SHARED(x, v) CMM_STORE_SHARED(x, v)

Same question here on loss of volatile semantics.

							Thanx, Paul

> +
> +#else
>  /*
>   * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
>   * before the load.
> @@ -56,4 +75,6 @@
>  		_v = _v;	/* Work around clang "unused result" */	\
>  	})
>  
> +#endif	/* CONFIG_RCU_USE_ATOMIC_BUILTINS */
> +
>  #endif /* _URCU_SYSTEM_H */
> -- 
> 2.40.1
> 

* Re: [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
@ 2023-06-21 23:28   ` Paul E. McKenney via lttng-dev
  2023-06-29 16:49     ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:28 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Wed, Jun 07, 2023 at 02:53:52PM -0400, Olivier Dion wrote:
> Introducing the CMM memory model with the following new primitives:
> 
>   - uatomic_load(addr, memory_order)
> 
>   - uatomic_store(addr, value, memory_order)
>   - uatomic_and_mo(addr, mask, memory_order)
>   - uatomic_or_mo(addr, mask, memory_order)
>   - uatomic_add_mo(addr, value, memory_order)
>   - uatomic_sub_mo(addr, value, memory_order)
>   - uatomic_inc_mo(addr, memory_order)
>   - uatomic_dec_mo(addr, memory_order)
> 
>   - uatomic_add_return_mo(addr, value, memory_order)
>   - uatomic_sub_return_mo(addr, value, memory_order)
> 
>   - uatomic_xchg_mo(addr, value, memory_order)
> 
>   - uatomic_cmpxchg_mo(addr, old, new,
>                        memory_order_success,
>                        memory_order_failure)
> 
> The CMM memory model reflects the C11 memory model with an additional
> CMM_SEQ_CST_FENCE memory order. The memory order can be selected through
> the enum cmm_memorder.
> 
> * With Atomic Builtins
> 
> If configured with atomic builtins, the correspondence between the CMM
> memory model and the C11 memory model is one-to-one, with the exception
> of the CMM_SEQ_CST_FENCE memory order, which implies the memory order
> CMM_SEQ_CST plus a thread fence after the operation.
> 
> * Without Atomic Builtins
> 
> However, if not configured with atomic builtins, the following rules
> stipulate the memory model.
> 
> For load operations with uatomic_load(), the memory orders CMM_RELAXED,
> CMM_CONSUME, CMM_ACQUIRE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
> allowed. A barrier may be inserted before and after the load from memory
> depending on the memory order:
> 
>   - CMM_RELAXED: No barrier
>   - CMM_CONSUME: Memory barrier after read
>   - CMM_ACQUIRE: Memory barrier after read
>   - CMM_SEQ_CST: Memory barriers before and after read
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after read
> 
> For store operations with uatomic_store(), the memory orders
> CMM_RELAXED, CMM_RELEASE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
> allowed. A barrier may be inserted before and after the store to memory
> depending on the memory order:
> 
>   - CMM_RELAXED: No barrier
>   - CMM_RELEASE: Memory barrier before operation
>   - CMM_SEQ_CST: Memory barriers before and after operation
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after operation
> 
> For load/store operations with uatomic_and_mo(), uatomic_or_mo(),
> uatomic_add_mo(), uatomic_sub_mo(), uatomic_inc_mo(), uatomic_dec_mo(),
> uatomic_add_return_mo() and uatomic_sub_return_mo(), all memory orders
> are allowed. A barrier may be inserted before and after the operation
> depending on the memory order:
> 
>   - CMM_RELAXED: No barrier
>   - CMM_ACQUIRE: Memory barrier after operation
>   - CMM_CONSUME: Memory barrier after operation
>   - CMM_RELEASE: Memory barrier before operation
>   - CMM_ACQ_REL: Memory barriers before and after operation
>   - CMM_SEQ_CST: Memory barriers before and after operation
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after operation
> 
> For the exchange operation uatomic_xchg_mo(), any memory order is
> valid. A barrier may be inserted before and after the exchange to memory
> depending on the memory order:
> 
>   - CMM_RELAXED: No barrier
>   - CMM_ACQUIRE: Memory barrier after operation
>   - CMM_CONSUME: Memory barrier after operation
>   - CMM_RELEASE: Memory barrier before operation
>   - CMM_ACQ_REL: Memory barriers before and after operation
>   - CMM_SEQ_CST: Memory barriers before and after operation
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after operation
> 
> For the compare exchange operation uatomic_cmpxchg_mo(), the success
> memory order can be anything while the failure memory order cannot be
> CMM_RELEASE nor CMM_ACQ_REL and cannot be stronger than the success
> memory order. A barrier may be inserted before and after the store to
> memory depending on the memory orders:
> 
>  Success memory order:
> 
>   - CMM_RELAXED: No barrier
>   - CMM_ACQUIRE: Memory barrier after operation
>   - CMM_CONSUME: Memory barrier after operation
>   - CMM_RELEASE: Memory barrier before operation
>   - CMM_ACQ_REL: Memory barriers before and after operation
>   - CMM_SEQ_CST: Memory barriers before and after operation
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after operation
> 
>   Barriers after the operations are only emitted if the compare exchange
>   succeeds.
> 
>  Failure memory order:
>   - CMM_RELAXED: No barrier
>   - CMM_ACQUIRE: Memory barrier after operation
>   - CMM_CONSUME: Memory barrier after operation
>   - CMM_SEQ_CST: Memory barriers before and after operation
>   - CMM_SEQ_CST_FENCE: Memory barriers before and after operation
> 
>   Barriers after the operations are only emitted if the compare exchange
>   fails.  Barriers before the operation are never emitted by this
>   memory order.
> 
> Change-Id: I213ba19c84e82a63083f00143a3142ffbdab1d52
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>
> ---
>  doc/uatomic-api.md                      |   3 +-
>  include/Makefile.am                     |   2 +
>  include/urcu/static/pointer.h           |  40 ++--
>  include/urcu/uatomic.h                  |  63 ++++++-
>  include/urcu/uatomic/builtins-generic.h | 170 +++++++++++++++++
>  include/urcu/uatomic/builtins.h         |  79 ++++++++
>  include/urcu/uatomic/generic.h          | 234 ++++++++++++++++++++++++
>  src/urcu-pointer.c                      |   9 +-
>  8 files changed, 565 insertions(+), 35 deletions(-)
>  create mode 100644 include/urcu/uatomic/builtins-generic.h
>  create mode 100644 include/urcu/uatomic/builtins.h
> 
> diff --git a/doc/uatomic-api.md b/doc/uatomic-api.md
> index 0962399..7341ee8 100644
> --- a/doc/uatomic-api.md
> +++ b/doc/uatomic-api.md
> @@ -52,7 +52,8 @@ An atomic read-modify-write operation that performs this
>  sequence of operations atomically: check if `addr` contains `old`.
>  If true, then replace the content of `addr` by `new`. Return the
>  value previously contained by `addr`. This function implies a full
> -memory barrier before and after the atomic operation.
> +memory barrier before and after the atomic operation. The second memory
> +barrier is only emitted if the operation succeeded.
>  
>  
>  ```c
> diff --git a/include/Makefile.am b/include/Makefile.am
> index ba1fe60..b20e56d 100644
> --- a/include/Makefile.am
> +++ b/include/Makefile.am
> @@ -63,6 +63,8 @@ nobase_include_HEADERS = \
>  	urcu/uatomic/alpha.h \
>  	urcu/uatomic_arch.h \
>  	urcu/uatomic/arm.h \
> +	urcu/uatomic/builtins.h \
> +	urcu/uatomic/builtins-generic.h \
>  	urcu/uatomic/gcc.h \
>  	urcu/uatomic/generic.h \
>  	urcu/uatomic.h \
> diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
> index 9e46a57..9da8657 100644
> --- a/include/urcu/static/pointer.h
> +++ b/include/urcu/static/pointer.h
> @@ -96,23 +96,8 @@ extern "C" {
>   * -Wincompatible-pointer-types errors.  Using the statement expression
>   * makes it an rvalue and gets rid of the const-ness.
>   */
> -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
> -# define _rcu_dereference(p) __extension__ ({						\
> -				__typeof__(__extension__ ({				\
> -					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
> -					_________p0;					\
> -				})) _________p1;					\
> -				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\

There is talk of getting rid of memory_order_consume.  But for the moment,
it is what there is.  Another alternative is to use a volatile load,
similar to old-style CMM_LOAD_SHARED() or in-kernel READ_ONCE().

						Thanx, Paul

> -				(_________p1);						\
> -			})
> -#else
> -# define _rcu_dereference(p) __extension__ ({						\
> -				__typeof__(p) _________p1 = CMM_LOAD_SHARED(p);		\
> -				cmm_smp_read_barrier_depends();				\
> -				(_________p1);						\
> -			})
> -#endif
> -
> +# define _rcu_dereference(p)			\
> +	uatomic_load(&(p), CMM_CONSUME)
>  /**
>   * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
>   * is as expected by "old". If succeeds, returns the previous pointer to the
> @@ -131,8 +116,9 @@ extern "C" {
>  	({								\
>  		__typeof__(*p) _________pold = (old);			\
>  		__typeof__(*p) _________pnew = (_new);			\
> -		uatomic_cmpxchg(p, _________pold, _________pnew);	\
> -	})
> +		uatomic_cmpxchg_mo(p, _________pold, _________pnew,	\
> +				   CMM_SEQ_CST, CMM_SEQ_CST);		\
> +	})
>  
>  /**
>   * _rcu_xchg_pointer - same as rcu_assign_pointer, but returns the previous
> @@ -149,17 +135,17 @@ extern "C" {
>  	__extension__					\
>  	({						\
>  		__typeof__(*p) _________pv = (v);	\
> -		uatomic_xchg(p, _________pv);		\
> +		uatomic_xchg_mo(p, _________pv,		\
> +				CMM_SEQ_CST);		\
>  	})
>  
>  
> -#define _rcu_set_pointer(p, v)				\
> -	do {						\
> -		__typeof__(*p) _________pv = (v);	\
> -		if (!__builtin_constant_p(v) || 	\
> -		    ((v) != NULL))			\
> -			cmm_wmb();				\
> -		uatomic_set(p, _________pv);		\
> +#define _rcu_set_pointer(p, v)						\
> +	do {								\
> +		__typeof__(*p) _________pv = (v);			\
> +		uatomic_store(p, _________pv,				\
> +			__builtin_constant_p(v) && (v) == NULL ?	\
> +			CMM_RELAXED : CMM_RELEASE);			\
>  	} while (0)
>  
>  /**
> diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
> index 2fb5fd4..be857e1 100644
> --- a/include/urcu/uatomic.h
> +++ b/include/urcu/uatomic.h
> @@ -21,9 +21,70 @@
>  #ifndef _URCU_UATOMIC_H
>  #define _URCU_UATOMIC_H
>  
> +#include <assert.h>
> +
>  #include <urcu/arch.h>
> +#include <urcu/config.h>
>  
> -#if defined(URCU_ARCH_X86)
> +enum cmm_memorder {
> +	CMM_RELAXED = 0,
> +	CMM_CONSUME = 1,
> +	CMM_ACQUIRE = 2,
> +	CMM_RELEASE = 3,
> +	CMM_ACQ_REL = 4,
> +	CMM_SEQ_CST = 5,
> +	CMM_SEQ_CST_FENCE = 6,
> +};
> +
> +#ifdef CONFIG_RCU_USE_ATOMIC_BUILTINS
> +
> +/*
> + * Make sure that CMM_SEQ_CST_FENCE is not equivalent to other memory orders.
> + */
> +# ifdef static_assert
> +static_assert(CMM_RELAXED == __ATOMIC_RELAXED, "");
> +static_assert(CMM_CONSUME == __ATOMIC_CONSUME, "");
> +static_assert(CMM_ACQUIRE == __ATOMIC_ACQUIRE, "");
> +static_assert(CMM_RELEASE == __ATOMIC_RELEASE, "");
> +static_assert(CMM_ACQ_REL == __ATOMIC_ACQ_REL, "");
> +static_assert(CMM_SEQ_CST == __ATOMIC_SEQ_CST, "");
> +# endif
> +
> +/*
> + * This is not part of the public API. It is used internally to implement the
> + * CMM_SEQ_CST_FENCE memory order.
> + *
> + * NOTE: Using switch here instead of if statement to avoid -Wduplicated-cond
> + * warning when memory order is conditionally determined.
> + */
> +static inline void cmm_seq_cst_fence_after_atomic(enum cmm_memorder mo)
> +{
> +	switch (mo) {
> +	case CMM_SEQ_CST_FENCE:
> +		cmm_smp_mb();
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +#endif
> +
> +/*
> + * This is not part of the public API. It is used internally to convert from the
> + * CMM memory model to the C11 memory model.
> + */
> +static inline int cmm_to_c11(int mo)
> +{
> +	if (mo == CMM_SEQ_CST_FENCE) {
> +		return CMM_SEQ_CST;
> +	}
> +	return mo;
> +}
> +
> +#if defined(CONFIG_RCU_USE_ATOMIC_BUILTINS)
> +#include <urcu/uatomic/builtins.h>
> +#elif defined(URCU_ARCH_X86)
>  #include <urcu/uatomic/x86.h>
>  #elif defined(URCU_ARCH_PPC)
>  #include <urcu/uatomic/ppc.h>
> diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
> new file mode 100644
> index 0000000..673e888
> --- /dev/null
> +++ b/include/urcu/uatomic/builtins-generic.h
> @@ -0,0 +1,170 @@
> +/*
> + * urcu/uatomic/builtins-generic.h
> + *
> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
> +#define _URCU_UATOMIC_BUILTINS_GENERIC_H
> +
> +#include <urcu/system.h>
> +
> +#define uatomic_store(addr, v, mo)				\
> +	__extension__						\
> +	({							\
> +		__atomic_store_n(addr, v, cmm_to_c11(mo));	\
> +		cmm_seq_cst_fence_after_atomic(mo);		\
> +	})
> +
> +#define uatomic_set(addr, v)			\
> +	uatomic_store(addr, v, CMM_RELAXED)
> +
> +#define uatomic_load(addr, mo)						\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _value = __atomic_load_n(addr,	\
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +									\
> +		_value;							\
> +	})
> +
> +#define uatomic_read(addr)			\
> +	uatomic_load(addr, CMM_RELAXED)
> +
> +#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = (__typeof__(*(addr)))old;	\
> +									\
> +		if (__atomic_compare_exchange_n(addr, &_old, new, 0,	\
> +							cmm_to_c11(mos), \
> +							cmm_to_c11(mof))) { \
> +			cmm_seq_cst_fence_after_atomic(mos);		\
> +		} else {						\
> +			cmm_seq_cst_fence_after_atomic(mof);		\
> +		}							\
> +		_old;							\
> +	})
> +
> +#define uatomic_cmpxchg(addr, old, new)					\
> +	uatomic_cmpxchg_mo(addr, old, new, CMM_SEQ_CST_FENCE, CMM_RELAXED)
> +
> +#define uatomic_xchg_mo(addr, v, mo)					\
> +	__extension__							\
> +	({								\
> +		__typeof__((*addr)) _old = __atomic_exchange_n(addr, v,	\
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +		_old;							\
> +	})
> +
> +#define uatomic_xchg(addr, v)						\
> +	uatomic_xchg_mo(addr, v, CMM_SEQ_CST_FENCE)
> +
> +#define uatomic_add_return_mo(addr, v, mo)				\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = __atomic_add_fetch(addr, v,	\
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +		_old;							\
> +	})
> +
> +#define uatomic_add_return(addr, v)					\
> +	uatomic_add_return_mo(addr, v, CMM_SEQ_CST_FENCE)
> +
> +#define uatomic_sub_return_mo(addr, v, mo)				\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = __atomic_sub_fetch(addr, v,	\
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +		_old;							\
> +	})
> +
> +#define uatomic_sub_return(addr, v)					\
> +	uatomic_sub_return_mo(addr, v, CMM_SEQ_CST_FENCE)
> +
> +#define uatomic_and_mo(addr, mask, mo)					\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = __atomic_and_fetch(addr, mask, \
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +		_old;							\
> +	})
> +
> +#define uatomic_and(addr, mask)				\
> +	(void) uatomic_and_mo(addr, mask, CMM_SEQ_CST)
> +
> +#define uatomic_or_mo(addr, mask, mo)					\
> +	__extension__							\
> +	({								\
> +		__typeof__(*(addr)) _old = __atomic_or_fetch(addr, mask, \
> +							cmm_to_c11(mo)); \
> +		cmm_seq_cst_fence_after_atomic(mo);			\
> +		_old;							\
> +	})
> +
> +
> +#define uatomic_or(addr, mask)				\
> +	(void) uatomic_or_mo(addr, mask, CMM_RELAXED)
> +
> +#define uatomic_add_mo(addr, v, mo)			\
> +	(void) uatomic_add_return_mo(addr, v, mo)
> +
> +#define uatomic_add(addr, v)				\
> +	(void) uatomic_add_mo(addr, v, CMM_RELAXED)
> +
> +#define uatomic_sub_mo(addr, v, mo)			\
> +	(void) uatomic_sub_return_mo(addr, v, mo)
> +
> +#define uatomic_sub(addr, v)				\
> +	(void) uatomic_sub_mo(addr, v, CMM_RELAXED)
> +
> +#define uatomic_inc_mo(addr, mo)		\
> +	(void) uatomic_add_mo(addr, 1, mo)
> +
> +#define uatomic_inc(addr)				\
> +	(void) uatomic_inc_mo(addr, CMM_RELAXED)
> +
> +#define uatomic_dec_mo(addr, mo)		\
> +	(void) uatomic_sub_mo(addr, 1, mo)
> +
> +#define uatomic_dec(addr)				\
> +	(void) uatomic_dec_mo(addr, CMM_RELAXED)
> +
> +#define cmm_smp_mb__before_uatomic_and() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_and()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_or() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_or()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_add() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_add()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_sub()  cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb()
> +
> +#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb()
> +#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb()
> +
> +#endif /* _URCU_UATOMIC_BUILTINS_GENERIC_H */
> diff --git a/include/urcu/uatomic/builtins.h b/include/urcu/uatomic/builtins.h
> new file mode 100644
> index 0000000..82e98f8
> --- /dev/null
> +++ b/include/urcu/uatomic/builtins.h
> @@ -0,0 +1,79 @@
> +/*
> + * urcu/uatomic/builtins.h
> + *
> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _URCU_UATOMIC_BUILTINS_H
> +#define _URCU_UATOMIC_BUILTINS_H
> +
> +#include <urcu/arch.h>
> +
> +#if defined(__has_builtin)
> +# if !__has_builtin(__atomic_store_n)
> +#  error "Toolchain does not support __atomic_store_n."
> +# endif
> +# if !__has_builtin(__atomic_load_n)
> +#  error "Toolchain does not support __atomic_load_n."
> +# endif
> +# if !__has_builtin(__atomic_exchange_n)
> +#  error "Toolchain does not support __atomic_exchange_n."
> +# endif
> +# if !__has_builtin(__atomic_compare_exchange_n)
> +#  error "Toolchain does not support __atomic_compare_exchange_n."
> +# endif
> +# if !__has_builtin(__atomic_add_fetch)
> +#  error "Toolchain does not support __atomic_add_fetch."
> +# endif
> +# if !__has_builtin(__atomic_sub_fetch)
> +#  error "Toolchain does not support __atomic_sub_fetch."
> +# endif
> +# if !__has_builtin(__atomic_or_fetch)
> +#  error "Toolchain does not support __atomic_or_fetch."
> +# endif
> +# if !__has_builtin(__atomic_thread_fence)
> +#  error "Toolchain does not support __atomic_thread_fence."
> +# endif
> +# if !__has_builtin(__atomic_signal_fence)
> +#  error "Toolchain does not support __atomic_signal_fence."
> +# endif
> +#elif defined(__GNUC__)
> +# define GCC_VERSION (__GNUC__       * 10000 + \
> +		       __GNUC_MINOR__ * 100   + \
> +		       __GNUC_PATCHLEVEL__)
> +# if  GCC_VERSION < 40700
> +#  error "GCC version is too old. Version must be 4.7 or greater"
> +# endif
> +# undef  GCC_VERSION
> +#else
> +# error "Toolchain is not supported."
> +#endif
> +
> +/* Check __clang__ first: clang also defines __GNUC__. */
> +#if defined(__clang__)
> +# define UATOMIC_HAS_ATOMIC_BYTE  __CLANG_ATOMIC_CHAR_LOCK_FREE
> +# define UATOMIC_HAS_ATOMIC_SHORT __CLANG_ATOMIC_SHORT_LOCK_FREE
> +#elif defined(__GNUC__)
> +# define UATOMIC_HAS_ATOMIC_BYTE  __GCC_ATOMIC_CHAR_LOCK_FREE
> +# define UATOMIC_HAS_ATOMIC_SHORT __GCC_ATOMIC_SHORT_LOCK_FREE
> +#else
> +/* #  define UATOMIC_HAS_ATOMIC_BYTE  */
> +/* #  define UATOMIC_HAS_ATOMIC_SHORT */
> +#endif
> +
> +#include <urcu/uatomic/builtins-generic.h>
> +
> +#endif	/* _URCU_UATOMIC_BUILTINS_H */
> diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
> index e31a19b..6b9c153 100644
> --- a/include/urcu/uatomic/generic.h
> +++ b/include/urcu/uatomic/generic.h
> @@ -33,10 +33,244 @@ extern "C" {
>  #define uatomic_set(addr, v)	((void) CMM_STORE_SHARED(*(addr), (v)))
>  #endif
>  
> +extern void abort(void);
> +
> +#define uatomic_load_store_return_op(op, addr, v, mo)			\
> +	__extension__							\
> +	({								\
> +									\
> +		switch (mo) {						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_CONSUME:					\
> +		case CMM_RELAXED:					\
> +			break;						\
> +		case CMM_RELEASE:					\
> +		case CMM_ACQ_REL:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +									\
> +		__typeof__(*(addr)) _value = op(addr, v);		\
> +									\
> +		switch (mo) {						\
> +		case CMM_CONSUME:					\
> +			cmm_smp_read_barrier_depends();			\
> +			break;						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_ACQ_REL:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		case CMM_RELAXED:					\
> +		case CMM_RELEASE:					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +		_value;							\
> +	})
> +
> +#define uatomic_load_store_op(op, addr, v, mo)				\
> +	({								\
> +		switch (mo) {						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_CONSUME:					\
> +		case CMM_RELAXED:					\
> +			break;						\
> +		case CMM_RELEASE:					\
> +		case CMM_ACQ_REL:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +									\
> +		op(addr, v);						\
> +									\
> +		switch (mo) {						\
> +		case CMM_CONSUME:					\
> +			cmm_smp_read_barrier_depends();			\
> +			break;						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_ACQ_REL:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		case CMM_RELAXED:					\
> +		case CMM_RELEASE:					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +	})
> +
> +#define uatomic_store(addr, v, mo)			\
> +	({						\
> +		switch (mo) {				\
> +		case CMM_RELAXED:			\
> +			break;				\
> +		case CMM_RELEASE:			\
> +		case CMM_SEQ_CST:			\
> +		case CMM_SEQ_CST_FENCE:			\
> +			cmm_smp_mb();			\
> +			break;				\
> +		default:				\
> +			abort();			\
> +		}					\
> +							\
> +		uatomic_set(addr, v);			\
> +							\
> +		switch (mo) {				\
> +		case CMM_RELAXED:			\
> +		case CMM_RELEASE:			\
> +			break;				\
> +		case CMM_SEQ_CST:			\
> +		case CMM_SEQ_CST_FENCE:			\
> +			cmm_smp_mb();			\
> +			break;				\
> +		default:				\
> +			abort();			\
> +		}					\
> +	})
> +
> +#define uatomic_and_mo(addr, v, mo)				\
> +	uatomic_load_store_op(uatomic_and, addr, v, mo)
> +
> +#define uatomic_or_mo(addr, v, mo)				\
> +	uatomic_load_store_op(uatomic_or, addr, v, mo)
> +
> +#define uatomic_add_mo(addr, v, mo)				\
> +	uatomic_load_store_op(uatomic_add, addr, v, mo)
> +
> +#define uatomic_sub_mo(addr, v, mo)				\
> +	uatomic_load_store_op(uatomic_sub, addr, v, mo)
> +
> +#define uatomic_inc_mo(addr, mo)				\
> +	uatomic_load_store_op(uatomic_add, addr, 1, mo)
> +
> +#define uatomic_dec_mo(addr, mo)				\
> +	uatomic_load_store_op(uatomic_sub, addr, 1, mo)
> +/*
> + * NOTE: We cannot simply do switch (_value == (old) ? mos : mof), otherwise
> + * the compiler emits a -Wduplicated-cond warning.
> + */
> +#define uatomic_cmpxchg_mo(addr, old, new, mos, mof)			\
> +	__extension__							\
> +	({								\
> +		switch (mos) {						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_CONSUME:					\
> +		case CMM_RELAXED:					\
> +			break;						\
> +		case CMM_RELEASE:					\
> +		case CMM_ACQ_REL:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +									\
> +		__typeof__(*(addr)) _value = uatomic_cmpxchg(addr, old,	\
> +							new);		\
> +									\
> +		if (_value == (old)) {					\
> +			switch (mos) {					\
> +			case CMM_CONSUME:				\
> +				cmm_smp_read_barrier_depends();		\
> +				break;					\
> +			case CMM_ACQUIRE:				\
> +			case CMM_ACQ_REL:				\
> +			case CMM_SEQ_CST:				\
> +			case CMM_SEQ_CST_FENCE:				\
> +				cmm_smp_mb();				\
> +				break;					\
> +			case CMM_RELAXED:				\
> +			case CMM_RELEASE:				\
> +				break;					\
> +			default:					\
> +				abort();				\
> +			}						\
> +		} else {						\
> +			switch (mof) {					\
> +			case CMM_CONSUME:				\
> +				cmm_smp_read_barrier_depends();		\
> +				break;					\
> +			case CMM_ACQUIRE:				\
> +			case CMM_ACQ_REL:				\
> +			case CMM_SEQ_CST:				\
> +			case CMM_SEQ_CST_FENCE:				\
> +				cmm_smp_mb();				\
> +				break;					\
> +			case CMM_RELAXED:				\
> +			case CMM_RELEASE:				\
> +				break;					\
> +			default:					\
> +				abort();				\
> +			}						\
> +		}							\
> +		_value;							\
> +	})
> +
> +#define uatomic_xchg_mo(addr, v, mo)				\
> +	uatomic_load_store_return_op(uatomic_xchg, addr, v, mo)
> +
> +#define uatomic_add_return_mo(addr, v, mo)				\
> +	uatomic_load_store_return_op(uatomic_add_return, addr, v, mo)
> +
> +#define uatomic_sub_return_mo(addr, v, mo)				\
> +	uatomic_load_store_return_op(uatomic_sub_return, addr, v, mo)
> +
> +
>  #ifndef uatomic_read
>  #define uatomic_read(addr)	CMM_LOAD_SHARED(*(addr))
>  #endif
>  
> +#define uatomic_load(addr, mo)						\
> +	__extension__							\
> +	({								\
> +		switch (mo) {						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_CONSUME:					\
> +		case CMM_RELAXED:					\
> +			break;						\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +									\
> +		__typeof__(*(addr)) _rcu_value = uatomic_read(addr);	\
> +									\
> +		switch (mo) {						\
> +		case CMM_RELAXED:					\
> +			break;						\
> +		case CMM_CONSUME:					\
> +			cmm_smp_read_barrier_depends();			\
> +			break;						\
> +		case CMM_ACQUIRE:					\
> +		case CMM_SEQ_CST:					\
> +		case CMM_SEQ_CST_FENCE:					\
> +			cmm_smp_mb();					\
> +			break;						\
> +		default:						\
> +			abort();					\
> +		}							\
> +									\
> +		_rcu_value;						\
> +	})
> +
>  #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR
>  #ifdef ILLEGAL_INSTR
>  static inline __attribute__((always_inline))
> diff --git a/src/urcu-pointer.c b/src/urcu-pointer.c
> index d0854ac..cea8aeb 100644
> --- a/src/urcu-pointer.c
> +++ b/src/urcu-pointer.c
> @@ -39,19 +39,16 @@ void *rcu_dereference_sym(void *p)
>  
>  void *rcu_set_pointer_sym(void **p, void *v)
>  {
> -	cmm_wmb();
> -	uatomic_set(p, v);
> +	uatomic_store(p, v, CMM_RELEASE);
>  	return v;
>  }
>  
>  void *rcu_xchg_pointer_sym(void **p, void *v)
>  {
> -	cmm_wmb();
> -	return uatomic_xchg(p, v);
> +	return uatomic_xchg_mo(p, v, CMM_SEQ_CST);
>  }
>  
>  void *rcu_cmpxchg_pointer_sym(void **p, void *old, void *_new)
>  {
> -	cmm_wmb();
> -	return uatomic_cmpxchg(p, old, _new);
> +	return uatomic_cmpxchg_mo(p, old, _new, CMM_SEQ_CST, CMM_SEQ_CST);
>  }
> -- 
> 2.40.1
> 
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
@ 2023-06-21 23:37   ` Paul E. McKenney via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:37 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Wed, Jun 07, 2023 at 02:53:54PM -0400, Olivier Dion wrote:
> Global states accesses were protected via memory barriers. Use the
> uatomic API with the CMM memory model so that TSAN does not warns about

"does not warn", for whatever that is worth.

> none atomic concurrent accesses.
> 
> Also, the thread id map mutex must be unlocked after setting the newly
> created thread id in the map. Otherwise, the new thread could observe an
> unset id.
> 
> Change-Id: I1ecdc387b3f510621cbc116ad3b95c676f5d659a
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>
> ---
>  tests/common/api.h            |  12 ++--
>  tests/regression/rcutorture.h | 106 +++++++++++++++++++++++-----------
>  2 files changed, 80 insertions(+), 38 deletions(-)
> 
> diff --git a/tests/common/api.h b/tests/common/api.h
> index a260463..9d22b0f 100644
> --- a/tests/common/api.h
> +++ b/tests/common/api.h
> @@ -26,6 +26,7 @@
>  
>  #include <urcu/compiler.h>
>  #include <urcu/arch.h>
> +#include <urcu/uatomic.h>
>  
>  /*
>   * Machine parameters.
> @@ -135,7 +136,7 @@ static int __smp_thread_id(void)
>  	thread_id_t tid = pthread_self();
>  
>  	for (i = 0; i < NR_THREADS; i++) {
> -		if (__thread_id_map[i] == tid) {
> +		if (uatomic_read(&__thread_id_map[i]) == tid) {
>  			long v = i + 1;  /* must be non-NULL. */
>  
>  			if (pthread_setspecific(thread_id_key, (void *)v) != 0) {
> @@ -184,12 +185,13 @@ static thread_id_t create_thread(void *(*func)(void *), void *arg)
>  		exit(-1);
>  	}
>  	__thread_id_map[i] = __THREAD_ID_MAP_WAITING;
> -	spin_unlock(&__thread_id_map_mutex);
> +
>  	if (pthread_create(&tid, NULL, func, arg) != 0) {
>  		perror("create_thread:pthread_create");
>  		exit(-1);
>  	}
> -	__thread_id_map[i] = tid;
> +	uatomic_set(&__thread_id_map[i], tid);
> +	spin_unlock(&__thread_id_map_mutex);
>  	return tid;
>  }
>  
> @@ -199,7 +201,7 @@ static void *wait_thread(thread_id_t tid)
>  	void *vp;
>  
>  	for (i = 0; i < NR_THREADS; i++) {
> -		if (__thread_id_map[i] == tid)
> +		if (uatomic_read(&__thread_id_map[i]) == tid)
>  			break;
>  	}
>  	if (i >= NR_THREADS){
> @@ -211,7 +213,7 @@ static void *wait_thread(thread_id_t tid)
>  		perror("wait_thread:pthread_join");
>  		exit(-1);
>  	}
> -	__thread_id_map[i] = __THREAD_ID_MAP_EMPTY;
> +	uatomic_set(&__thread_id_map[i], __THREAD_ID_MAP_EMPTY);
>  	return vp;
>  }
>  
> diff --git a/tests/regression/rcutorture.h b/tests/regression/rcutorture.h
> index bc394f9..5835b8f 100644
> --- a/tests/regression/rcutorture.h
> +++ b/tests/regression/rcutorture.h
> @@ -44,6 +44,14 @@
>   * data.  A correct RCU implementation will have all but the first two
>   * numbers non-zero.
>   *
> + * rcu_stress_count: Histogram of "ages" of structures seen by readers.  If any
> + * entries past the first two are non-zero, RCU is broken. The age of a newly
> + * allocated structure is zero, it becomes one when removed from reader
> + * visibility, and is incremented once per grace period subsequently -- and is
> + * freed after passing through (RCU_STRESS_PIPE_LEN-2) grace periods.  Since
> + * this test has only one true writer (the others are fake writers), only
> + * buckets at indexes 0 and 1 should be non-zero.
> + *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; either version 2 of the License, or
> @@ -68,6 +76,8 @@
>  #include <stdlib.h>
>  #include "tap.h"
>  
> +#include <urcu/uatomic.h>
> +
>  #define NR_TESTS	1
>  
>  DEFINE_PER_THREAD(long long, n_reads_pt);
> @@ -145,10 +155,10 @@ void *rcu_read_perf_test(void *arg)
>  	run_on(me);
>  	uatomic_inc(&nthreadsrunning);
>  	put_thread_offline();
> -	while (goflag == GOFLAG_INIT)
> +	while (uatomic_read(&goflag) == GOFLAG_INIT)
>  		(void) poll(NULL, 0, 1);
>  	put_thread_online();
> -	while (goflag == GOFLAG_RUN) {
> +	while (uatomic_read(&goflag) == GOFLAG_RUN) {
>  		for (i = 0; i < RCU_READ_RUN; i++) {
>  			rcu_read_lock();
>  			/* rcu_read_lock_nest(); */
> @@ -180,9 +190,9 @@ void *rcu_update_perf_test(void *arg __attribute__((unused)))
>  		}
>  	}
>  	uatomic_inc(&nthreadsrunning);
> -	while (goflag == GOFLAG_INIT)
> +	while (uatomic_read(&goflag) == GOFLAG_INIT)
>  		(void) poll(NULL, 0, 1);
> -	while (goflag == GOFLAG_RUN) {
> +	while (uatomic_read(&goflag) == GOFLAG_RUN) {
>  		synchronize_rcu();
>  		n_updates_local++;
>  	}
> @@ -211,15 +221,11 @@ int perftestrun(int nthreads, int nreaders, int nupdaters)
>  	int t;
>  	int duration = 1;
>  
> -	cmm_smp_mb();
>  	while (uatomic_read(&nthreadsrunning) < nthreads)
>  		(void) poll(NULL, 0, 1);
> -	goflag = GOFLAG_RUN;
> -	cmm_smp_mb();
> +	uatomic_set(&goflag, GOFLAG_RUN);
>  	sleep(duration);

The theory here being that the context switches implied by the sleep()
make the memory barrier unnecessary?  Not unreasonable, I guess.  ;-)

> -	cmm_smp_mb();
> -	goflag = GOFLAG_STOP;
> -	cmm_smp_mb();
> +	uatomic_set(&goflag, GOFLAG_STOP);
>  	wait_all_threads();
>  	for_each_thread(t) {
>  		n_reads += per_thread(n_reads_pt, t);
> @@ -300,6 +306,13 @@ struct rcu_stress rcu_stress_array[RCU_STRESS_PIPE_LEN] = { { 0, 0 } };
>  struct rcu_stress *rcu_stress_current;
>  int rcu_stress_idx = 0;
>  
> +/*
> + * How many time a reader has seen something that should not be visible. It is
> + * an error if this value is different than zero at the end of the stress test.
> + *
> + * Here, the something that should not be visibile is an old pipe that has been
> + * freed (mbtest = 0).
> + */
>  int n_mberror = 0;
>  DEFINE_PER_THREAD(long long [RCU_STRESS_PIPE_LEN + 1], rcu_stress_count);
>  
> @@ -315,19 +328,25 @@ void *rcu_read_stress_test(void *arg __attribute__((unused)))
>  
>  	rcu_register_thread();
>  	put_thread_offline();
> -	while (goflag == GOFLAG_INIT)
> +	while (uatomic_read(&goflag) == GOFLAG_INIT)
>  		(void) poll(NULL, 0, 1);
>  	put_thread_online();
> -	while (goflag == GOFLAG_RUN) {
> +	while (uatomic_read(&goflag) == GOFLAG_RUN) {
>  		rcu_read_lock();
>  		p = rcu_dereference(rcu_stress_current);
>  		if (p->mbtest == 0)
> -			n_mberror++;
> +			uatomic_inc_mo(&n_mberror, CMM_RELAXED);
>  		rcu_read_lock_nest();
> +		/*
> +		 * The value of garbage is not important. This is
> +		 * essentially a busy loop. The atomic operation -- while not
> +		 * important here -- helps tools such as TSAN to not flag this
> +		 * as a race condition.
> +		 */
>  		for (i = 0; i < 100; i++)
> -			garbage++;
> +			uatomic_inc(&garbage);
>  		rcu_read_unlock_nest();
> -		pc = p->pipe_count;
> +		pc = uatomic_read(&p->pipe_count);
>  		rcu_read_unlock();
>  		if ((pc > RCU_STRESS_PIPE_LEN) || (pc < 0))
>  			pc = RCU_STRESS_PIPE_LEN;
> @@ -397,26 +416,47 @@ static
>  void *rcu_update_stress_test(void *arg __attribute__((unused)))
>  {
>  	int i;
> -	struct rcu_stress *p;
> +	struct rcu_stress *p, *old_p;
>  	struct rcu_head rh;
>  	enum writer_state writer_state = WRITER_STATE_SYNC_RCU;
>  
> -	while (goflag == GOFLAG_INIT)
> +	rcu_register_thread();
> +
> +	put_thread_offline();
> +	while (uatomic_read(&goflag) == GOFLAG_INIT)
>  		(void) poll(NULL, 0, 1);
> -	while (goflag == GOFLAG_RUN) {
> +
> +	put_thread_online();
> +	while (uatomic_read(&goflag) == GOFLAG_RUN) {
>  		i = rcu_stress_idx + 1;
>  		if (i >= RCU_STRESS_PIPE_LEN)
>  			i = 0;
> +		/*
> +		 * Get old pipe that we free after a synchronize_rcu().
> +		 */
> +		rcu_read_lock();
> +		old_p = rcu_dereference(rcu_stress_current);
> +		rcu_read_unlock();
> +
> +		/*
> +		 * Allocate a new pipe.
> +		 */
>  		p = &rcu_stress_array[i];
> -		p->mbtest = 0;
> -		cmm_smp_mb();
>  		p->pipe_count = 0;
>  		p->mbtest = 1;
> +
>  		rcu_assign_pointer(rcu_stress_current, p);
>  		rcu_stress_idx = i;
> +
> +		/*
> +		 * Increment every pipe except the freshly allocated one. A
> +		 * reader should only see either the old pipe or the new
> +		 * pipe. This is reflected in the rcu_stress_count histogram.
> +		 */
>  		for (i = 0; i < RCU_STRESS_PIPE_LEN; i++)
>  			if (i != rcu_stress_idx)
> -				rcu_stress_array[i].pipe_count++;
> +				uatomic_inc(&rcu_stress_array[i].pipe_count);
> +
>  		switch (writer_state) {
>  		case WRITER_STATE_SYNC_RCU:
>  			synchronize_rcu();
> @@ -432,9 +472,7 @@ void *rcu_update_stress_test(void *arg __attribute__((unused)))
>  					strerror(errno));
>  				abort();
>  			}
> -			rcu_register_thread();
>  			call_rcu(&rh, rcu_update_stress_test_rcu);
> -			rcu_unregister_thread();
>  			/*
>  			 * Our MacOS X test machine with the following
>  			 * config:
> @@ -470,18 +508,24 @@ void *rcu_update_stress_test(void *arg __attribute__((unused)))
>  		{
>  			struct urcu_gp_poll_state poll_state;
>  
> -			rcu_register_thread();
>  			poll_state = start_poll_synchronize_rcu();
> -			rcu_unregister_thread();
>  			while (!poll_state_synchronize_rcu(poll_state))
>  				(void) poll(NULL, 0, 1);	/* Wait for 1ms */
>  			break;
>  		}
>  		}
> +		/*
> +		 * No readers should see the old pipe now. Set mbtest to 0
> +		 * to mark it as "freed".
> +		 */
> +		old_p->mbtest = 0;
>  		n_updates++;
>  		advance_writer_state(&writer_state);
>  	}
>  
> +	put_thread_offline();
> +	rcu_unregister_thread();
> +
>  	return NULL;
>  }
>  
> @@ -497,9 +541,9 @@ void *rcu_fake_update_stress_test(void *arg __attribute__((unused)))
>  			set_thread_call_rcu_data(crdp);
>  		}
>  	}
> -	while (goflag == GOFLAG_INIT)
> +	while (uatomic_read(&goflag) == GOFLAG_INIT)
>  		(void) poll(NULL, 0, 1);
> -	while (goflag == GOFLAG_RUN) {
> +	while (uatomic_read(&goflag) == GOFLAG_RUN) {
>  		synchronize_rcu();
>  		(void) poll(NULL, 0, 1);
>  	}
> @@ -535,13 +579,9 @@ int stresstest(int nreaders)
>  	create_thread(rcu_update_stress_test, NULL);
>  	for (i = 0; i < 5; i++)
>  		create_thread(rcu_fake_update_stress_test, NULL);
> -	cmm_smp_mb();
> -	goflag = GOFLAG_RUN;
> -	cmm_smp_mb();
> +	uatomic_set(&goflag, GOFLAG_RUN);
>  	sleep(10);
> -	cmm_smp_mb();
> -	goflag = GOFLAG_STOP;
> -	cmm_smp_mb();
> +	uatomic_set(&goflag, GOFLAG_STOP);
>  	wait_all_threads();
>  	for_each_thread(t)
>  		n_reads += per_thread(n_reads_pt, t);

Looks plausible!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [lttng-dev] [PATCH v2 08/12] benchmark: Use uatomic for accessing global states
  2023-06-07 18:53 ` [lttng-dev] [PATCH v2 08/12] benchmark: " Olivier Dion via lttng-dev
@ 2023-06-21 23:38   ` Paul E. McKenney via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-21 23:38 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Wed, Jun 07, 2023 at 02:53:55PM -0400, Olivier Dion wrote:
> Global states accesses were protected via memory barriers. Use the
> uatomic API with the CMM memory model so that TSAN can understand the
> ordering imposed by the synchronization flags.
> 
> Change-Id: I1bf5702c5ac470f308c478effe39e424a3158060
> Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Olivier Dion <odion@efficios.com>

This does look more organized!

							Thanx, Paul

> ---
>  tests/benchmark/Makefile.am             | 91 +++++++++++++------------
>  tests/benchmark/common-states.c         |  1 +
>  tests/benchmark/common-states.h         | 51 ++++++++++++++
>  tests/benchmark/test_mutex.c            | 32 +--------
>  tests/benchmark/test_perthreadlock.c    | 32 +--------
>  tests/benchmark/test_rwlock.c           | 32 +--------
>  tests/benchmark/test_urcu.c             | 33 +--------
>  tests/benchmark/test_urcu_assign.c      | 33 +--------
>  tests/benchmark/test_urcu_bp.c          | 33 +--------
>  tests/benchmark/test_urcu_defer.c       | 33 +--------
>  tests/benchmark/test_urcu_gc.c          | 34 ++-------
>  tests/benchmark/test_urcu_hash.c        |  6 +-
>  tests/benchmark/test_urcu_hash.h        | 15 ----
>  tests/benchmark/test_urcu_hash_rw.c     | 10 +--
>  tests/benchmark/test_urcu_hash_unique.c | 10 +--
>  tests/benchmark/test_urcu_lfq.c         | 20 ++----
>  tests/benchmark/test_urcu_lfs.c         | 20 ++----
>  tests/benchmark/test_urcu_lfs_rcu.c     | 20 ++----
>  tests/benchmark/test_urcu_qsbr.c        | 33 +--------
>  tests/benchmark/test_urcu_qsbr_gc.c     | 34 ++-------
>  tests/benchmark/test_urcu_wfcq.c        | 22 +++---
>  tests/benchmark/test_urcu_wfq.c         | 20 ++----
>  tests/benchmark/test_urcu_wfs.c         | 22 +++---
>  23 files changed, 177 insertions(+), 460 deletions(-)
>  create mode 100644 tests/benchmark/common-states.c
>  create mode 100644 tests/benchmark/common-states.h
> 
> diff --git a/tests/benchmark/Makefile.am b/tests/benchmark/Makefile.am
> index c53e025..a7f91c2 100644
> --- a/tests/benchmark/Makefile.am
> +++ b/tests/benchmark/Makefile.am
> @@ -1,4 +1,5 @@
>  AM_CPPFLAGS += -I$(top_srcdir)/src -I$(top_srcdir)/tests/common
> +AM_CPPFLAGS += -include $(top_srcdir)/tests/benchmark/common-states.h
>  
>  TEST_EXTENSIONS = .tap
>  TAP_LOG_DRIVER_FLAGS = --merge --comments
> @@ -7,6 +8,8 @@ TAP_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' \
>  	URCU_TESTS_BUILDDIR='$(abs_top_builddir)/tests' \
>  	$(SHELL) $(top_srcdir)/tests/utils/tap-driver.sh
>  
> +noinst_HEADERS = common-states.h
> +
>  SCRIPT_LIST = \
>  	runpaul-phase1.sh \
>  	runpaul-phase2.sh \
> @@ -61,163 +64,163 @@ URCU_CDS_LIB=$(top_builddir)/src/liburcu-cds.la
>  
>  DEBUG_YIELD_LIB=$(builddir)/../common/libdebug-yield.la
>  
> -test_urcu_SOURCES = test_urcu.c
> +test_urcu_SOURCES = test_urcu.c common-states.c
>  test_urcu_LDADD = $(URCU_LIB)
>  
> -test_urcu_dynamic_link_SOURCES = test_urcu.c
> +test_urcu_dynamic_link_SOURCES = test_urcu.c common-states.c
>  test_urcu_dynamic_link_LDADD = $(URCU_LIB)
>  test_urcu_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  
> -test_urcu_timing_SOURCES = test_urcu_timing.c
> +test_urcu_timing_SOURCES = test_urcu_timing.c common-states.c
>  test_urcu_timing_LDADD = $(URCU_LIB)
>  
> -test_urcu_yield_SOURCES = test_urcu.c
> +test_urcu_yield_SOURCES = test_urcu.c common-states.c
>  test_urcu_yield_LDADD = $(URCU_LIB) $(DEBUG_YIELD_LIB)
>  test_urcu_yield_CFLAGS = -DDEBUG_YIELD $(AM_CFLAGS)
>  
>  
> -test_urcu_qsbr_SOURCES = test_urcu_qsbr.c
> +test_urcu_qsbr_SOURCES = test_urcu_qsbr.c common-states.c
>  test_urcu_qsbr_LDADD = $(URCU_QSBR_LIB)
>  
> -test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c
> +test_urcu_qsbr_timing_SOURCES = test_urcu_qsbr_timing.c common-states.c
>  test_urcu_qsbr_timing_LDADD = $(URCU_QSBR_LIB)
>  
>  
> -test_urcu_mb_SOURCES = test_urcu.c
> +test_urcu_mb_SOURCES = test_urcu.c common-states.c
>  test_urcu_mb_LDADD = $(URCU_MB_LIB)
>  test_urcu_mb_CFLAGS = -DRCU_MB $(AM_CFLAGS)
>  
>  
> -test_urcu_signal_SOURCES = test_urcu.c
> +test_urcu_signal_SOURCES = test_urcu.c common-states.c
>  test_urcu_signal_LDADD = $(URCU_SIGNAL_LIB)
>  test_urcu_signal_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
>  
> -test_urcu_signal_dynamic_link_SOURCES = test_urcu.c
> +test_urcu_signal_dynamic_link_SOURCES = test_urcu.c common-states.c
>  test_urcu_signal_dynamic_link_LDADD = $(URCU_SIGNAL_LIB)
>  test_urcu_signal_dynamic_link_CFLAGS = -DRCU_SIGNAL -DDYNAMIC_LINK_TEST \
>  					$(AM_CFLAGS)
>  
> -test_urcu_signal_timing_SOURCES = test_urcu_timing.c
> +test_urcu_signal_timing_SOURCES = test_urcu_timing.c common-states.c
>  test_urcu_signal_timing_LDADD = $(URCU_SIGNAL_LIB)
>  test_urcu_signal_timing_CFLAGS= -DRCU_SIGNAL $(AM_CFLAGS)
>  
> -test_urcu_signal_yield_SOURCES = test_urcu.c
> +test_urcu_signal_yield_SOURCES = test_urcu.c common-states.c
>  test_urcu_signal_yield_LDADD = $(URCU_SIGNAL_LIB) $(DEBUG_YIELD_LIB)
>  test_urcu_signal_yield_CFLAGS = -DRCU_SIGNAL -DDEBUG_YIELD $(AM_CFLAGS)
>  
> -test_rwlock_timing_SOURCES = test_rwlock_timing.c
> +test_rwlock_timing_SOURCES = test_rwlock_timing.c common-states.c
>  test_rwlock_timing_LDADD = $(URCU_SIGNAL_LIB)
>  
> -test_rwlock_SOURCES = test_rwlock.c
> +test_rwlock_SOURCES = test_rwlock.c common-states.c
>  test_rwlock_LDADD = $(URCU_SIGNAL_LIB)
>  
> -test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c
> +test_perthreadlock_timing_SOURCES = test_perthreadlock_timing.c common-states.c
>  test_perthreadlock_timing_LDADD = $(URCU_SIGNAL_LIB)
>  
> -test_perthreadlock_SOURCES = test_perthreadlock.c
> +test_perthreadlock_SOURCES = test_perthreadlock.c common-states.c
>  test_perthreadlock_LDADD = $(URCU_SIGNAL_LIB)
>  
> -test_mutex_SOURCES = test_mutex.c
> +test_mutex_SOURCES = test_mutex.c common-states.c
>  
> -test_looplen_SOURCES = test_looplen.c
> +test_looplen_SOURCES = test_looplen.c common-states.c
>  
> -test_urcu_gc_SOURCES = test_urcu_gc.c
> +test_urcu_gc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_gc_LDADD = $(URCU_LIB)
>  
> -test_urcu_signal_gc_SOURCES = test_urcu_gc.c
> +test_urcu_signal_gc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_signal_gc_LDADD = $(URCU_SIGNAL_LIB)
>  test_urcu_signal_gc_CFLAGS = -DRCU_SIGNAL $(AM_CFLAGS)
>  
> -test_urcu_mb_gc_SOURCES = test_urcu_gc.c
> +test_urcu_mb_gc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_mb_gc_LDADD = $(URCU_MB_LIB)
>  test_urcu_mb_gc_CFLAGS = -DRCU_MB $(AM_CFLAGS)
>  
> -test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c
> +test_urcu_qsbr_gc_SOURCES = test_urcu_qsbr_gc.c common-states.c
>  test_urcu_qsbr_gc_LDADD = $(URCU_QSBR_LIB)
>  
> -test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c
> +test_urcu_qsbr_lgc_SOURCES = test_urcu_qsbr_gc.c common-states.c
>  test_urcu_qsbr_lgc_LDADD = $(URCU_QSBR_LIB)
>  test_urcu_qsbr_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
>  
> -test_urcu_lgc_SOURCES = test_urcu_gc.c
> +test_urcu_lgc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_lgc_LDADD = $(URCU_LIB)
>  test_urcu_lgc_CFLAGS = -DTEST_LOCAL_GC $(AM_CFLAGS)
>  
> -test_urcu_signal_lgc_SOURCES = test_urcu_gc.c
> +test_urcu_signal_lgc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_signal_lgc_LDADD = $(URCU_SIGNAL_LIB)
>  test_urcu_signal_lgc_CFLAGS = -DRCU_SIGNAL -DTEST_LOCAL_GC $(AM_CFLAGS)
>  
> -test_urcu_mb_lgc_SOURCES = test_urcu_gc.c
> +test_urcu_mb_lgc_SOURCES = test_urcu_gc.c common-states.c
>  test_urcu_mb_lgc_LDADD = $(URCU_MB_LIB)
>  test_urcu_mb_lgc_CFLAGS = -DTEST_LOCAL_GC -DRCU_MB $(AM_CFLAGS)
>  
> -test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c
> +test_urcu_qsbr_dynamic_link_SOURCES = test_urcu_qsbr.c common-states.c
>  test_urcu_qsbr_dynamic_link_LDADD = $(URCU_QSBR_LIB)
>  test_urcu_qsbr_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  
> -test_urcu_defer_SOURCES = test_urcu_defer.c
> +test_urcu_defer_SOURCES = test_urcu_defer.c common-states.c
>  test_urcu_defer_LDADD = $(URCU_LIB)
>  
>  test_cycles_per_loop_SOURCES = test_cycles_per_loop.c
>  
> -test_urcu_assign_SOURCES = test_urcu_assign.c
> +test_urcu_assign_SOURCES = test_urcu_assign.c common-states.c
>  test_urcu_assign_LDADD = $(URCU_LIB)
>  
> -test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c
> +test_urcu_assign_dynamic_link_SOURCES = test_urcu_assign.c common-states.c
>  test_urcu_assign_dynamic_link_LDADD = $(URCU_LIB)
>  test_urcu_assign_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  
> -test_urcu_bp_SOURCES = test_urcu_bp.c
> +test_urcu_bp_SOURCES = test_urcu_bp.c common-states.c
>  test_urcu_bp_LDADD = $(URCU_BP_LIB)
>  
> -test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c
> +test_urcu_bp_dynamic_link_SOURCES = test_urcu_bp.c common-states.c
>  test_urcu_bp_dynamic_link_LDADD = $(URCU_BP_LIB)
>  test_urcu_bp_dynamic_link_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  
> -test_urcu_lfq_SOURCES = test_urcu_lfq.c
> +test_urcu_lfq_SOURCES = test_urcu_lfq.c common-states.c
>  test_urcu_lfq_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c
> +test_urcu_lfq_dynlink_SOURCES = test_urcu_lfq.c common-states.c
>  test_urcu_lfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_lfq_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_wfq_SOURCES = test_urcu_wfq.c
> +test_urcu_wfq_SOURCES = test_urcu_wfq.c common-states.c
>  test_urcu_wfq_LDADD = $(URCU_COMMON_LIB)
>  
> -test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c
> +test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c common-states.c
>  test_urcu_wfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_wfq_dynlink_LDADD = $(URCU_COMMON_LIB)
>  
> -test_urcu_wfcq_SOURCES = test_urcu_wfcq.c
> +test_urcu_wfcq_SOURCES = test_urcu_wfcq.c common-states.c
>  test_urcu_wfcq_LDADD = $(URCU_COMMON_LIB)
>  
> -test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c
> +test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c common-states.c
>  test_urcu_wfcq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_wfcq_dynlink_LDADD = $(URCU_COMMON_LIB)
>  
> -test_urcu_lfs_SOURCES = test_urcu_lfs.c
> +test_urcu_lfs_SOURCES = test_urcu_lfs.c common-states.c
>  test_urcu_lfs_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c
> +test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c common-states.c
>  test_urcu_lfs_rcu_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c
> +test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c common-states.c
>  test_urcu_lfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_lfs_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c
> +test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c common-states.c
>  test_urcu_lfs_rcu_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_lfs_rcu_dynlink_LDADD = $(URCU_LIB) $(URCU_CDS_LIB)
>  
> -test_urcu_wfs_SOURCES = test_urcu_wfs.c
> +test_urcu_wfs_SOURCES = test_urcu_wfs.c common-states.c
>  test_urcu_wfs_LDADD = $(URCU_COMMON_LIB)
>  
> -test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c
> +test_urcu_wfs_dynlink_SOURCES = test_urcu_wfs.c common-states.c
>  test_urcu_wfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS)
>  test_urcu_wfs_dynlink_LDADD = $(URCU_COMMON_LIB)
>  
>  test_urcu_hash_SOURCES = test_urcu_hash.c test_urcu_hash.h \
> -		test_urcu_hash_rw.c test_urcu_hash_unique.c
> +		test_urcu_hash_rw.c test_urcu_hash_unique.c common-states.c
>  test_urcu_hash_CFLAGS = -DRCU_QSBR $(AM_CFLAGS)
>  test_urcu_hash_LDADD = $(URCU_QSBR_LIB) $(URCU_COMMON_LIB) $(URCU_CDS_LIB)
>  
> diff --git a/tests/benchmark/common-states.c b/tests/benchmark/common-states.c
> new file mode 100644
> index 0000000..6e70351
> --- /dev/null
> +++ b/tests/benchmark/common-states.c
> @@ -0,0 +1 @@
> +volatile int _test_go = 0, _test_stop = 0;
> diff --git a/tests/benchmark/common-states.h b/tests/benchmark/common-states.h
> new file mode 100644
> index 0000000..dfbbfe5
> --- /dev/null
> +++ b/tests/benchmark/common-states.h
> @@ -0,0 +1,51 @@
> +/* Common states for benchmarks. */
> +
> +#include <unistd.h>
> +
> +#include <urcu/uatomic.h>
> +
> +extern volatile int _test_go, _test_stop;
> +
> +static inline void complete_sleep(unsigned int seconds)
> +{
> +	while (seconds != 0) {
> +		seconds = sleep(seconds);
> +	}
> +}
> +
> +static inline void begin_test(void)
> +{
> +	uatomic_store(&_test_go, 1, CMM_RELEASE);
> +}
> +
> +static inline void end_test(void)
> +{
> +	uatomic_store(&_test_stop, 1, CMM_RELAXED);
> +}
> +
> +static inline void test_for(unsigned int duration)
> +{
> +	begin_test();
> +	complete_sleep(duration);
> +	end_test();
> +}
> +
> +static inline void wait_until_go(void)
> +{
> +	while (!uatomic_load(&_test_go, CMM_ACQUIRE))
> +	{
> +	}
> +}
> +
> +/*
> + * Returns nonzero while the test should keep running, 0 once it should end.
> + */
> +static inline int test_duration_write(void)
> +{
> +	return !uatomic_load(&_test_stop, CMM_RELAXED);
> +}
> +
> +static inline int test_duration_read(void)
> +{
> +	return !uatomic_load(&_test_stop, CMM_RELAXED);
> +}
> diff --git a/tests/benchmark/test_mutex.c b/tests/benchmark/test_mutex.c
> index 55f7c38..145139c 100644
> --- a/tests/benchmark/test_mutex.c
> +++ b/tests/benchmark/test_mutex.c
> @@ -49,8 +49,6 @@ struct test_array {
>  
>  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static volatile struct test_array test_array = { 8 };
> @@ -111,19 +109,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -147,9 +132,7 @@ void *thr_reader(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> +	wait_until_go();
>  
>  	for (;;) {
>  		int v;
> @@ -182,10 +165,7 @@ void *thr_writer(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		pthread_mutex_lock(&lock);
> @@ -325,13 +305,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_perthreadlock.c b/tests/benchmark/test_perthreadlock.c
> index 47a512c..bf468eb 100644
> --- a/tests/benchmark/test_perthreadlock.c
> +++ b/tests/benchmark/test_perthreadlock.c
> @@ -53,8 +53,6 @@ struct per_thread_lock {
>  
>  static struct per_thread_lock *per_thread_lock;
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static volatile struct test_array test_array = { 8 };
> @@ -117,19 +115,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -175,9 +160,7 @@ void *thr_reader(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> +	wait_until_go();
>  
>  	for (;;) {
>  		int v;
> @@ -211,10 +194,7 @@ void *thr_writer(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		for (tidx = 0; tidx < (long)nr_readers; tidx++) {
> @@ -359,13 +339,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_rwlock.c b/tests/benchmark/test_rwlock.c
> index 6908ea4..f5099e8 100644
> --- a/tests/benchmark/test_rwlock.c
> +++ b/tests/benchmark/test_rwlock.c
> @@ -53,8 +53,6 @@ struct test_array {
>   */
>  pthread_rwlock_t lock;
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static volatile struct test_array test_array = { 8 };
> @@ -116,19 +114,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -147,9 +132,7 @@ void *thr_reader(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> +	wait_until_go();
>  
>  	for (;;) {
>  		int a, ret;
> @@ -194,10 +177,7 @@ void *thr_writer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		int ret;
> @@ -355,13 +335,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu.c b/tests/benchmark/test_urcu.c
> index ea849fa..b89513b 100644
> --- a/tests/benchmark/test_urcu.c
> +++ b/tests/benchmark/test_urcu.c
> @@ -44,8 +44,6 @@
>  #endif
>  #include <urcu.h>
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static int *test_rcu_pointer;
> @@ -107,19 +105,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -142,10 +127,7 @@ void *thr_reader(void *_count)
>  	rcu_register_thread();
>  	urcu_posix_assert(!rcu_read_ongoing());
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -186,10 +168,7 @@ void *thr_writer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		new = malloc(sizeof(int));
> @@ -337,13 +316,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_assign.c b/tests/benchmark/test_urcu_assign.c
> index 88889a8..e83b05e 100644
> --- a/tests/benchmark/test_urcu_assign.c
> +++ b/tests/benchmark/test_urcu_assign.c
> @@ -48,8 +48,6 @@ struct test_array {
>  	int a;
>  };
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static struct test_array *test_rcu_pointer;
> @@ -111,19 +109,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -201,10 +186,7 @@ void *thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -240,10 +222,7 @@ void *thr_writer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_copy_mutex_lock();
> @@ -394,13 +373,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_bp.c b/tests/benchmark/test_urcu_bp.c
> index 6f8c59d..c3b00f1 100644
> --- a/tests/benchmark/test_urcu_bp.c
> +++ b/tests/benchmark/test_urcu_bp.c
> @@ -44,8 +44,6 @@
>  #endif
>  #include <urcu-bp.h>
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static int *test_rcu_pointer;
> @@ -107,19 +105,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -142,10 +127,7 @@ void *thr_reader(void *_count)
>  	rcu_register_thread();
>  	urcu_posix_assert(!rcu_read_ongoing());
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -182,10 +164,7 @@ void *thr_writer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		new = malloc(sizeof(int));
> @@ -332,13 +311,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_defer.c b/tests/benchmark/test_urcu_defer.c
> index e948ebf..c501f60 100644
> --- a/tests/benchmark/test_urcu_defer.c
> +++ b/tests/benchmark/test_urcu_defer.c
> @@ -49,8 +49,6 @@ struct test_array {
>  	int a;
>  };
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static struct test_array *test_rcu_pointer;
> @@ -112,19 +110,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -149,10 +134,7 @@ void *thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -203,10 +185,7 @@ void *thr_writer(void *data)
>  		exit(-1);
>  	}
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		new = malloc(sizeof(*new));
> @@ -359,13 +338,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_gc.c b/tests/benchmark/test_urcu_gc.c
> index f14f728..1cbee44 100644
> --- a/tests/benchmark/test_urcu_gc.c
> +++ b/tests/benchmark/test_urcu_gc.c
> @@ -33,6 +33,7 @@
>  #include <urcu/arch.h>
>  #include <urcu/assert.h>
>  #include <urcu/tls-compat.h>
> +#include <urcu/uatomic.h>
>  #include "thread-id.h"
>  #include "../common/debug-yield.h"
>  
> @@ -48,8 +49,6 @@ struct test_array {
>  	int a;
>  };
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static struct test_array *test_rcu_pointer;
> @@ -120,19 +119,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -157,10 +143,7 @@ void *thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -231,10 +214,7 @@ void *thr_writer(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  #ifndef TEST_LOCAL_GC
> @@ -399,13 +379,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_hash.c b/tests/benchmark/test_urcu_hash.c
> index 3574b4c..1a3087e 100644
> --- a/tests/benchmark/test_urcu_hash.c
> +++ b/tests/benchmark/test_urcu_hash.c
> @@ -96,8 +96,6 @@ DEFINE_URCU_TLS(unsigned long, lookup_ok);
>  
>  struct cds_lfht *test_ht;
>  
> -volatile int test_go, test_stop;
> -
>  unsigned long wdelay;
>  
>  unsigned long duration;
> @@ -649,14 +647,14 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	remain = duration;
>  	do {
>  		remain = sleep(remain);
>  	} while (remain > 0);
>  
> -	test_stop = 1;
> +	end_test();
>  
>  end_pthread_join:
>  	for (i_thr = 0; i_thr < nr_readers_created; i_thr++) {
> diff --git a/tests/benchmark/test_urcu_hash.h b/tests/benchmark/test_urcu_hash.h
> index 47b2ae3..73a0a6d 100644
> --- a/tests/benchmark/test_urcu_hash.h
> +++ b/tests/benchmark/test_urcu_hash.h
> @@ -125,8 +125,6 @@ cds_lfht_iter_get_test_node(struct cds_lfht_iter *iter)
>  	return to_test_node(cds_lfht_iter_get_node(iter));
>  }
>  
> -extern volatile int test_go, test_stop;
> -
>  extern unsigned long wdelay;
>  
>  extern unsigned long duration;
> @@ -174,19 +172,6 @@ extern pthread_mutex_t affinity_mutex;
>  
>  void set_affinity(void);
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static inline int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static inline int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  extern DECLARE_URCU_TLS(unsigned long long, nr_writes);
>  extern DECLARE_URCU_TLS(unsigned long long, nr_reads);
>  
> diff --git a/tests/benchmark/test_urcu_hash_rw.c b/tests/benchmark/test_urcu_hash_rw.c
> index 862a6f0..087e869 100644
> --- a/tests/benchmark/test_urcu_hash_rw.c
> +++ b/tests/benchmark/test_urcu_hash_rw.c
> @@ -73,10 +73,7 @@ void *test_hash_rw_thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -133,10 +130,7 @@ void *test_hash_rw_thr_writer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_lfht_node *ret_node = NULL;
> diff --git a/tests/benchmark/test_urcu_hash_unique.c b/tests/benchmark/test_urcu_hash_unique.c
> index de7c427..90c0e19 100644
> --- a/tests/benchmark/test_urcu_hash_unique.c
> +++ b/tests/benchmark/test_urcu_hash_unique.c
> @@ -71,10 +71,7 @@ void *test_hash_unique_thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct lfht_test_node *node;
> @@ -136,10 +133,7 @@ void *test_hash_unique_thr_writer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		/*
> diff --git a/tests/benchmark/test_urcu_lfq.c b/tests/benchmark/test_urcu_lfq.c
> index 490e8b0..50c4211 100644
> --- a/tests/benchmark/test_urcu_lfq.c
> +++ b/tests/benchmark/test_urcu_lfq.c
> @@ -47,8 +47,6 @@
>  #include <urcu.h>
>  #include <urcu/cds.h>
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long rduration;
>  
>  static unsigned long duration;
> @@ -110,12 +108,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop;
> +	return test_duration_read();
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop;
> +	return test_duration_write();
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -146,10 +144,7 @@ void *thr_enqueuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct test *node = malloc(sizeof(*node));
> @@ -202,10 +197,7 @@ void *thr_dequeuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_lfq_node_rcu *qnode;
> @@ -375,7 +367,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -385,7 +377,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop = 1;
> +	end_test();
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_lfs.c b/tests/benchmark/test_urcu_lfs.c
> index 52239e0..48b2b23 100644
> --- a/tests/benchmark/test_urcu_lfs.c
> +++ b/tests/benchmark/test_urcu_lfs.c
> @@ -59,8 +59,6 @@ enum test_sync {
>  
>  static enum test_sync test_sync;
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long rduration;
>  
>  static unsigned long duration;
> @@ -124,12 +122,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop;
> +	return test_duration_read();
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop;
> +	return test_duration_write();
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -159,10 +157,7 @@ static void *thr_enqueuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct test *node = malloc(sizeof(*node));
> @@ -261,10 +256,7 @@ static void *thr_dequeuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	urcu_posix_assert(test_pop || test_pop_all);
>  
> @@ -459,7 +451,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -469,7 +461,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop = 1;
> +	end_test();
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_lfs_rcu.c b/tests/benchmark/test_urcu_lfs_rcu.c
> index 7975faf..ae3dff4 100644
> --- a/tests/benchmark/test_urcu_lfs_rcu.c
> +++ b/tests/benchmark/test_urcu_lfs_rcu.c
> @@ -51,8 +51,6 @@
>  
>  #include <urcu/cds.h>
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long rduration;
>  
>  static unsigned long duration;
> @@ -114,12 +112,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop;
> +	return test_duration_read();
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop;
> +	return test_duration_write();
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -150,10 +148,7 @@ void *thr_enqueuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct test *node = malloc(sizeof(*node));
> @@ -205,10 +200,7 @@ void *thr_dequeuer(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_lfs_node_rcu *snode;
> @@ -377,7 +369,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -387,7 +379,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop = 1;
> +	end_test();
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_qsbr.c b/tests/benchmark/test_urcu_qsbr.c
> index 1ea369c..295e9db 100644
> --- a/tests/benchmark/test_urcu_qsbr.c
> +++ b/tests/benchmark/test_urcu_qsbr.c
> @@ -44,8 +44,6 @@
>  #endif
>  #include "urcu-qsbr.h"
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static int *test_rcu_pointer;
> @@ -106,19 +104,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -145,10 +130,7 @@ void *thr_reader(void *_count)
>  	urcu_posix_assert(!rcu_read_ongoing());
>  	rcu_thread_online();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		rcu_read_lock();
> @@ -192,10 +174,7 @@ void *thr_writer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		new = malloc(sizeof(int));
> @@ -343,13 +322,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_qsbr_gc.c b/tests/benchmark/test_urcu_qsbr_gc.c
> index 8877a82..163405d 100644
> --- a/tests/benchmark/test_urcu_qsbr_gc.c
> +++ b/tests/benchmark/test_urcu_qsbr_gc.c
> @@ -33,6 +33,7 @@
>  #include <urcu/arch.h>
>  #include <urcu/assert.h>
>  #include <urcu/tls-compat.h>
> +#include <urcu/uatomic.h>
>  #include "thread-id.h"
>  #include "../common/debug-yield.h"
>  
> @@ -46,8 +47,6 @@ struct test_array {
>  	int a;
>  };
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long wdelay;
>  
>  static struct test_array *test_rcu_pointer;
> @@ -118,19 +117,6 @@ static void set_affinity(void)
>  #endif /* HAVE_SCHED_SETAFFINITY */
>  }
>  
> -/*
> - * returns 0 if test should end.
> - */
> -static int test_duration_write(void)
> -{
> -	return !test_stop;
> -}
> -
> -static int test_duration_read(void)
> -{
> -	return !test_stop;
> -}
> -
>  static DEFINE_URCU_TLS(unsigned long long, nr_writes);
>  static DEFINE_URCU_TLS(unsigned long long, nr_reads);
>  
> @@ -154,10 +140,7 @@ void *thr_reader(void *_count)
>  
>  	rcu_register_thread();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		_rcu_read_lock();
> @@ -231,10 +214,7 @@ void *thr_writer(void *data)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  #ifndef TEST_LOCAL_GC
> @@ -399,13 +379,7 @@ int main(int argc, char **argv)
>  			exit(1);
>  	}
>  
> -	cmm_smp_mb();
> -
> -	test_go = 1;
> -
> -	sleep(duration);
> -
> -	test_stop = 1;
> +	test_for(duration);
>  
>  	for (i_thr = 0; i_thr < nr_readers; i_thr++) {
>  		err = pthread_join(tid_reader[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_wfcq.c b/tests/benchmark/test_urcu_wfcq.c
> index 2c6e0fd..542a13a 100644
> --- a/tests/benchmark/test_urcu_wfcq.c
> +++ b/tests/benchmark/test_urcu_wfcq.c
> @@ -56,7 +56,7 @@ static enum test_sync test_sync;
>  
>  static int test_force_sync;
>  
> -static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
> +static volatile int test_stop_enqueue, test_stop_dequeue;
>  
>  static unsigned long rduration;
>  
> @@ -122,12 +122,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop_dequeue;
> +	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop_enqueue;
> +	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -155,10 +155,7 @@ static void *thr_enqueuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_wfcq_node *node = malloc(sizeof(*node));
> @@ -266,10 +263,7 @@ static void *thr_dequeuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		if (test_dequeue && test_splice) {
> @@ -482,7 +476,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -492,7 +486,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop_enqueue = 1;
> +	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
>  
>  	if (test_wait_empty) {
>  		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
> @@ -503,7 +497,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop_dequeue = 1;
> +	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_wfq.c b/tests/benchmark/test_urcu_wfq.c
> index 8381160..2d8de87 100644
> --- a/tests/benchmark/test_urcu_wfq.c
> +++ b/tests/benchmark/test_urcu_wfq.c
> @@ -51,8 +51,6 @@
>  #include <urcu.h>
>  #include <urcu/wfqueue.h>
>  
> -static volatile int test_go, test_stop;
> -
>  static unsigned long rduration;
>  
>  static unsigned long duration;
> @@ -114,12 +112,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop;
> +	return test_duration_read();
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop;
> +	return test_duration_write();
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -143,10 +141,7 @@ void *thr_enqueuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_wfq_node *node = malloc(sizeof(*node));
> @@ -185,10 +180,7 @@ void *thr_dequeuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_wfq_node *node = cds_wfq_dequeue_blocking(&q);
> @@ -343,7 +335,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -353,7 +345,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop = 1;
> +	end_test();
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> diff --git a/tests/benchmark/test_urcu_wfs.c b/tests/benchmark/test_urcu_wfs.c
> index c285feb..d1a4afc 100644
> --- a/tests/benchmark/test_urcu_wfs.c
> +++ b/tests/benchmark/test_urcu_wfs.c
> @@ -59,7 +59,7 @@ static enum test_sync test_sync;
>  
>  static int test_force_sync;
>  
> -static volatile int test_go, test_stop_enqueue, test_stop_dequeue;
> +static volatile int test_stop_enqueue, test_stop_dequeue;
>  
>  static unsigned long rduration;
>  
> @@ -125,12 +125,12 @@ static void set_affinity(void)
>   */
>  static int test_duration_dequeue(void)
>  {
> -	return !test_stop_dequeue;
> +	return !uatomic_load(&test_stop_dequeue, CMM_RELAXED);
>  }
>  
>  static int test_duration_enqueue(void)
>  {
> -	return !test_stop_enqueue;
> +	return !uatomic_load(&test_stop_enqueue, CMM_RELAXED);
>  }
>  
>  static DEFINE_URCU_TLS(unsigned long long, nr_dequeues);
> @@ -157,10 +157,7 @@ static void *thr_enqueuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	for (;;) {
>  		struct cds_wfs_node *node = malloc(sizeof(*node));
> @@ -250,10 +247,7 @@ static void *thr_dequeuer(void *_count)
>  
>  	set_affinity();
>  
> -	while (!test_go)
> -	{
> -	}
> -	cmm_smp_mb();
> +	wait_until_go();
>  
>  	urcu_posix_assert(test_pop || test_pop_all);
>  
> @@ -469,7 +463,7 @@ int main(int argc, char **argv)
>  
>  	cmm_smp_mb();
>  
> -	test_go = 1;
> +	begin_test();
>  
>  	for (i_thr = 0; i_thr < duration; i_thr++) {
>  		sleep(1);
> @@ -479,7 +473,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop_enqueue = 1;
> +	uatomic_store(&test_stop_enqueue, 1, CMM_RELEASE);
>  
>  	if (test_wait_empty) {
>  		while (nr_enqueuers != uatomic_read(&test_enqueue_stopped)) {
> @@ -490,7 +484,7 @@ int main(int argc, char **argv)
>  		}
>  	}
>  
> -	test_stop_dequeue = 1;
> +	uatomic_store(&test_stop_dequeue, 1, CMM_RELAXED);
>  
>  	for (i_thr = 0; i_thr < nr_enqueuers; i_thr++) {
>  		err = pthread_join(tid_enqueuer[i_thr], &tret);
> -- 
> 2.40.1
> 
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
  2023-06-21 23:22   ` Paul E. McKenney via lttng-dev
@ 2023-06-22  0:53     ` Olivier Dion via lttng-dev
  2023-06-22  1:48       ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-22  0:53 UTC (permalink / raw)
  To: paulmck; +Cc: lttng-dev

On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:
>>  #ifndef cmm_mb
>>  #define cmm_mb()    __sync_synchronize()
>
> Just out of curiosity, why not also implement cmm_mb() in terms of
> __atomic_thread_fence(__ATOMIC_SEQ_CST)?  (Or is that a later patch?)

IIRC, Mathieu and I agree that the definition of a thread fence -- acts
as a synchronization fence between threads -- is too weak for what we
want here.  For example, with I/O devices.

Although __sync_synchronize() is probably an alias for a SEQ_CST thread
fence, its definition -- issues a full memory barrier -- is stronger.

We do not want to rely on this assumption (alias) and prefer to rely on
the documented definition instead.

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
  2023-06-22  0:53     ` Olivier Dion via lttng-dev
@ 2023-06-22  1:48       ` Mathieu Desnoyers via lttng-dev
  2023-06-22  3:44         ` Paul E. McKenney via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-22  1:48 UTC (permalink / raw)
  To: Olivier Dion, paulmck; +Cc: lttng-dev

On 6/21/23 20:53, Olivier Dion wrote:
> On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
>> On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:
>>>   #ifndef cmm_mb
>>>   #define cmm_mb()    __sync_synchronize()
>>
>> Just out of curiosity, why not also implement cmm_mb() in terms of
>> __atomic_thread_fence(__ATOMIC_SEQ_CST)?  (Or is that a later patch?)
> 
> IIRC, Mathieu and I agree that the definition of a thread fence -- acts
> as a synchronization fence between threads -- is too weak for what we
> want here.  For example, with I/O devices.
> 
> Although __sync_synchronize() is probably an alias for a SEQ_CST thread
> fence, its definition -- issues a full memory barrier -- is stronger.
> 
> We do not want to rely on this assumption (alias) and prefer to rely on
> the documented definition instead.
> 

We should document this rationale with a new comment near the #define,
in case anyone mistakenly decides to use a thread fence there to make it
similar to the rest of the code in the future.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
  2023-06-22  1:48       ` Mathieu Desnoyers via lttng-dev
@ 2023-06-22  3:44         ` Paul E. McKenney via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-22  3:44 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Olivier Dion, lttng-dev

On Wed, Jun 21, 2023 at 09:48:10PM -0400, Mathieu Desnoyers wrote:
> On 6/21/23 20:53, Olivier Dion wrote:
> > On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:
> > > >   #ifndef cmm_mb
> > > >   #define cmm_mb()    __sync_synchronize()
> > > 
> > > Just out of curiosity, why not also implement cmm_mb() in terms of
> > > __atomic_thread_fence(__ATOMIC_SEQ_CST)?  (Or is that a later patch?)
> > 
> > IIRC, Mathieu and I agree that the definition of a thread fence -- acts
> > as a synchronization fence between threads -- is too weak for what we
> > want here.  For example, with I/O devices.
> > 
> > Although __sync_synchronize() is probably an alias for a SEQ_CST thread
> > fence, its definition -- issues a full memory barrier -- is stronger.
> > 
> > We do not want to rely on this assumption (alias) and prefer to rely on
> > the documented definition instead.
> 
> We should document this rationale with a new comment near the #define,
> in case anyone mistakenly decides to use a thread fence there to make it
> similar to the rest of the code in the future.

That would be good, thank you!

Ah, and I did not find any issues with the rest of the patchset.

							Thanx, Paul

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-21 23:19   ` Paul E. McKenney via lttng-dev
@ 2023-06-22 15:55     ` Mathieu Desnoyers via lttng-dev
  2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-22 15:55 UTC (permalink / raw)
  To: paulmck, Olivier Dion; +Cc: lttng-dev

On 6/21/23 19:19, Paul E. McKenney wrote:
[...]
>> diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
>> new file mode 100644
>> index 0000000..8e6a9b5
>> --- /dev/null
>> +++ b/include/urcu/uatomic/builtins-generic.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * urcu/uatomic/builtins-generic.h
>> + *
>> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
>> +#define _URCU_UATOMIC_BUILTINS_GENERIC_H
>> +
>> +#include <urcu/system.h>
>> +
>> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
>> +
>> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
> 
> Does this lose the volatile semantics that the old-style definitions
> had?
> 

Yes.

[...]

>> +++ b/include/urcu/uatomic/builtins-x86.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * urcu/uatomic/builtins-x86.h
>> + *
>> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#ifndef _URCU_UATOMIC_BUILTINS_X86_H
>> +#define _URCU_UATOMIC_BUILTINS_X86_H
>> +
>> +#include <urcu/system.h>
>> +
>> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
>> +
>> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
> 
> And same question here.

Yes, this opens interesting questions:

* what semantic do we want for uatomic_read/set ?

* what semantic do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ?

* do we want to allow load/store-shared to work on variables larger than 
a word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure)

* what are the guarantees of a volatile type ?

* what are the guarantees of a load/store relaxed in C11 ?

Does the delta between volatile and C11 relaxed guarantees matter ?

Is there an advantage to use C11 load/store relaxed over volatile ? 
Should we combine both C11 load/store _and_ volatile ? Should we use 
atomic_signal_fence instead ?

Thanks,

Mathieu

> 
> 							Thanx, Paul
> 
>> +

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 15:55     ` Mathieu Desnoyers via lttng-dev
@ 2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
  2023-06-22 19:53         ` Olivier Dion via lttng-dev
                           ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-22 18:32 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Olivier Dion, lttng-dev

On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:
> On 6/21/23 19:19, Paul E. McKenney wrote:
> [...]
> > > diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
> > > new file mode 100644
> > > index 0000000..8e6a9b5
> > > --- /dev/null
> > > +++ b/include/urcu/uatomic/builtins-generic.h
> > > @@ -0,0 +1,85 @@
> > > +/*
> > > + * urcu/uatomic/builtins-generic.h
> > > + *
> > > + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> > > + *
> > > + * This library is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * This library is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with this library; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > > + */
> > > +
> > > +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
> > > +#define _URCU_UATOMIC_BUILTINS_GENERIC_H
> > > +
> > > +#include <urcu/system.h>
> > > +
> > > +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
> > > +
> > > +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
> > 
> > Does this lose the volatile semantics that the old-style definitions
> > had?
> > 
> 
> Yes.
> 
> [...]
> 
> > > +++ b/include/urcu/uatomic/builtins-x86.h
> > > @@ -0,0 +1,85 @@
> > > +/*
> > > + * urcu/uatomic/builtins-x86.h
> > > + *
> > > + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
> > > + *
> > > + * This library is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * This library is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with this library; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > > + */
> > > +
> > > +#ifndef _URCU_UATOMIC_BUILTINS_X86_H
> > > +#define _URCU_UATOMIC_BUILTINS_X86_H
> > > +
> > > +#include <urcu/system.h>
> > > +
> > > +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
> > > +
> > > +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
> > 
> > And same question here.
> 
> Yes, this opens interesting questions:
> 
> * what semantic do we want for uatomic_read/set ?
> 
> * what semantic do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ?
> 
> * do we want to allow load/store-shared to work on variables larger than a
> word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure)
> 
> * what are the guarantees of a volatile type ?
> 
> * what are the guarantees of a load/store relaxed in C11 ?
> 
> Does the delta between volatile and C11 relaxed guarantees matter ?
> 
> Is there an advantage to use C11 load/store relaxed over volatile ? Should
> we combine both C11 load/store _and_ volatile ? Should we use
> atomic_signal_fence instead ?

I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
for non-volatile atomic loads and stores, and such fusing can ruin your
code's entire day.  ;-)

							Thanx, Paul

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
@ 2023-06-22 19:53         ` Olivier Dion via lttng-dev
  2023-06-22 19:56           ` Mathieu Desnoyers via lttng-dev
  2023-06-22 20:11           ` Paul E. McKenney via lttng-dev
  2023-06-22 19:54         ` Mathieu Desnoyers via lttng-dev
  2023-06-29 17:22         ` Olivier Dion via lttng-dev
  2 siblings, 2 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-22 19:53 UTC (permalink / raw)
  To: paulmck, Mathieu Desnoyers; +Cc: lttng-dev

On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:

> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
> for non-volatile atomic loads and stores, and such fusing can ruin your
> code's entire day.  ;-)

Good catch.  Seems like not a problem on GCC (yet), but Clang is extremely
aggressive and seems to do store fusing on some corner cases [0].

However, I do not find any simple reproducer of load/store fusing.  Do
you have example of such fusing, or is this a precaution?  In the
meantime, back to reading the standard to be certain :-)

 [0] https://godbolt.org/z/odKG9a75a

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
  2023-06-22 19:53         ` Olivier Dion via lttng-dev
@ 2023-06-22 19:54         ` Mathieu Desnoyers via lttng-dev
  2023-06-29 17:22         ` Olivier Dion via lttng-dev
  2 siblings, 0 replies; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-22 19:54 UTC (permalink / raw)
  To: paulmck; +Cc: Olivier Dion, lttng-dev

On 6/22/23 14:32, Paul E. McKenney wrote:
> On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:
>> On 6/21/23 19:19, Paul E. McKenney wrote:
>> [...]
>>>> diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h
>>>> new file mode 100644
>>>> index 0000000..8e6a9b5
>>>> --- /dev/null
>>>> +++ b/include/urcu/uatomic/builtins-generic.h
>>>> @@ -0,0 +1,85 @@
>>>> +/*
>>>> + * urcu/uatomic/builtins-generic.h
>>>> + *
>>>> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
>>>> + *
>>>> + * This library is free software; you can redistribute it and/or
>>>> + * modify it under the terms of the GNU Lesser General Public
>>>> + * License as published by the Free Software Foundation; either
>>>> + * version 2.1 of the License, or (at your option) any later version.
>>>> + *
>>>> + * This library is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> + * Lesser General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU Lesser General Public
>>>> + * License along with this library; if not, write to the Free Software
>>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>>> + */
>>>> +
>>>> +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
>>>> +#define _URCU_UATOMIC_BUILTINS_GENERIC_H
>>>> +
>>>> +#include <urcu/system.h>
>>>> +
>>>> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
>>>> +
>>>> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
>>>
>>> Does this lose the volatile semantics that the old-style definitions
>>> had?
>>>
>>
>> Yes.
>>
>> [...]
>>
>>>> +++ b/include/urcu/uatomic/builtins-x86.h
>>>> @@ -0,0 +1,85 @@
>>>> +/*
>>>> + * urcu/uatomic/builtins-x86.h
>>>> + *
>>>> + * Copyright (c) 2023 Olivier Dion <odion@efficios.com>
>>>> + *
>>>> + * This library is free software; you can redistribute it and/or
>>>> + * modify it under the terms of the GNU Lesser General Public
>>>> + * License as published by the Free Software Foundation; either
>>>> + * version 2.1 of the License, or (at your option) any later version.
>>>> + *
>>>> + * This library is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> + * Lesser General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU Lesser General Public
>>>> + * License along with this library; if not, write to the Free Software
>>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>>> + */
>>>> +
>>>> +#ifndef _URCU_UATOMIC_BUILTINS_X86_H
>>>> +#define _URCU_UATOMIC_BUILTINS_X86_H
>>>> +
>>>> +#include <urcu/system.h>
>>>> +
>>>> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
>>>> +
>>>> +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)
>>>
>>> And same question here.
>>
>> Yes, this opens interesting questions:
>>
>> * what semantic do we want for uatomic_read/set ?
>>
>> * what semantic do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ?
>>
>> * do we want to allow load/store-shared to work on variables larger than a
>> word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure)
>>
>> * what are the guarantees of a volatile type ?
>>
>> * what are the guarantees of a load/store relaxed in C11 ?
>>
>> Does the delta between volatile and C11 relaxed guarantees matter ?
>>
>> Is there an advantage to use C11 load/store relaxed over volatile ? Should
>> we combine both C11 load/store _and_ volatile ? Should we use
>> atomic_signal_fence instead ?
> 
> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
> for non-volatile atomic loads and stores, and such fusing can ruin your
> code's entire day.  ;-)

I'm OK with erring towards a safer approach, but just out of curiosity, 
do you have examples of compilers doing load or store fusion on C11 or 
C++11 relaxed atomics, or is it out of caution due to lack of explicit 
guarantees in the standards ?

Does this lack of guarantee about fusion also apply to other MO such as 
acquire, release and seq.cst. ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 19:53         ` Olivier Dion via lttng-dev
@ 2023-06-22 19:56           ` Mathieu Desnoyers via lttng-dev
  2023-06-22 20:10             ` Olivier Dion via lttng-dev
  2023-06-22 20:11           ` Paul E. McKenney via lttng-dev
  1 sibling, 1 reply; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-22 19:56 UTC (permalink / raw)
  To: Olivier Dion, paulmck; +Cc: lttng-dev

On 6/22/23 15:53, Olivier Dion wrote:
> On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
>> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
>> for non-volatile atomic loads and stores, and such fusing can ruin your
>> code's entire day.  ;-)
> 
> Good catch.  Seems like not a problem on GCC (yet), but Clang is extremely
> aggressive and seems to do store fusing on some corner cases [0].

I don't think this is an example of store fusing, but rather just that 
the compiler can eliminate stores to static variables which are 
otherwise unused, making the entire variable useless.

Thanks,

Mathieu

> 
> However, I do not find any simple reproducer of load/store fusing.  Do
> you have example of such fusing, or is this a precaution?  In the
> meantime, back to reading the standard to be certain :-)
> 
>   [0] https://godbolt.org/z/odKG9a75a
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 19:56           ` Mathieu Desnoyers via lttng-dev
@ 2023-06-22 20:10             ` Olivier Dion via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-22 20:10 UTC (permalink / raw)
  To: Mathieu Desnoyers, paulmck; +Cc: lttng-dev

On Thu, 22 Jun 2023, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> On 6/22/23 15:53, Olivier Dion wrote:
>> On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
>> 
>>> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
>>> for non-volatile atomic loads and stores, and such fusing can ruin your
>>> code's entire day.  ;-)
>> 
>> Good catch.  Seems like not a problem on GCC (yet), but Clang is extremely
>> aggressive and seems to do store fusing on some corner cases [0].
>
> I don't think this is an example of store fusing, but rather just that 
> the compiler can eliminate stores to static variables which are 
> otherwise unused, making the entire variable useless.

Indeed, that is not store fusing.  It is however interesting to see that
the dead store elimination is avoided when casting one of the &x to a
volatile-qualified pointer.  TIL dead store elimination can apply to
atomics.
-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 19:53         ` Olivier Dion via lttng-dev
  2023-06-22 19:56           ` Mathieu Desnoyers via lttng-dev
@ 2023-06-22 20:11           ` Paul E. McKenney via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-22 20:11 UTC (permalink / raw)
  To: Olivier Dion; +Cc: lttng-dev

On Thu, Jun 22, 2023 at 03:53:33PM -0400, Olivier Dion wrote:
> On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
> > for non-volatile atomic loads and stores, and such fusing can ruin your
> > code's entire day.  ;-)
> 
> Good catch.  Seems like not a problem on GCC (yet), but Clang is extremely
> aggressive and seems to do store fusing on some corner cases [0].
> 
> However, I do not find any simple reproducer of load/store fusing.  Do
> you have example of such fusing, or is this a precaution?  In the
> meantime, back to reading the standard to be certain :-)
> 
>  [0] https://godbolt.org/z/odKG9a75a

I certainly have heard a number of compiler writers thinking in terms
of doing load/store fusing, some of whom were trying to get rid of the
volatile variants in order to remove an impediment to their mission of
optimizing all programs out of existence.  ;-)

I therefore suggest taking this possibility quite seriously.

							Thanx, Paul

* Re: [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model
  2023-06-21 23:28   ` Paul E. McKenney via lttng-dev
@ 2023-06-29 16:49     ` Olivier Dion via lttng-dev
  2023-06-29 18:40       ` Paul E. McKenney via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-29 16:49 UTC (permalink / raw)
  To: paulmck; +Cc: Tony Finch, lttng-dev

On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Wed, Jun 07, 2023 at 02:53:52PM -0400, Olivier Dion wrote:
>> -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
>> -# define _rcu_dereference(p) __extension__ ({						\
>> -				__typeof__(__extension__ ({				\
>> -					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
>> -					_________p0;					\
>> -				})) _________p1;					\
>> -				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
>
> There is talk of getting rid of memory_order_consume.  But for the moment,
> it is what there is.  Another alternative is to use a volatile load,
> similar to old-style CMM_LOAD_SHARED() or in-kernel READ_ONCE().

I think we can stick to __ATOMIC_CONSUME for now.  Hopefully getting rid
of it means it will remain an alias for __ATOMIC_ACQUIRE forever.

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
  2023-06-22 19:53         ` Olivier Dion via lttng-dev
  2023-06-22 19:54         ` Mathieu Desnoyers via lttng-dev
@ 2023-06-29 17:22         ` Olivier Dion via lttng-dev
  2023-06-29 17:27           ` Olivier Dion via lttng-dev
  2023-06-29 18:29           ` Mathieu Desnoyers via lttng-dev
  2 siblings, 2 replies; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-29 17:22 UTC (permalink / raw)
  To: paulmck, Mathieu Desnoyers; +Cc: lttng-dev

On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:
>> On 6/21/23 19:19, Paul E. McKenney wrote:
> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
> for non-volatile atomic loads and stores, and such fusing can ruin your
> code's entire day.  ;-)

After some testing, I got a wall of warnings:

  -Wignored-qualifiers:

    Warn if the return type of a function has a type qualifier such as
    "const".  For ISO C such a type qualifier has no effect, since the
    value returned by a function is not an lvalue.  For C++, the warning
    is only emitted for scalar types or "void".  ISO C prohibits
    qualified "void" return types on function definitions, so such
    return types always receive a warning even without this option.

Since we are using atomic builtins, for example load:

  type __atomic_load_n (type *ptr, int memorder)

If we add the volatile qualifier to TYPE, we end up with the same
qualifier on the return value, triggering a warning for each atomic
operation.

This seems to be a problem only when compiling in C++ [0], while in C
the compiler seems more relaxed about it [1].

Ideas to make the toolchains happy? :-)

  [0] https://godbolt.org/z/3nW14M3v1
  [1] https://godbolt.org/z/TcTeMeKbW
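For illustration, a minimal sketch of the pattern in question (the macro
name here is hypothetical, not liburcu's actual spelling):

```c
/* Applying volatile to the pointed-to type inside the atomic builtin
 * call.  Compiled as C this is quiet; compiled as C++ the deduced
 * return type of __atomic_load_n() keeps the volatile qualifier and
 * -Wignored-qualifiers fires on every call site. */
#define uatomic_load(addr) \
	__atomic_load_n((volatile __typeof__(*(addr)) *)(addr), __ATOMIC_RELAXED)

static int read_state(int *state)
{
	/* Volatile load: the compiler may not fuse or elide it. */
	return uatomic_load(state);
}
```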

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-29 17:22         ` Olivier Dion via lttng-dev
@ 2023-06-29 17:27           ` Olivier Dion via lttng-dev
  2023-06-29 18:33             ` Mathieu Desnoyers via lttng-dev
  2023-06-29 18:29           ` Mathieu Desnoyers via lttng-dev
  1 sibling, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-06-29 17:27 UTC (permalink / raw)
  To: paulmck, Mathieu Desnoyers; +Cc: lttng-dev

On Thu, 29 Jun 2023, Olivier Dion <odion@efficios.com> wrote:

>   [0] https://godbolt.org/z/3nW14M3v1
>   [1] https://godbolt.org/z/TcTeMeKbW

Sorry.  That was:

    [0] https://godbolt.org/z/ETcxnz4TW
    [1] https://godbolt.org/z/jMjh8YoM4

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-29 17:22         ` Olivier Dion via lttng-dev
  2023-06-29 17:27           ` Olivier Dion via lttng-dev
@ 2023-06-29 18:29           ` Mathieu Desnoyers via lttng-dev
  1 sibling, 0 replies; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-29 18:29 UTC (permalink / raw)
  To: Olivier Dion, paulmck; +Cc: lttng-dev

On 6/29/23 13:22, Olivier Dion wrote:
> On Thu, 22 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
>> On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:
>>> On 6/21/23 19:19, Paul E. McKenney wrote:
>> I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
>> for non-volatile atomic loads and stores, and such fusing can ruin your
>> code's entire day.  ;-)
> 
> After some testing, I got a wall of warnings:
> 
>    -Wignored-qualifiers:
> 
>      Warn if the return type of a function has a type qualifier such as
>      "const".  For ISO C such a type qualifier has no effect, since the
>      value returned by a function is not an lvalue.  For C++, the warning
>      is only emitted for scalar types or "void".  ISO C prohibits
>      qualified "void" return types on function definitions, so such
>      return types always receive a warning even without this option.
> 
> Since we are using atomic builtins, for example load:
> 
>    type __atomic_load_n (type *ptr, int memorder)
> 
> If we put the qualifier volatile to TYPE, we end up with the same
> qualifier on the return value, triggering a warning for each atomic
> operation.
> 
> This seems to be only a problem when compiling in C++ [0] while in C it
> seems the compiler is more relaxed on this [1].
> 
> Ideas to make the toolchains happy? :-)

Change:

(__typeof__(*ptr) *volatile)(ptr);

(which applies the volatile to the pointer, rather than what is pointed to)

to either:

(volatile __typeof__(*ptr) *)(ptr);

or:

(__typeof__(*ptr) volatile *)(ptr);
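The difference between the two placements can be sketched as follows
(function and variable names are illustrative only):

```c
/* Where volatile sits in the cast decides what is volatile:
 *
 *   T *volatile p   ->  the pointer itself is volatile
 *   volatile T *p   ->  the pointed-to object is volatile
 *
 * Only the second form makes loads/stores through the pointer into
 * volatile accesses, which is what the uatomic wrappers want. */
static int load_through(int *ptr)
{
	volatile int *vp = (volatile __typeof__(*ptr) *)(ptr);

	return *vp;	/* a volatile read of *ptr */
}
```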

Thanks,

Mathieu

> 
>    [0] https://godbolt.org/z/3nW14M3v1
>    [1] https://godbolt.org/z/TcTeMeKbW
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
  2023-06-29 17:27           ` Olivier Dion via lttng-dev
@ 2023-06-29 18:33             ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-06-29 18:33 UTC (permalink / raw)
  To: Olivier Dion, paulmck; +Cc: lttng-dev

On 6/29/23 13:27, Olivier Dion wrote:
> On Thu, 29 Jun 2023, Olivier Dion <odion@efficios.com> wrote:
> 
>>    [0] https://godbolt.org/z/3nW14M3v1
>>    [1] https://godbolt.org/z/TcTeMeKbW
> 
> Sorry.  That was:
> 
>      [0] https://godbolt.org/z/ETcxnz4TW

Change

(volatile __typeof__(ptr))(ptr);

to:

(volatile __typeof__(*(ptr)) *)(ptr);

and:

void love_iso(int *x)
{
      __atomic_store_n(cast_volatile(&x), 1,
                       __ATOMIC_RELAXED);
}

to:

void love_iso(int *x)
{
      __atomic_store_n(cast_volatile(x), 1,
                       __ATOMIC_RELAXED);
}
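Putting both corrections together, a self-contained sketch with
cast_volatile() spelled out (this definition is an assumption mirroring
the corrected cast above, not necessarily liburcu's exact code):

```c
/* Apply volatile to the pointed-to type; the pointer value itself is
 * passed through unchanged. */
#define cast_volatile(ptr) ((volatile __typeof__(*(ptr)) *)(ptr))

void love_iso(int *x)
{
	/* Store through a pointer-to-volatile-int, as intended. */
	__atomic_store_n(cast_volatile(x), 1, __ATOMIC_RELAXED);
}
```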

Thanks,

Mathieu


>      [1] https://godbolt.org/z/jMjh8YoM4
> -- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model
  2023-06-29 16:49     ` Olivier Dion via lttng-dev
@ 2023-06-29 18:40       ` Paul E. McKenney via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-06-29 18:40 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Thu, Jun 29, 2023 at 12:49:00PM -0400, Olivier Dion wrote:
> On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > On Wed, Jun 07, 2023 at 02:53:52PM -0400, Olivier Dion wrote:
> >> -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
> >> -# define _rcu_dereference(p) __extension__ ({						\
> >> -				__typeof__(__extension__ ({				\
> >> -					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
> >> -					_________p0;					\
> >> -				})) _________p1;					\
> >> -				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
> >
> > There is talk of getting rid of memory_order_consume.  But for the moment,
> > it is what there is.  Another alternative is to use a volatile load,
> > similar to old-style CMM_LOAD_SHARED() or in-kernel READ_ONCE().
> 
> I think we can stick to __ATOMIC_CONSUME for now.  Hopefully getting rid
> of it means it will be an alias for __ATOMIC_ACQUIRE for ever.

That seems eminently reasonable to me!

							Thanx, Paul

* Re: [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-06-21 23:23   ` Paul E. McKenney via lttng-dev
@ 2023-07-04 14:43     ` Olivier Dion via lttng-dev
  2023-07-05 18:48       ` Paul E. McKenney via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-07-04 14:43 UTC (permalink / raw)
  To: paulmck; +Cc: Tony Finch, lttng-dev

On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Wed, Jun 07, 2023 at 02:53:51PM -0400, Olivier Dion wrote:
>
> Same question here on loss of volatile semantics.
>

This applies to all the reviews on volatile semantics.  I added a
cmm_cast_volatile() macro/template for C/C++ that adds the volatile
qualifier to the pointers passed to every atomic builtin call.

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-07-04 14:43     ` Olivier Dion via lttng-dev
@ 2023-07-05 18:48       ` Paul E. McKenney via lttng-dev
  2023-07-05 19:03         ` Olivier Dion via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-07-05 18:48 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Tue, Jul 04, 2023 at 10:43:21AM -0400, Olivier Dion wrote:
> On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > On Wed, Jun 07, 2023 at 02:53:51PM -0400, Olivier Dion wrote:
> >
> > Same question here on loss of volatile semantics.
> 
> This apply to all review on volatile semantics.  I added a
> cmm_cast_volatile() macro/template for C/C++ for adding the volatile
> qualifier to pointers passed to every atomic builtin calls.

Sounds very good, thank you!

							Thanx, Paul

* Re: [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-07-05 18:48       ` Paul E. McKenney via lttng-dev
@ 2023-07-05 19:03         ` Olivier Dion via lttng-dev
  2023-07-05 19:28           ` Paul E. McKenney via lttng-dev
  0 siblings, 1 reply; 69+ messages in thread
From: Olivier Dion via lttng-dev @ 2023-07-05 19:03 UTC (permalink / raw)
  To: paulmck; +Cc: Tony Finch, lttng-dev

On Wed, 05 Jul 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Tue, Jul 04, 2023 at 10:43:21AM -0400, Olivier Dion wrote:
>> On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
>> > On Wed, Jun 07, 2023 at 02:53:51PM -0400, Olivier Dion wrote:
>> >
>> > Same question here on loss of volatile semantics.
>> 
>> This apply to all review on volatile semantics.  I added a
>> cmm_cast_volatile() macro/template for C/C++ for adding the volatile
>> qualifier to pointers passed to every atomic builtin calls.
>
> Sounds very good, thank you!

Maybe a case of synchronicity here, but I just stumbled upon this
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html>
where you seem to express the same concerns :-)

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com

* Re: [lttng-dev] [PATCH v2 04/12] urcu/system: Use atomic builtins if configured
  2023-07-05 19:03         ` Olivier Dion via lttng-dev
@ 2023-07-05 19:28           ` Paul E. McKenney via lttng-dev
  0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney via lttng-dev @ 2023-07-05 19:28 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Tony Finch, lttng-dev

On Wed, Jul 05, 2023 at 03:03:21PM -0400, Olivier Dion wrote:
> On Wed, 05 Jul 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > On Tue, Jul 04, 2023 at 10:43:21AM -0400, Olivier Dion wrote:
> >> On Wed, 21 Jun 2023, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >> > On Wed, Jun 07, 2023 at 02:53:51PM -0400, Olivier Dion wrote:
> >> >
> >> > Same question here on loss of volatile semantics.
> >> 
> >> This apply to all review on volatile semantics.  I added a
> >> cmm_cast_volatile() macro/template for C/C++ for adding the volatile
> >> qualifier to pointers passed to every atomic builtin calls.
> >
> > Sounds very good, thank you!
> 
> Maybe a case of synchronicity here, but I just stumble upon this
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html>
> where you seem to express the same concerns :-)

Just for completeness, my response to Hans's concern about volatile is
addressed by an empty memory-clobber asm, similar to barrier() in the
Linux kernel.
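Such a compiler barrier can be sketched in one line; this mirrors the
kernel's barrier(), and no CPU fence instruction is emitted:

```c
/* Empty asm with a "memory" clobber: the compiler must assume all of
 * memory may be read or written by it, so it cannot cache values in
 * registers or fuse/reorder memory accesses across the barrier. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int read_twice(int *p)
{
	int a = *p;

	barrier();	/* forces a second load of *p below */
	return a + *p;
}
```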

But yes, this has seen significant discussion over the years.  ;-)

							Thanx, Paul

end of thread, other threads:[~2023-07-05 19:29 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 20:17 [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 01/11] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured Olivier Dion via lttng-dev
2023-06-21 23:19   ` Paul E. McKenney via lttng-dev
2023-06-22 15:55     ` Mathieu Desnoyers via lttng-dev
2023-06-22 18:32       ` Paul E. McKenney via lttng-dev
2023-06-22 19:53         ` Olivier Dion via lttng-dev
2023-06-22 19:56           ` Mathieu Desnoyers via lttng-dev
2023-06-22 20:10             ` Olivier Dion via lttng-dev
2023-06-22 20:11           ` Paul E. McKenney via lttng-dev
2023-06-22 19:54         ` Mathieu Desnoyers via lttng-dev
2023-06-29 17:22         ` Olivier Dion via lttng-dev
2023-06-29 17:27           ` Olivier Dion via lttng-dev
2023-06-29 18:33             ` Mathieu Desnoyers via lttng-dev
2023-06-29 18:29           ` Mathieu Desnoyers via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 03/11] urcu/compiler: " Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 04/11] urcu/arch/generic: " Olivier Dion via lttng-dev
2023-06-21 23:22   ` Paul E. McKenney via lttng-dev
2023-06-22  0:53     ` Olivier Dion via lttng-dev
2023-06-22  1:48       ` Mathieu Desnoyers via lttng-dev
2023-06-22  3:44         ` Paul E. McKenney via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 05/11] urcu/system: " Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 06/11] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 07/11] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 08/11] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 09/11] benchmark: " Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 10/11] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
2023-05-15 20:17 ` [lttng-dev] [PATCH 11/11] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
2023-05-16 15:57   ` Olivier Dion via lttng-dev
2023-05-16  8:18 ` [lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu Dmitry Vyukov via lttng-dev
2023-05-16 15:47   ` Olivier Dion via lttng-dev
2023-05-17 10:21     ` Dmitry Vyukov via lttng-dev
2023-05-17 10:57       ` Dmitry Vyukov via lttng-dev
2023-05-17 14:44         ` Olivier Dion via lttng-dev
2023-05-23 16:05           ` Olivier Dion via lttng-dev
2023-05-24  8:14             ` Dmitry Vyukov via lttng-dev
2023-05-26  5:33               ` Ondřej Surý via lttng-dev
2023-05-26  6:08                 ` Dmitry Vyukov via lttng-dev
2023-05-26  6:10                   ` Dmitry Vyukov via lttng-dev
2023-05-26 10:06                   ` Ondřej Surý via lttng-dev
2023-05-26 10:08                     ` Dmitry Vyukov via lttng-dev
2023-05-26 14:20                     ` Olivier Dion via lttng-dev
2023-05-26 15:15               ` Olivier Dion via lttng-dev
2023-05-17 14:44       ` Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 00/12] " Olivier Dion via lttng-dev
2023-06-07 19:04   ` Ondřej Surý via lttng-dev
2023-06-07 19:20     ` Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 01/12] configure: Add --disable-atomic-builtins option Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 02/12] urcu/compiler: Use atomic builtins if configured Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 03/12] urcu/arch/generic: " Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 04/12] urcu/system: " Olivier Dion via lttng-dev
2023-06-21 23:23   ` Paul E. McKenney via lttng-dev
2023-07-04 14:43     ` Olivier Dion via lttng-dev
2023-07-05 18:48       ` Paul E. McKenney via lttng-dev
2023-07-05 19:03         ` Olivier Dion via lttng-dev
2023-07-05 19:28           ` Paul E. McKenney via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 05/12] urcu/uatomic: Add CMM memory model Olivier Dion via lttng-dev
2023-06-21 23:28   ` Paul E. McKenney via lttng-dev
2023-06-29 16:49     ` Olivier Dion via lttng-dev
2023-06-29 18:40       ` Paul E. McKenney via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 06/12] urcu-wait: Fix wait state load/store Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 07/12] tests: Use uatomic for accessing global states Olivier Dion via lttng-dev
2023-06-21 23:37   ` Paul E. McKenney via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 08/12] benchmark: " Olivier Dion via lttng-dev
2023-06-21 23:38   ` Paul E. McKenney via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 09/12] tests/unit/test_build: Quiet unused return value Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 10/12] urcu/annotate: Add CMM annotation Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 11/12] Add cmm_emit_legacy_smp_mb() Olivier Dion via lttng-dev
2023-06-07 18:53 ` [lttng-dev] [PATCH v2 12/12] tests: Add tests for checking race conditions Olivier Dion via lttng-dev
