lttng-dev.lists.lttng.org archive mirror
* [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins
@ 2023-03-17 21:37 Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Hi,

(this is my first time using git send-email, so I apologise in advance if
anything breaks).

Here's my attempt to convert Userspace RCU to use __atomic builtins instead of
custom assembly wherever possible.

The __atomic builtins were first introduced in gcc 4.7.0 and clang 3.1.0.

Apart from simplifying the code, this should also help ThreadSanitizer to
understand the memory synchronization and report fewer (or no) warnings.
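
As a sketch of the kind of replacement this series performs (illustrative
code, not taken from the patches themselves): the old ARM uatomic_xchg()
pairs a manual barrier with __sync_lock_test_and_set(), while the builtin
form expresses the same acquire-release semantics in a single call:

```c
/* Legacy style: __sync_lock_test_and_set() is only an acquire
 * barrier, so a full barrier must be issued by hand before it. */
static int legacy_xchg(int *addr, int v)
{
	__sync_synchronize();
	return __sync_lock_test_and_set(addr, v);
}

/* __atomic style: one builtin with an explicit memory order. */
static int builtin_xchg(int *addr, int v)
{
	return __atomic_exchange_n(addr, v, __ATOMIC_ACQ_REL);
}
```

Both variants return the previous value of *addr; the builtin version also
lets ThreadSanitizer see the intended memory ordering.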

The code compiles and the tests pass (on amd64).

This is by no means complete, and I have most probably missed or
misunderstood something, but it's a solid start, so I am submitting the
patch set for discussion and review.

Thanks,
--
Ondřej Surý <ondrej@sury.org>

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* [lttng-dev] [PATCH 1/7] Require __atomic builtins to build
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Add autoconf checks for all __atomic builtins that urcu requires, and
adjust the supported gcc and clang versions in the README.md.
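
For reference, the probe added to configure.ac below can also be exercised
as a standalone C function; it compiles only when the toolchain provides
the full set of required builtins (the wrapper function and its return
value here are illustrative, not part of the patch):

```c
/* Exercises the same __atomic builtins the configure check requires. */
static int probe_atomics(void)
{
	int x = 0, y;

	__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
	(void) __atomic_load_n(&x, __ATOMIC_CONSUME);
	y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);
	__atomic_compare_exchange_n(&x, &y, 0, 0,
				    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
	__atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);
	__atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);
	__atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
	__atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
	__atomic_thread_fence(__ATOMIC_ACQ_REL);
	return x;
}
```

A compiler older than gcc 4.7 or clang 3.1 fails to compile this, which is
exactly the condition the AC_COMPILE_IFELSE check turns into a configure
error.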

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 README.md    | 33 +++++++++------------------------
 configure.ac | 15 +++++++++++++++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ba5bb08..a65a07a 100644
--- a/README.md
+++ b/README.md
@@ -68,30 +68,15 @@ Should also work on:
 
 (more testing needed before claiming support for these OS).
 
-Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or
-better.
-
-The C compiler used needs to support at least C99. The C++ compiler used
-needs to support at least C++11.
-
-The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
-supported, with the following exceptions:
-
-  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
-    accesses to offsets in a TLS structure on 32-bit x86. These versions are
-    therefore not compatible with `liburcu` on x86 32-bit
-    (i386, i486, i586, i686).
-    The problem has been reported to the GCC community:
-    <http://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg281255.html>
-  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
-    See <http://kerneltrap.org/node/7507>
-  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
-    support. For ARM this was introduced with GCC 4.4:
-    <http://gcc.gnu.org/gcc-4.4/changes.html>.
-  - Linux aarch64 depends on GCC 5.1 or better because prior versions
-    perform unsafe access to deallocated stack.
-
-Clang version 3.0 (based on LLVM 3.0) is supported.
+Linux ARM depends on running a Linux kernel 2.6.15 or better.
+
+The C compiler used needs to support at least C99 and __atomic
+builtins. The C++ compiler used needs to support at least C++11
+and __atomic builtins.
+
+The GCC compiler versions 4.7 or better are supported.
+
+Clang version 3.1 (based on LLVM 3.1) is supported.
 
 Glibc >= 2.4 should work but the older version we test against is
 currently 2.17.
diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
   AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is detected.])
 ])
 
+# Require __atomic builtins
+AC_COMPILE_IFELSE(
+	[AC_LANG_PROGRAM(
+		[[int x, y;]],
+		[[__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
+		  __atomic_load_n(&x, __ATOMIC_CONSUME);
+		  y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
+		  __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+		  __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+		  __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],
+	[],
+	[AC_MSG_ERROR([The compiler does not support __atomic builtins])])
 
 ##                             ##
 ## Optional features selection ##
-- 
2.39.2



* [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier() Ondřej Surý via lttng-dev
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Replace the custom assembly code in include/urcu/uatomic/ with the __atomic
builtins provided by any C11-compatible compiler.
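
The trickiest of the new mappings is uatomic_cmpxchg(): as the new
definition below shows, __atomic_compare_exchange_n() reports success
through its return value and writes the observed value back through its
"expected" argument, so preserving uatomic_cmpxchg()'s convention of
returning the previous value needs a temporary. A function-style sketch of
the same mapping (illustrative names, not the patch's exact text):

```c
/* Mirrors the statement-expression used by the new uatomic_cmpxchg():
 * on success *addr becomes newv; either way the previous value of
 * *addr is returned via the updated "expected" temporary. */
static int cmpxchg_int(int *addr, int old, int newv)
{
	int expected = old;

	__atomic_compare_exchange_n(addr, &expected, newv, 0,
				    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
	return expected;	/* previous value of *addr */
}
```

Callers keep the familiar compare-and-swap idiom: the exchange happened if
and only if the returned value equals the "old" value passed in.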

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 include/Makefile.am            |  16 -
 include/urcu/uatomic.h         |  84 +++--
 include/urcu/uatomic/aarch64.h |  41 ---
 include/urcu/uatomic/alpha.h   |  32 --
 include/urcu/uatomic/arm.h     |  57 ---
 include/urcu/uatomic/gcc.h     |  46 ---
 include/urcu/uatomic/generic.h | 613 -------------------------------
 include/urcu/uatomic/hppa.h    |  10 -
 include/urcu/uatomic/ia64.h    |  41 ---
 include/urcu/uatomic/m68k.h    |  44 ---
 include/urcu/uatomic/mips.h    |  32 --
 include/urcu/uatomic/nios2.h   |  32 --
 include/urcu/uatomic/ppc.h     | 237 ------------
 include/urcu/uatomic/riscv.h   |  44 ---
 include/urcu/uatomic/s390.h    | 170 ---------
 include/urcu/uatomic/sparc64.h |  81 -----
 include/urcu/uatomic/tile.h    |  41 ---
 include/urcu/uatomic/x86.h     | 646 ---------------------------------
 18 files changed, 53 insertions(+), 2214 deletions(-)
 delete mode 100644 include/urcu/uatomic/aarch64.h
 delete mode 100644 include/urcu/uatomic/alpha.h
 delete mode 100644 include/urcu/uatomic/arm.h
 delete mode 100644 include/urcu/uatomic/gcc.h
 delete mode 100644 include/urcu/uatomic/generic.h
 delete mode 100644 include/urcu/uatomic/hppa.h
 delete mode 100644 include/urcu/uatomic/ia64.h
 delete mode 100644 include/urcu/uatomic/m68k.h
 delete mode 100644 include/urcu/uatomic/mips.h
 delete mode 100644 include/urcu/uatomic/nios2.h
 delete mode 100644 include/urcu/uatomic/ppc.h
 delete mode 100644 include/urcu/uatomic/riscv.h
 delete mode 100644 include/urcu/uatomic/s390.h
 delete mode 100644 include/urcu/uatomic/sparc64.h
 delete mode 100644 include/urcu/uatomic/tile.h
 delete mode 100644 include/urcu/uatomic/x86.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..53a28fd 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -59,24 +59,8 @@ nobase_include_HEADERS = \
 	urcu/syscall-compat.h \
 	urcu/system.h \
 	urcu/tls-compat.h \
-	urcu/uatomic/aarch64.h \
-	urcu/uatomic/alpha.h \
 	urcu/uatomic_arch.h \
-	urcu/uatomic/arm.h \
-	urcu/uatomic/gcc.h \
-	urcu/uatomic/generic.h \
 	urcu/uatomic.h \
-	urcu/uatomic/hppa.h \
-	urcu/uatomic/ia64.h \
-	urcu/uatomic/m68k.h \
-	urcu/uatomic/mips.h \
-	urcu/uatomic/nios2.h \
-	urcu/uatomic/ppc.h \
-	urcu/uatomic/riscv.h \
-	urcu/uatomic/s390.h \
-	urcu/uatomic/sparc64.h \
-	urcu/uatomic/tile.h \
-	urcu/uatomic/x86.h \
 	urcu/urcu-bp.h \
 	urcu/urcu-futex.h \
 	urcu/urcu.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..4dbef4c 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,37 +22,59 @@
 #define _URCU_UATOMIC_H
 
 #include <urcu/arch.h>
+#include <urcu/system.h>
 
-#if defined(URCU_ARCH_X86)
-#include <urcu/uatomic/x86.h>
-#elif defined(URCU_ARCH_PPC)
-#include <urcu/uatomic/ppc.h>
-#elif defined(URCU_ARCH_S390)
-#include <urcu/uatomic/s390.h>
-#elif defined(URCU_ARCH_SPARC64)
-#include <urcu/uatomic/sparc64.h>
-#elif defined(URCU_ARCH_ALPHA)
-#include <urcu/uatomic/alpha.h>
-#elif defined(URCU_ARCH_IA64)
-#include <urcu/uatomic/ia64.h>
-#elif defined(URCU_ARCH_ARM)
-#include <urcu/uatomic/arm.h>
-#elif defined(URCU_ARCH_AARCH64)
-#include <urcu/uatomic/aarch64.h>
-#elif defined(URCU_ARCH_MIPS)
-#include <urcu/uatomic/mips.h>
-#elif defined(URCU_ARCH_NIOS2)
-#include <urcu/uatomic/nios2.h>
-#elif defined(URCU_ARCH_TILE)
-#include <urcu/uatomic/tile.h>
-#elif defined(URCU_ARCH_HPPA)
-#include <urcu/uatomic/hppa.h>
-#elif defined(URCU_ARCH_M68K)
-#include <urcu/uatomic/m68k.h>
-#elif defined(URCU_ARCH_RISCV)
-#include <urcu/uatomic/riscv.h>
-#else
-#error "Cannot build: unrecognized architecture, see <urcu/arch.h>."
-#endif
+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+	({									\
+		__typeof__(*(addr)) __old = old;				\
+		__atomic_compare_exchange_n(addr, &__old, new, 0,		\
+					    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\
+		__old;								\
+	})
+
+#define uatomic_add_return(addr, v) \
+	__atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+	(void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+	__atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+	(void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+	(void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask)						\
+	(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
+#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_and()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_add()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_add()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_sub()	cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_sub()		cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_inc()	cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_inc()		cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_dec()	cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_dec()		cmm_smp_mb__after_uatomic_add()
+
+#define cmm_smp_mb()				cmm_mb()
 
 #endif /* _URCU_UATOMIC_H */
diff --git a/include/urcu/uatomic/aarch64.h b/include/urcu/uatomic/aarch64.h
deleted file mode 100644
index 58698ce..0000000
--- a/include/urcu/uatomic/aarch64.h
+++ /dev/null
@@ -1,41 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_AARCH64_H
-#define _URCU_ARCH_UATOMIC_AARCH64_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009-2015 Mathieu Desnoyers
- * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
- *			   (Adapted from uatomic_arch_ppc.h)
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_AARCH64_H */
diff --git a/include/urcu/uatomic/alpha.h b/include/urcu/uatomic/alpha.h
deleted file mode 100644
index 5dceb90..0000000
--- a/include/urcu/uatomic/alpha.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef _URCU_UATOMIC_ARCH_ALPHA_H
-#define _URCU_UATOMIC_ARCH_ALPHA_H
-
-/*
- * Atomic exchange operations for the Alpha architecture. Let GCC do it.
- *
- * Copyright (c) 2010 Paolo Bonzini <pbonzini@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_UATOMIC_ARCH_ALPHA_H */
diff --git a/include/urcu/uatomic/arm.h b/include/urcu/uatomic/arm.h
deleted file mode 100644
index 95f32f3..0000000
--- a/include/urcu/uatomic/arm.h
+++ /dev/null
@@ -1,57 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_ARM_H
-#define _URCU_ARCH_UATOMIC_ARM_H
-
-/*
- * Atomics for ARM.  This approach is usable on kernels back to 2.6.15.
- *
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009      Mathieu Desnoyers
- * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
- *			   (Adapted from uatomic_arch_ppc.h)
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-#include <urcu/arch.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* xchg */
-
-/*
- * Based on [1], __sync_lock_test_and_set() is not a full barrier, but
- * instead only an acquire barrier. Given that uatomic_xchg() acts as
- * both release and acquire barriers, we therefore need to have our own
- * release barrier before this operation.
- *
- * [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
- */
-#define uatomic_xchg(addr, v)				\
-	({						\
-		cmm_smp_mb();				\
-		__sync_lock_test_and_set(addr, v);	\
-	})
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_ARM_H */
diff --git a/include/urcu/uatomic/gcc.h b/include/urcu/uatomic/gcc.h
deleted file mode 100644
index 438e039..0000000
--- a/include/urcu/uatomic/gcc.h
+++ /dev/null
@@ -1,46 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_GCC_H
-#define _URCU_ARCH_UATOMIC_GCC_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009      Mathieu Desnoyers
- * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
- *			   (Adapted from uatomic_arch_ppc.h)
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/*
- * If your platform doesn't have a full set of atomics, you will need
- * a separate uatomic_arch_*.h file for your architecture.  Otherwise,
- * just rely on the definitions in uatomic/generic.h.
- */
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_GCC_H */
diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
deleted file mode 100644
index c3762b0..0000000
--- a/include/urcu/uatomic/generic.h
+++ /dev/null
@@ -1,613 +0,0 @@
-#ifndef _URCU_UATOMIC_GENERIC_H
-#define _URCU_UATOMIC_GENERIC_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009      Mathieu Desnoyers
- * Copyright (c) 2010      Paolo Bonzini
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <stdint.h>
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#ifndef uatomic_set
-#define uatomic_set(addr, v)	((void) CMM_STORE_SHARED(*(addr), (v)))
-#endif
-
-#ifndef uatomic_read
-#define uatomic_read(addr)	CMM_LOAD_SHARED(*(addr))
-#endif
-
-#if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR
-static inline __attribute__((always_inline, __noreturn__))
-void _uatomic_link_error(void)
-{
-#ifdef ILLEGAL_INSTR
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__(ILLEGAL_INSTR);
-#else
-	__builtin_trap();
-#endif
-}
-
-#else /* #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR */
-extern void _uatomic_link_error(void);
-#endif /* #else #if !defined __OPTIMIZE__  || defined UATOMIC_NO_LINK_ERROR */
-
-/* cmpxchg */
-
-#ifndef uatomic_cmpxchg
-static inline __attribute__((always_inline))
-unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
-			      unsigned long _new, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-		return __sync_val_compare_and_swap_1((uint8_t *) addr, old,
-				_new);
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-		return __sync_val_compare_and_swap_2((uint16_t *) addr, old,
-				_new);
-#endif
-	case 4:
-		return __sync_val_compare_and_swap_4((uint32_t *) addr, old,
-				_new);
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-		return __sync_val_compare_and_swap_8((uint64_t *) addr, old,
-				_new);
-#endif
-	}
-	_uatomic_link_error();
-	return 0;
-}
-
-
-#define uatomic_cmpxchg(addr, old, _new)				      \
-	((__typeof__(*(addr))) _uatomic_cmpxchg((addr),			      \
-						caa_cast_long_keep_sign(old), \
-						caa_cast_long_keep_sign(_new),\
-						sizeof(*(addr))))
-
-
-/* uatomic_and */
-
-#ifndef uatomic_and
-static inline __attribute__((always_inline))
-void _uatomic_and(void *addr, unsigned long val,
-		  int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-		__sync_and_and_fetch_1((uint8_t *) addr, val);
-		return;
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-		__sync_and_and_fetch_2((uint16_t *) addr, val);
-		return;
-#endif
-	case 4:
-		__sync_and_and_fetch_4((uint32_t *) addr, val);
-		return;
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-		__sync_and_and_fetch_8((uint64_t *) addr, val);
-		return;
-#endif
-	}
-	_uatomic_link_error();
-}
-
-#define uatomic_and(addr, v)			\
-	(_uatomic_and((addr),			\
-		caa_cast_long_keep_sign(v),	\
-		sizeof(*(addr))))
-#define cmm_smp_mb__before_uatomic_and()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_and()		cmm_barrier()
-
-#endif
-
-/* uatomic_or */
-
-#ifndef uatomic_or
-static inline __attribute__((always_inline))
-void _uatomic_or(void *addr, unsigned long val,
-		 int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-		__sync_or_and_fetch_1((uint8_t *) addr, val);
-		return;
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-		__sync_or_and_fetch_2((uint16_t *) addr, val);
-		return;
-#endif
-	case 4:
-		__sync_or_and_fetch_4((uint32_t *) addr, val);
-		return;
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-		__sync_or_and_fetch_8((uint64_t *) addr, val);
-		return;
-#endif
-	}
-	_uatomic_link_error();
-	return;
-}
-
-#define uatomic_or(addr, v)			\
-	(_uatomic_or((addr),			\
-		caa_cast_long_keep_sign(v),	\
-		sizeof(*(addr))))
-#define cmm_smp_mb__before_uatomic_or()		cmm_barrier()
-#define cmm_smp_mb__after_uatomic_or()		cmm_barrier()
-
-#endif
-
-
-/* uatomic_add_return */
-
-#ifndef uatomic_add_return
-static inline __attribute__((always_inline))
-unsigned long _uatomic_add_return(void *addr, unsigned long val,
-				 int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-		return __sync_add_and_fetch_1((uint8_t *) addr, val);
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-		return __sync_add_and_fetch_2((uint16_t *) addr, val);
-#endif
-	case 4:
-		return __sync_add_and_fetch_4((uint32_t *) addr, val);
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-		return __sync_add_and_fetch_8((uint64_t *) addr, val);
-#endif
-	}
-	_uatomic_link_error();
-	return 0;
-}
-
-
-#define uatomic_add_return(addr, v)					    \
-	((__typeof__(*(addr))) _uatomic_add_return((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-#endif /* #ifndef uatomic_add_return */
-
-#ifndef uatomic_xchg
-/* xchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_exchange(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-	{
-		uint8_t old;
-
-		do {
-			old = uatomic_read((uint8_t *) addr);
-		} while (!__sync_bool_compare_and_swap_1((uint8_t *) addr,
-				old, val));
-
-		return old;
-	}
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-	{
-		uint16_t old;
-
-		do {
-			old = uatomic_read((uint16_t *) addr);
-		} while (!__sync_bool_compare_and_swap_2((uint16_t *) addr,
-				old, val));
-
-		return old;
-	}
-#endif
-	case 4:
-	{
-		uint32_t old;
-
-		do {
-			old = uatomic_read((uint32_t *) addr);
-		} while (!__sync_bool_compare_and_swap_4((uint32_t *) addr,
-				old, val));
-
-		return old;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		uint64_t old;
-
-		do {
-			old = uatomic_read((uint64_t *) addr);
-		} while (!__sync_bool_compare_and_swap_8((uint64_t *) addr,
-				old, val));
-
-		return old;
-	}
-#endif
-	}
-	_uatomic_link_error();
-	return 0;
-}
-
-#define uatomic_xchg(addr, v)						    \
-	((__typeof__(*(addr))) _uatomic_exchange((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-#endif /* #ifndef uatomic_xchg */
-
-#else /* #ifndef uatomic_cmpxchg */
-
-#ifndef uatomic_and
-/* uatomic_and */
-
-static inline __attribute__((always_inline))
-void _uatomic_and(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-	{
-		uint8_t old, oldt;
-
-		oldt = uatomic_read((uint8_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old & val, 1);
-		} while (oldt != old);
-
-		return;
-	}
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-	{
-		uint16_t old, oldt;
-
-		oldt = uatomic_read((uint16_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old & val, 2);
-		} while (oldt != old);
-	}
-#endif
-	case 4:
-	{
-		uint32_t old, oldt;
-
-		oldt = uatomic_read((uint32_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old & val, 4);
-		} while (oldt != old);
-
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		uint64_t old, oldt;
-
-		oldt = uatomic_read((uint64_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old & val, 8);
-		} while (oldt != old);
-
-		return;
-	}
-#endif
-	}
-	_uatomic_link_error();
-}
-
-#define uatomic_and(addr, v)			\
-	(_uatomic_and((addr),			\
-		caa_cast_long_keep_sign(v),	\
-		sizeof(*(addr))))
-#define cmm_smp_mb__before_uatomic_and()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_and()		cmm_barrier()
-
-#endif /* #ifndef uatomic_and */
-
-#ifndef uatomic_or
-/* uatomic_or */
-
-static inline __attribute__((always_inline))
-void _uatomic_or(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-	{
-		uint8_t old, oldt;
-
-		oldt = uatomic_read((uint8_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old | val, 1);
-		} while (oldt != old);
-
-		return;
-	}
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-	{
-		uint16_t old, oldt;
-
-		oldt = uatomic_read((uint16_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old | val, 2);
-		} while (oldt != old);
-
-		return;
-	}
-#endif
-	case 4:
-	{
-		uint32_t old, oldt;
-
-		oldt = uatomic_read((uint32_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old | val, 4);
-		} while (oldt != old);
-
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		uint64_t old, oldt;
-
-		oldt = uatomic_read((uint64_t *) addr);
-		do {
-			old = oldt;
-			oldt = _uatomic_cmpxchg(addr, old, old | val, 8);
-		} while (oldt != old);
-
-		return;
-	}
-#endif
-	}
-	_uatomic_link_error();
-}
-
-#define uatomic_or(addr, v)			\
-	(_uatomic_or((addr),			\
-		caa_cast_long_keep_sign(v),	\
-		sizeof(*(addr))))
-#define cmm_smp_mb__before_uatomic_or()		cmm_barrier()
-#define cmm_smp_mb__after_uatomic_or()		cmm_barrier()
-
-#endif /* #ifndef uatomic_or */
-
-#ifndef uatomic_add_return
-/* uatomic_add_return */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_add_return(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-	{
-		uint8_t old, oldt;
-
-		oldt = uatomic_read((uint8_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint8_t *) addr,
-                                               old, old + val);
-		} while (oldt != old);
-
-		return old + val;
-	}
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-	{
-		uint16_t old, oldt;
-
-		oldt = uatomic_read((uint16_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint16_t *) addr,
-                                               old, old + val);
-		} while (oldt != old);
-
-		return old + val;
-	}
-#endif
-	case 4:
-	{
-		uint32_t old, oldt;
-
-		oldt = uatomic_read((uint32_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint32_t *) addr,
-                                               old, old + val);
-		} while (oldt != old);
-
-		return old + val;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		uint64_t old, oldt;
-
-		oldt = uatomic_read((uint64_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint64_t *) addr,
-                                               old, old + val);
-		} while (oldt != old);
-
-		return old + val;
-	}
-#endif
-	}
-	_uatomic_link_error();
-	return 0;
-}
-
-#define uatomic_add_return(addr, v)					    \
-	((__typeof__(*(addr))) _uatomic_add_return((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-#endif /* #ifndef uatomic_add_return */
-
-#ifndef uatomic_xchg
-/* xchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_exchange(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-#ifdef UATOMIC_HAS_ATOMIC_BYTE
-	case 1:
-	{
-		uint8_t old, oldt;
-
-		oldt = uatomic_read((uint8_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint8_t *) addr,
-                                               old, val);
-		} while (oldt != old);
-
-		return old;
-	}
-#endif
-#ifdef UATOMIC_HAS_ATOMIC_SHORT
-	case 2:
-	{
-		uint16_t old, oldt;
-
-		oldt = uatomic_read((uint16_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint16_t *) addr,
-                                               old, val);
-		} while (oldt != old);
-
-		return old;
-	}
-#endif
-	case 4:
-	{
-		uint32_t old, oldt;
-
-		oldt = uatomic_read((uint32_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint32_t *) addr,
-                                               old, val);
-		} while (oldt != old);
-
-		return old;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		uint64_t old, oldt;
-
-		oldt = uatomic_read((uint64_t *) addr);
-		do {
-			old = oldt;
-			oldt = uatomic_cmpxchg((uint64_t *) addr,
-                                               old, val);
-		} while (oldt != old);
-
-		return old;
-	}
-#endif
-	}
-	_uatomic_link_error();
-	return 0;
-}
-
-#define uatomic_xchg(addr, v)						    \
-	((__typeof__(*(addr))) _uatomic_exchange((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-#endif /* #ifndef uatomic_xchg */
-
-#endif /* #else #ifndef uatomic_cmpxchg */
-
-/* uatomic_sub_return, uatomic_add, uatomic_sub, uatomic_inc, uatomic_dec */
-
-#ifndef uatomic_add
-#define uatomic_add(addr, v)		(void)uatomic_add_return((addr), (v))
-#define cmm_smp_mb__before_uatomic_add()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_add()		cmm_barrier()
-#endif
-
-#define uatomic_sub_return(addr, v)	\
-	uatomic_add_return((addr), -(caa_cast_long_keep_sign(v)))
-#define uatomic_sub(addr, v)		\
-	uatomic_add((addr), -(caa_cast_long_keep_sign(v)))
-#define cmm_smp_mb__before_uatomic_sub()	cmm_smp_mb__before_uatomic_add()
-#define cmm_smp_mb__after_uatomic_sub()		cmm_smp_mb__after_uatomic_add()
-
-#ifndef uatomic_inc
-#define uatomic_inc(addr)		uatomic_add((addr), 1)
-#define cmm_smp_mb__before_uatomic_inc()	cmm_smp_mb__before_uatomic_add()
-#define cmm_smp_mb__after_uatomic_inc()		cmm_smp_mb__after_uatomic_add()
-#endif
-
-#ifndef uatomic_dec
-#define uatomic_dec(addr)		uatomic_add((addr), -1)
-#define cmm_smp_mb__before_uatomic_dec()	cmm_smp_mb__before_uatomic_add()
-#define cmm_smp_mb__after_uatomic_dec()		cmm_smp_mb__after_uatomic_add()
-#endif
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _URCU_UATOMIC_GENERIC_H */
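[Editor's note on the generic fallbacks deleted above: the cmpxchg-based loops for xchg and add_return reduce to single `__atomic` builtin calls, which is the direction this series takes. A minimal sketch of that correspondence follows; the `sketch_*` names are illustrative, not the actual replacement code from later patches in the series.]

```c
#include <stdint.h>

/* Sketch: what the generic uatomic_xchg / uatomic_add_return loops
 * above reduce to with gcc/clang __atomic builtins. __ATOMIC_SEQ_CST
 * preserves the full-barrier semantics the old cmpxchg loop provided. */
static inline uint32_t sketch_xchg_u32(uint32_t *addr, uint32_t val)
{
	/* Returns the previous value, like _uatomic_exchange(). */
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

static inline uint32_t sketch_add_return_u32(uint32_t *addr, uint32_t v)
{
	/* Returns the new value, like _uatomic_add_return(). */
	return __atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST);
}
```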
diff --git a/include/urcu/uatomic/hppa.h b/include/urcu/uatomic/hppa.h
deleted file mode 100644
index 2102153..0000000
--- a/include/urcu/uatomic/hppa.h
+++ /dev/null
@@ -1,10 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_HPPA_H
-#define _URCU_ARCH_UATOMIC_HPPA_H
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#define UATOMIC_HAS_ATOMIC_SHORT
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_HPPA_H */
diff --git a/include/urcu/uatomic/ia64.h b/include/urcu/uatomic/ia64.h
deleted file mode 100644
index b5db8cc..0000000
--- a/include/urcu/uatomic/ia64.h
+++ /dev/null
@@ -1,41 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_IA64_H
-#define _URCU_ARCH_UATOMIC_IA64_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009-2015 Mathieu Desnoyers
- * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
- *			   (Adapted from uatomic_arch_ppc.h)
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_IA64_H */
diff --git a/include/urcu/uatomic/m68k.h b/include/urcu/uatomic/m68k.h
deleted file mode 100644
index 60b01c7..0000000
--- a/include/urcu/uatomic/m68k.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * Atomic exchange operations for the m68k architecture. Let GCC do it.
- *
- * Copyright (c) 2017 Michael Jeanson <mjeanson@efficios.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#ifndef _URCU_ARCH_UATOMIC_M68K_H
-#define _URCU_ARCH_UATOMIC_M68K_H
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_M68K_H */
diff --git a/include/urcu/uatomic/mips.h b/include/urcu/uatomic/mips.h
deleted file mode 100644
index bd7ca7f..0000000
--- a/include/urcu/uatomic/mips.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef _URCU_UATOMIC_ARCH_MIPS_H
-#define _URCU_UATOMIC_ARCH_MIPS_H
-
-/*
- * Atomic exchange operations for the MIPS architecture. Let GCC do it.
- *
- * Copyright (c) 2010 Paolo Bonzini <pbonzini@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_UATOMIC_ARCH_MIPS_H */
diff --git a/include/urcu/uatomic/nios2.h b/include/urcu/uatomic/nios2.h
deleted file mode 100644
index 5b3c303..0000000
--- a/include/urcu/uatomic/nios2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef _URCU_UATOMIC_ARCH_NIOS2_H
-#define _URCU_UATOMIC_ARCH_NIOS2_H
-
-/*
- * Atomic exchange operations for the NIOS2 architecture. Let GCC do it.
- *
- * Copyright (c) 2016 Marek Vasut <marex@denx.de>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_UATOMIC_ARCH_NIOS2_H */
diff --git a/include/urcu/uatomic/ppc.h b/include/urcu/uatomic/ppc.h
deleted file mode 100644
index 0e672f5..0000000
--- a/include/urcu/uatomic/ppc.h
+++ /dev/null
@@ -1,237 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_PPC_H
-#define _URCU_ARCH_UATOMIC_PPC_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009      Mathieu Desnoyers
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define ILLEGAL_INSTR	".long	0xd00d00"
-
-/*
- * Providing sequential consistency semantic with respect to other
- * instructions for cmpxchg and add_return family of atomic primitives.
- *
- * This is achieved with:
- *   lwsync (prior loads can be reordered after following load)
- *   lwarx
- *   stwcx.
- *   test if success (retry)
- *   sync
- *
- * Explanation of the sequential consistency provided by this scheme
- * from Paul E. McKenney:
- *
- * The reason we can get away with the lwsync before is that if a prior
- * store reorders with the lwarx, then you have to store to the atomic
- * variable from some other CPU to detect it.
- *
- * And if you do that, the lwarx will lose its reservation, so the stwcx
- * will fail.  The atomic operation will retry, so that the caller won't be
- * able to see the misordering.
- */
-
-/* xchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_exchange(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-	case 4:
-	{
-		unsigned int result;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
-			"stwcx. %2,0,%1\n"	/* else store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-				: "=&r"(result)
-				: "r"(addr), "r"(val)
-				: "memory", "cc");
-
-		return result;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long result;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"ldarx %0,0,%1\n"	/* load and reserve */
-			"stdcx. %2,0,%1\n"	/* else store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-				: "=&r"(result)
-				: "r"(addr), "r"(val)
-				: "memory", "cc");
-
-		return result;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__(ILLEGAL_INSTR);
-	return 0;
-}
-
-#define uatomic_xchg(addr, v)						    \
-	((__typeof__(*(addr))) _uatomic_exchange((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-/* cmpxchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
-			      unsigned long _new, int len)
-{
-	switch (len) {
-	case 4:
-	{
-		unsigned int old_val;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
-			"cmpw %0,%3\n"		/* if load is not equal to */
-			"bne 2f\n"		/* old, fail */
-			"stwcx. %2,0,%1\n"	/* else store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-		"2:\n"
-				: "=&r"(old_val)
-				: "r"(addr), "r"((unsigned int)_new),
-				  "r"((unsigned int)old)
-				: "memory", "cc");
-
-		return old_val;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long old_val;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"ldarx %0,0,%1\n"	/* load and reserve */
-			"cmpd %0,%3\n"		/* if load is not equal to */
-			"bne 2f\n"		/* old, fail */
-			"stdcx. %2,0,%1\n"	/* else store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-		"2:\n"
-				: "=&r"(old_val)
-				: "r"(addr), "r"((unsigned long)_new),
-				  "r"((unsigned long)old)
-				: "memory", "cc");
-
-		return old_val;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__(ILLEGAL_INSTR);
-	return 0;
-}
-
-
-#define uatomic_cmpxchg(addr, old, _new)				      \
-	((__typeof__(*(addr))) _uatomic_cmpxchg((addr),			      \
-						caa_cast_long_keep_sign(old), \
-						caa_cast_long_keep_sign(_new),\
-						sizeof(*(addr))))
-
-/* uatomic_add_return */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_add_return(void *addr, unsigned long val,
-				 int len)
-{
-	switch (len) {
-	case 4:
-	{
-		unsigned int result;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
-			"add %0,%2,%0\n"	/* add val to value loaded */
-			"stwcx. %0,0,%1\n"	/* store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-				: "=&r"(result)
-				: "r"(addr), "r"(val)
-				: "memory", "cc");
-
-		return result;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long result;
-
-		__asm__ __volatile__(
-			LWSYNC_OPCODE
-		"1:\t"	"ldarx %0,0,%1\n"	/* load and reserve */
-			"add %0,%2,%0\n"	/* add val to value loaded */
-			"stdcx. %0,0,%1\n"	/* store conditional */
-			"bne- 1b\n"	 	/* retry if lost reservation */
-			"sync\n"
-				: "=&r"(result)
-				: "r"(addr), "r"(val)
-				: "memory", "cc");
-
-		return result;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__(ILLEGAL_INSTR);
-	return 0;
-}
-
-
-#define uatomic_add_return(addr, v)					    \
-	((__typeof__(*(addr))) _uatomic_add_return((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_PPC_H */
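[Editor's note on the PPC header deleted above: the lwsync/lwarx/stwcx./sync retry loop is exactly the code shape the compiler itself emits for a sequentially consistent `__atomic_compare_exchange_n` on that architecture, so the hand-written assembly can be dropped without losing the ordering guarantees McKenney's explanation describes. A hedged sketch, with an illustrative (not actual) function name:]

```c
#include <stdbool.h>

/* Sketch: the PPC lwarx/stwcx. cmpxchg expressed with the __atomic
 * builtin. Returns the previously observed value, matching the
 * _uatomic_cmpxchg() contract above. */
static inline unsigned long
sketch_cmpxchg_ulong(unsigned long *addr, unsigned long old,
		     unsigned long new_)
{
	/* On success *addr becomes new_ and `old` is unchanged; on
	 * failure the builtin writes the observed value into `old`.
	 * Either way, `old` ends up holding the previous value. */
	__atomic_compare_exchange_n(addr, &old, new_, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}
```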
diff --git a/include/urcu/uatomic/riscv.h b/include/urcu/uatomic/riscv.h
deleted file mode 100644
index a6700e1..0000000
--- a/include/urcu/uatomic/riscv.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * Atomic exchange operations for the RISC-V architecture. Let GCC do it.
- *
- * Copyright (c) 2018 Michael Jeanson <mjeanson@efficios.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#ifndef _URCU_ARCH_UATOMIC_RISCV_H
-#define _URCU_ARCH_UATOMIC_RISCV_H
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_RISCV_H */
diff --git a/include/urcu/uatomic/s390.h b/include/urcu/uatomic/s390.h
deleted file mode 100644
index 42f23e7..0000000
--- a/include/urcu/uatomic/s390.h
+++ /dev/null
@@ -1,170 +0,0 @@
-#ifndef _URCU_UATOMIC_ARCH_S390_H
-#define _URCU_UATOMIC_ARCH_S390_H
-
-/*
- * Atomic exchange operations for the S390 architecture. Based on information
- * taken from the Principles of Operation Appendix A "Conditional Swapping
- * Instructions (CS, CDS)".
- *
- * Copyright (c) 2009 Novell, Inc.
- * Author: Jan Blunck <jblunck@suse.de>
- * Copyright (c) 2009 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 2)
-#define COMPILER_HAVE_SHORT_MEM_OPERAND
-#endif
-
-/*
- * MEMOP assembler operand rules:
- * - op refer to MEMOP_IN operand
- * - MEMOP_IN can expand to more than a single operand. Use it at the end of
- *   operand list only.
- */
-
-#ifdef COMPILER_HAVE_SHORT_MEM_OPERAND
-
-#define MEMOP_OUT(addr)	"=Q" (*(addr))
-#define MEMOP_IN(addr)	"Q" (*(addr))
-#define MEMOP_REF(op)	#op		/* op refer to MEMOP_IN operand */
-
-#else /* !COMPILER_HAVE_SHORT_MEM_OPERAND */
-
-#define MEMOP_OUT(addr)	"=m" (*(addr))
-#define MEMOP_IN(addr)	"a" (addr), "m" (*(addr))
-#define MEMOP_REF(op)	"0(" #op ")"	/* op refer to MEMOP_IN operand */
-
-#endif /* !COMPILER_HAVE_SHORT_MEM_OPERAND */
-
-/*
- * The __hp() macro casts the void pointer @x to a pointer to a structure
- * containing an array of char of the specified size. This allows passing the
- * @addr arguments of the following inline functions as "m" and "+m" operands
- * to the assembly. The @size parameter should be a constant to support
- * compilers such as clang which do not support VLA. Create typedefs because
- * C++ does not allow types be defined in casts.
- */
-
-typedef struct { char v[4]; } __hp_4;
-typedef struct { char v[8]; } __hp_8;
-
-#define __hp(size, x)	((__hp_##size *)(x))
-
-/* xchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_exchange(volatile void *addr, unsigned long val, int len)
-{
-	switch (len) {
-	case 4:
-	{
-		unsigned int old_val;
-
-		__asm__ __volatile__(
-			"0:	cs %0,%2," MEMOP_REF(%3) "\n"
-			"	brc 4,0b\n"
-			: "=&r" (old_val), MEMOP_OUT (__hp(4, addr))
-			: "r" (val), MEMOP_IN (__hp(4, addr))
-			: "memory", "cc");
-		return old_val;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long old_val;
-
-		__asm__ __volatile__(
-			"0:	csg %0,%2," MEMOP_REF(%3) "\n"
-			"	brc 4,0b\n"
-			: "=&r" (old_val), MEMOP_OUT (__hp(8, addr))
-			: "r" (val), MEMOP_IN (__hp(8, addr))
-			: "memory", "cc");
-		return old_val;
-	}
-#endif
-	default:
-		__asm__ __volatile__(".long	0xd00d00");
-	}
-
-	return 0;
-}
-
-#define uatomic_xchg(addr, v)						    \
-	(__typeof__(*(addr))) _uatomic_exchange((addr),			    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr)))
-
-/* cmpxchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
-			       unsigned long _new, int len)
-{
-	switch (len) {
-	case 4:
-	{
-		unsigned int old_val = (unsigned int)old;
-
-		__asm__ __volatile__(
-			"	cs %0,%2," MEMOP_REF(%3) "\n"
-			: "+r" (old_val), MEMOP_OUT (__hp(4, addr))
-			: "r" (_new), MEMOP_IN (__hp(4, addr))
-			: "memory", "cc");
-		return old_val;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-			"	csg %0,%2," MEMOP_REF(%3) "\n"
-			: "+r" (old), MEMOP_OUT (__hp(8, addr))
-			: "r" (_new), MEMOP_IN (__hp(8, addr))
-			: "memory", "cc");
-		return old;
-	}
-#endif
-	default:
-		__asm__ __volatile__(".long	0xd00d00");
-	}
-
-	return 0;
-}
-
-#define uatomic_cmpxchg(addr, old, _new)				     \
-	(__typeof__(*(addr))) _uatomic_cmpxchg((addr),			     \
-					       caa_cast_long_keep_sign(old), \
-					       caa_cast_long_keep_sign(_new),\
-					       sizeof(*(addr)))
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_UATOMIC_ARCH_S390_H */
diff --git a/include/urcu/uatomic/sparc64.h b/include/urcu/uatomic/sparc64.h
deleted file mode 100644
index a9f2795..0000000
--- a/include/urcu/uatomic/sparc64.h
+++ /dev/null
@@ -1,81 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_SPARC64_H
-#define _URCU_ARCH_UATOMIC_SPARC64_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2003 by Hewlett-Packard Company. All rights reserved.
- * Copyright (c) 2009      Mathieu Desnoyers
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* cmpxchg */
-
-static inline __attribute__((always_inline))
-unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
-			      unsigned long _new, int len)
-{
-	switch (len) {
-	case 4:
-	{
-		__asm__ __volatile__ (
-			"membar #StoreLoad | #LoadLoad\n\t"
-                        "cas [%1],%2,%0\n\t"
-                        "membar #StoreLoad | #StoreStore\n\t"
-                        : "+&r" (_new)
-                        : "r" (addr), "r" (old)
-                        : "memory");
-
-		return _new;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__ (
-			"membar #StoreLoad | #LoadLoad\n\t"
-                        "casx [%1],%2,%0\n\t"
-                        "membar #StoreLoad | #StoreStore\n\t"
-                        : "+&r" (_new)
-                        : "r" (addr), "r" (old)
-                        : "memory");
-
-		return _new;
-	}
-#endif
-	}
-	__builtin_trap();
-	return 0;
-}
-
-
-#define uatomic_cmpxchg(addr, old, _new)				       \
-	((__typeof__(*(addr))) _uatomic_cmpxchg((addr),			       \
-						caa_cast_long_keep_sign(old),  \
-						caa_cast_long_keep_sign(_new), \
-						sizeof(*(addr))))
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_PPC_H */
diff --git a/include/urcu/uatomic/tile.h b/include/urcu/uatomic/tile.h
deleted file mode 100644
index 830f260..0000000
--- a/include/urcu/uatomic/tile.h
+++ /dev/null
@@ -1,41 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_TILE_H
-#define _URCU_ARCH_UATOMIC_TILE_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009-2015 Mathieu Desnoyers
- * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
- *			   (Adapted from uatomic_arch_ppc.h)
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_TILE_H */
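[Editor's note: the `UATOMIC_HAS_ATOMIC_BYTE` / `UATOMIC_HAS_ATOMIC_SHORT` flags defined by the per-arch headers deleted above exist only to gate the 1- and 2-byte cases in generic.h. With `__atomic` builtins every operand size is handled uniformly, so the gating disappears; a sketch under that assumption, using illustrative names:]

```c
#include <stdint.h>

/* Sketch: 1- and 2-byte exchange need no per-arch feature flag with
 * __atomic builtins -- the same builtin serves all operand sizes
 * (falling back to a libatomic call where the ISA lacks support). */
static inline uint8_t sketch_xchg_u8(uint8_t *addr, uint8_t val)
{
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

static inline uint16_t sketch_xchg_u16(uint16_t *addr, uint16_t val)
{
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}
```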
diff --git a/include/urcu/uatomic/x86.h b/include/urcu/uatomic/x86.h
deleted file mode 100644
index d416963..0000000
--- a/include/urcu/uatomic/x86.h
+++ /dev/null
@@ -1,646 +0,0 @@
-#ifndef _URCU_ARCH_UATOMIC_X86_H
-#define _URCU_ARCH_UATOMIC_X86_H
-
-/*
- * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
- * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
- * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
- * Copyright (c) 2009      Mathieu Desnoyers
- *
- * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
- * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
- *
- * Permission is hereby granted to use or copy this program
- * for any purpose,  provided the above notices are retained on all copies.
- * Permission to modify the code and to distribute modified code is granted,
- * provided the above notices are retained, and a notice that the code was
- * modified is included with the above copyright notice.
- *
- * Code inspired from libuatomic_ops-1.2, inherited in part from the
- * Boehm-Demers-Weiser conservative garbage collector.
- */
-
-#include <urcu/arch.h>
-#include <urcu/config.h>
-#include <urcu/compiler.h>
-#include <urcu/system.h>
-
-#define UATOMIC_HAS_ATOMIC_BYTE
-#define UATOMIC_HAS_ATOMIC_SHORT
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/*
- * Derived from AO_compare_and_swap() and AO_test_and_set_full().
- */
-
-/*
- * The __hp() macro casts the void pointer @x to a pointer to a structure
- * containing an array of char of the specified size. This allows passing the
- * @addr arguments of the following inline functions as "m" and "+m" operands
- * to the assembly. The @size parameter should be a constant to support
- * compilers such as clang which do not support VLA. Create typedefs because
- * C++ does not allow types be defined in casts.
- */
-
-typedef struct { char v[1]; } __hp_1;
-typedef struct { char v[2]; } __hp_2;
-typedef struct { char v[4]; } __hp_4;
-typedef struct { char v[8]; } __hp_8;
-
-#define __hp(size, x)	((__hp_##size *)(x))
-
-#define _uatomic_set(addr, v)	((void) CMM_STORE_SHARED(*(addr), (v)))
-
-/* cmpxchg */
-
-static inline __attribute__((always_inline))
-unsigned long __uatomic_cmpxchg(void *addr, unsigned long old,
-			      unsigned long _new, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		unsigned char result = old;
-
-		__asm__ __volatile__(
-		"lock; cmpxchgb %2, %1"
-			: "+a"(result), "+m"(*__hp(1, addr))
-			: "q"((unsigned char)_new)
-			: "memory");
-		return result;
-	}
-	case 2:
-	{
-		unsigned short result = old;
-
-		__asm__ __volatile__(
-		"lock; cmpxchgw %2, %1"
-			: "+a"(result), "+m"(*__hp(2, addr))
-			: "r"((unsigned short)_new)
-			: "memory");
-		return result;
-	}
-	case 4:
-	{
-		unsigned int result = old;
-
-		__asm__ __volatile__(
-		"lock; cmpxchgl %2, %1"
-			: "+a"(result), "+m"(*__hp(4, addr))
-			: "r"((unsigned int)_new)
-			: "memory");
-		return result;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long result = old;
-
-		__asm__ __volatile__(
-		"lock; cmpxchgq %2, %1"
-			: "+a"(result), "+m"(*__hp(8, addr))
-			: "r"((unsigned long)_new)
-			: "memory");
-		return result;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return 0;
-}
-
-#define _uatomic_cmpxchg(addr, old, _new)				      \
-	((__typeof__(*(addr))) __uatomic_cmpxchg((addr),		      \
-						caa_cast_long_keep_sign(old), \
-						caa_cast_long_keep_sign(_new),\
-						sizeof(*(addr))))
-
-/* xchg */
-
-static inline __attribute__((always_inline))
-unsigned long __uatomic_exchange(void *addr, unsigned long val, int len)
-{
-	/* Note: the "xchg" instruction does not need a "lock" prefix. */
-	switch (len) {
-	case 1:
-	{
-		unsigned char result;
-		__asm__ __volatile__(
-		"xchgb %0, %1"
-			: "=q"(result), "+m"(*__hp(1, addr))
-			: "0" ((unsigned char)val)
-			: "memory");
-		return result;
-	}
-	case 2:
-	{
-		unsigned short result;
-		__asm__ __volatile__(
-		"xchgw %0, %1"
-			: "=r"(result), "+m"(*__hp(2, addr))
-			: "0" ((unsigned short)val)
-			: "memory");
-		return result;
-	}
-	case 4:
-	{
-		unsigned int result;
-		__asm__ __volatile__(
-		"xchgl %0, %1"
-			: "=r"(result), "+m"(*__hp(4, addr))
-			: "0" ((unsigned int)val)
-			: "memory");
-		return result;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long result;
-		__asm__ __volatile__(
-		"xchgq %0, %1"
-			: "=r"(result), "+m"(*__hp(8, addr))
-			: "0" ((unsigned long)val)
-			: "memory");
-		return result;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return 0;
-}
-
-#define _uatomic_xchg(addr, v)						      \
-	((__typeof__(*(addr))) __uatomic_exchange((addr),		      \
-						caa_cast_long_keep_sign(v),   \
-						sizeof(*(addr))))
-
-/* uatomic_add_return */
-
-static inline __attribute__((always_inline))
-unsigned long __uatomic_add_return(void *addr, unsigned long val,
-				 int len)
-{
-	switch (len) {
-	case 1:
-	{
-		unsigned char result = val;
-
-		__asm__ __volatile__(
-		"lock; xaddb %1, %0"
-			: "+m"(*__hp(1, addr)), "+q" (result)
-			:
-			: "memory");
-		return result + (unsigned char)val;
-	}
-	case 2:
-	{
-		unsigned short result = val;
-
-		__asm__ __volatile__(
-		"lock; xaddw %1, %0"
-			: "+m"(*__hp(2, addr)), "+r" (result)
-			:
-			: "memory");
-		return result + (unsigned short)val;
-	}
-	case 4:
-	{
-		unsigned int result = val;
-
-		__asm__ __volatile__(
-		"lock; xaddl %1, %0"
-			: "+m"(*__hp(4, addr)), "+r" (result)
-			:
-			: "memory");
-		return result + (unsigned int)val;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		unsigned long result = val;
-
-		__asm__ __volatile__(
-		"lock; xaddq %1, %0"
-			: "+m"(*__hp(8, addr)), "+r" (result)
-			:
-			: "memory");
-		return result + (unsigned long)val;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return 0;
-}
-
-#define _uatomic_add_return(addr, v)					    \
-	((__typeof__(*(addr))) __uatomic_add_return((addr),		    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-
-/* uatomic_and */
-
-static inline __attribute__((always_inline))
-void __uatomic_and(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		__asm__ __volatile__(
-		"lock; andb %1, %0"
-			: "=m"(*__hp(1, addr))
-			: "iq" ((unsigned char)val)
-			: "memory");
-		return;
-	}
-	case 2:
-	{
-		__asm__ __volatile__(
-		"lock; andw %1, %0"
-			: "=m"(*__hp(2, addr))
-			: "ir" ((unsigned short)val)
-			: "memory");
-		return;
-	}
-	case 4:
-	{
-		__asm__ __volatile__(
-		"lock; andl %1, %0"
-			: "=m"(*__hp(4, addr))
-			: "ir" ((unsigned int)val)
-			: "memory");
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-		"lock; andq %1, %0"
-			: "=m"(*__hp(8, addr))
-			: "er" ((unsigned long)val)
-			: "memory");
-		return;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return;
-}
-
-#define _uatomic_and(addr, v)						   \
-	(__uatomic_and((addr), caa_cast_long_keep_sign(v), sizeof(*(addr))))
-
-/* uatomic_or */
-
-static inline __attribute__((always_inline))
-void __uatomic_or(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		__asm__ __volatile__(
-		"lock; orb %1, %0"
-			: "=m"(*__hp(1, addr))
-			: "iq" ((unsigned char)val)
-			: "memory");
-		return;
-	}
-	case 2:
-	{
-		__asm__ __volatile__(
-		"lock; orw %1, %0"
-			: "=m"(*__hp(2, addr))
-			: "ir" ((unsigned short)val)
-			: "memory");
-		return;
-	}
-	case 4:
-	{
-		__asm__ __volatile__(
-		"lock; orl %1, %0"
-			: "=m"(*__hp(4, addr))
-			: "ir" ((unsigned int)val)
-			: "memory");
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-		"lock; orq %1, %0"
-			: "=m"(*__hp(8, addr))
-			: "er" ((unsigned long)val)
-			: "memory");
-		return;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return;
-}
-
-#define _uatomic_or(addr, v)						   \
-	(__uatomic_or((addr), caa_cast_long_keep_sign(v), sizeof(*(addr))))
-
-/* uatomic_add */
-
-static inline __attribute__((always_inline))
-void __uatomic_add(void *addr, unsigned long val, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		__asm__ __volatile__(
-		"lock; addb %1, %0"
-			: "=m"(*__hp(1, addr))
-			: "iq" ((unsigned char)val)
-			: "memory");
-		return;
-	}
-	case 2:
-	{
-		__asm__ __volatile__(
-		"lock; addw %1, %0"
-			: "=m"(*__hp(2, addr))
-			: "ir" ((unsigned short)val)
-			: "memory");
-		return;
-	}
-	case 4:
-	{
-		__asm__ __volatile__(
-		"lock; addl %1, %0"
-			: "=m"(*__hp(4, addr))
-			: "ir" ((unsigned int)val)
-			: "memory");
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-		"lock; addq %1, %0"
-			: "=m"(*__hp(8, addr))
-			: "er" ((unsigned long)val)
-			: "memory");
-		return;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return;
-}
-
-#define _uatomic_add(addr, v)						   \
-	(__uatomic_add((addr), caa_cast_long_keep_sign(v), sizeof(*(addr))))
-
-
-/* uatomic_inc */
-
-static inline __attribute__((always_inline))
-void __uatomic_inc(void *addr, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		__asm__ __volatile__(
-		"lock; incb %0"
-			: "=m"(*__hp(1, addr))
-			:
-			: "memory");
-		return;
-	}
-	case 2:
-	{
-		__asm__ __volatile__(
-		"lock; incw %0"
-			: "=m"(*__hp(2, addr))
-			:
-			: "memory");
-		return;
-	}
-	case 4:
-	{
-		__asm__ __volatile__(
-		"lock; incl %0"
-			: "=m"(*__hp(4, addr))
-			:
-			: "memory");
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-		"lock; incq %0"
-			: "=m"(*__hp(8, addr))
-			:
-			: "memory");
-		return;
-	}
-#endif
-	}
-	/* generate an illegal instruction. Cannot catch this with linker tricks
-	 * when optimizations are disabled. */
-	__asm__ __volatile__("ud2");
-	return;
-}
-
-#define _uatomic_inc(addr)	(__uatomic_inc((addr), sizeof(*(addr))))
-
-/* uatomic_dec */
-
-static inline __attribute__((always_inline))
-void __uatomic_dec(void *addr, int len)
-{
-	switch (len) {
-	case 1:
-	{
-		__asm__ __volatile__(
-		"lock; decb %0"
-			: "=m"(*__hp(1, addr))
-			:
-			: "memory");
-		return;
-	}
-	case 2:
-	{
-		__asm__ __volatile__(
-		"lock; decw %0"
-			: "=m"(*__hp(2, addr))
-			:
-			: "memory");
-		return;
-	}
-	case 4:
-	{
-		__asm__ __volatile__(
-		"lock; decl %0"
-			: "=m"(*__hp(4, addr))
-			:
-			: "memory");
-		return;
-	}
-#if (CAA_BITS_PER_LONG == 64)
-	case 8:
-	{
-		__asm__ __volatile__(
-		"lock; decq %0"
-			: "=m"(*__hp(8, addr))
-			:
-			: "memory");
-		return;
-	}
-#endif
-	}
-	/*
-	 * generate an illegal instruction. Cannot catch this with
-	 * linker tricks when optimizations are disabled.
-	 */
-	__asm__ __volatile__("ud2");
-	return;
-}
-
-#define _uatomic_dec(addr)	(__uatomic_dec((addr), sizeof(*(addr))))
-
-#ifdef URCU_ARCH_X86_NO_CAS
-
-/* For backwards compat */
-#define CONFIG_RCU_COMPAT_ARCH 1
-
-extern int __rcu_cas_avail;
-extern int __rcu_cas_init(void);
-
-#define UATOMIC_COMPAT(insn)							\
-	((caa_likely(__rcu_cas_avail > 0))						\
-	? (_uatomic_##insn)							\
-		: ((caa_unlikely(__rcu_cas_avail < 0)				\
-			? ((__rcu_cas_init() > 0)				\
-				? (_uatomic_##insn)				\
-				: (compat_uatomic_##insn))			\
-			: (compat_uatomic_##insn))))
-
-/*
- * We leave the return value so we don't break the ABI, but remove the
- * return value from the API.
- */
-extern unsigned long _compat_uatomic_set(void *addr,
-					 unsigned long _new, int len);
-#define compat_uatomic_set(addr, _new)				     	       \
-	((void) _compat_uatomic_set((addr),				       \
-				caa_cast_long_keep_sign(_new),		       \
-				sizeof(*(addr))))
-
-
-extern unsigned long _compat_uatomic_xchg(void *addr,
-					  unsigned long _new, int len);
-#define compat_uatomic_xchg(addr, _new)					       \
-	((__typeof__(*(addr))) _compat_uatomic_xchg((addr),		       \
-						caa_cast_long_keep_sign(_new), \
-						sizeof(*(addr))))
-
-extern unsigned long _compat_uatomic_cmpxchg(void *addr, unsigned long old,
-					     unsigned long _new, int len);
-#define compat_uatomic_cmpxchg(addr, old, _new)				       \
-	((__typeof__(*(addr))) _compat_uatomic_cmpxchg((addr),		       \
-						caa_cast_long_keep_sign(old),  \
-						caa_cast_long_keep_sign(_new), \
-						sizeof(*(addr))))
-
-extern void _compat_uatomic_and(void *addr, unsigned long _new, int len);
-#define compat_uatomic_and(addr, v)				       \
-	(_compat_uatomic_and((addr),				       \
-			caa_cast_long_keep_sign(v),		       \
-			sizeof(*(addr))))
-
-extern void _compat_uatomic_or(void *addr, unsigned long _new, int len);
-#define compat_uatomic_or(addr, v)				       \
-	(_compat_uatomic_or((addr),				       \
-			  caa_cast_long_keep_sign(v),		       \
-			  sizeof(*(addr))))
-
-extern unsigned long _compat_uatomic_add_return(void *addr,
-						unsigned long _new, int len);
-#define compat_uatomic_add_return(addr, v)			            \
-	((__typeof__(*(addr))) _compat_uatomic_add_return((addr),     	    \
-						caa_cast_long_keep_sign(v), \
-						sizeof(*(addr))))
-
-#define compat_uatomic_add(addr, v)					       \
-		((void)compat_uatomic_add_return((addr), (v)))
-#define compat_uatomic_inc(addr)					       \
-		(compat_uatomic_add((addr), 1))
-#define compat_uatomic_dec(addr)					       \
-		(compat_uatomic_add((addr), -1))
-
-#else
-#define UATOMIC_COMPAT(insn)	(_uatomic_##insn)
-#endif
-
-/* Read is atomic even in compat mode */
-#define uatomic_set(addr, v)			\
-		UATOMIC_COMPAT(set(addr, v))
-
-#define uatomic_cmpxchg(addr, old, _new)	\
-		UATOMIC_COMPAT(cmpxchg(addr, old, _new))
-#define uatomic_xchg(addr, v)			\
-		UATOMIC_COMPAT(xchg(addr, v))
-
-#define uatomic_and(addr, v)		\
-		UATOMIC_COMPAT(and(addr, v))
-#define cmm_smp_mb__before_uatomic_and()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_and()		cmm_barrier()
-
-#define uatomic_or(addr, v)		\
-		UATOMIC_COMPAT(or(addr, v))
-#define cmm_smp_mb__before_uatomic_or()		cmm_barrier()
-#define cmm_smp_mb__after_uatomic_or()		cmm_barrier()
-
-#define uatomic_add_return(addr, v)		\
-		UATOMIC_COMPAT(add_return(addr, v))
-
-#define uatomic_add(addr, v)	UATOMIC_COMPAT(add(addr, v))
-#define cmm_smp_mb__before_uatomic_add()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_add()		cmm_barrier()
-
-#define uatomic_inc(addr)	UATOMIC_COMPAT(inc(addr))
-#define cmm_smp_mb__before_uatomic_inc()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_inc()		cmm_barrier()
-
-#define uatomic_dec(addr)	UATOMIC_COMPAT(dec(addr))
-#define cmm_smp_mb__before_uatomic_dec()	cmm_barrier()
-#define cmm_smp_mb__after_uatomic_dec()		cmm_barrier()
-
-#ifdef __cplusplus
-}
-#endif
-
-#include <urcu/uatomic/generic.h>
-
-#endif /* _URCU_ARCH_UATOMIC_X86_H */
-- 
2.39.2

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:06   ` Mathieu Desnoyers via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Use __atomic_signal_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.

FIXME: What should be the correct memory ordering here?

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 include/urcu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
 #define caa_likely(x)	__builtin_expect(!!(x), 1)
 #define caa_unlikely(x)	__builtin_expect(!!(x), 0)
 
-#define	cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#define	cmm_barrier()	__atomic_signal_fence(__ATOMIC_ACQ_REL)
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.39.2
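[Editorial note: since the FIXME asks which ordering is correct, the following sketch contrasts the two forms. It is an illustration only, not library code; the function names are made up. Both are compiler-only barriers: the asm memory clobber forbids the compiler from moving any memory access across it, while __atomic_signal_fence() constrains reordering according to the given order, so __ATOMIC_SEQ_CST would be the most conservative answer (relaxed accesses may still move across an __ATOMIC_ACQ_REL fence).]

```c
#include <assert.h>

static int cmm_data;
static int cmm_flag;

/* Old form: compiler may not cache or reorder memory accesses across it. */
static void cmm_barrier_asm(void)
{
	__asm__ __volatile__ ("" : : : "memory");
}

/* New form: compiler fence with an explicit memory order TSan can model. */
static void cmm_barrier_builtin(void)
{
	__atomic_signal_fence(__ATOMIC_SEQ_CST);
}

static int publish_with_barrier(int v)
{
	cmm_data = v;
	cmm_barrier_asm();	/* the store to cmm_data may not sink below */
	cmm_flag = 1;
	cmm_barrier_builtin();
	return cmm_data;
}
```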


* [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
                   ` (2 preceding siblings ...)
  2023-03-17 21:37 ` [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier() Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:25   ` Mathieu Desnoyers via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Instead of custom code, use the __atomic builtins to implement the
rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
rcu_assign_pointer().

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 include/urcu/arch.h           | 20 +++++++++
 include/urcu/arch/alpha.h     |  6 +++
 include/urcu/arch/arm.h       | 12 ++++++
 include/urcu/arch/mips.h      |  2 +
 include/urcu/arch/nios2.h     |  2 +
 include/urcu/arch/ppc.h       |  6 +++
 include/urcu/arch/s390.h      |  2 +
 include/urcu/arch/sparc64.h   |  6 +++
 include/urcu/arch/x86.h       | 20 +++++++++
 include/urcu/static/pointer.h | 77 +++++++----------------------------
 10 files changed, 90 insertions(+), 63 deletions(-)

diff --git a/include/urcu/arch.h b/include/urcu/arch.h
index d3914da..aec6fa1 100644
--- a/include/urcu/arch.h
+++ b/include/urcu/arch.h
@@ -21,6 +21,26 @@
 #ifndef _URCU_ARCH_H
 #define _URCU_ARCH_H
 
+#if !defined(__has_feature)
+#define __has_feature(x) 0
+#endif /* if !defined(__has_feature) */
+
+/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */
+#if __has_feature(address_sanitizer)
+#define __SANITIZE_ADDRESS__ 1
+#endif /* if __has_feature(address_sanitizer) */
+
+#ifdef __SANITIZE_THREAD__
+/* FIXME: Somebody who understands the barriers should look into this */
+#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#endif
+
 /*
  * Architecture detection using compiler defines.
  *
diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..84526ef 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,15 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()			__asm__ __volatile__ ("mb":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()			__asm__ __volatile__ ("wmb":::"memory")
+#endif
+#ifndef cmm_read_barrier_depends
 #define cmm_read_barrier_depends()	__asm__ __volatile__ ("mb":::"memory")
+#endif
 
 /*
  * On Linux, define the membarrier system call number if not yet available in
diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h
index 54ca4fa..4950e13 100644
--- a/include/urcu/arch/arm.h
+++ b/include/urcu/arch/arm.h
@@ -42,16 +42,28 @@ extern "C" {
 /*
  * Issues full system DMB operation.
  */
+#ifndef cmm_mb
 #define cmm_mb()	__asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()	__asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()	__asm__ __volatile__ ("dmb sy":::"memory")
+#endif
 
 /*
  * Issues DMB operation only to the inner shareable domain.
  */
+#ifndef cmm_smp_mb
 #define cmm_smp_mb()	__asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()	__asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_wmb
 #define cmm_smp_wmb()	__asm__ __volatile__ ("dmb ish":::"memory")
+#endif
 
 #endif /* URCU_ARCH_ARMV7 */
 
diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h
index ea5b7e9..b9ee021 100644
--- a/include/urcu/arch/mips.h
+++ b/include/urcu/arch/mips.h
@@ -30,11 +30,13 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()			__asm__ __volatile__ (		    \
 					"	.set	mips2		\n" \
 					"	sync			\n" \
 					"	.set	mips0		\n" \
 					:::"memory")
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h
index b4f3e50..5def45c 100644
--- a/include/urcu/arch/nios2.h
+++ b/include/urcu/arch/nios2.h
@@ -29,7 +29,9 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()	cmm_barrier()
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h
index 791529e..b8ec40d 100644
--- a/include/urcu/arch/ppc.h
+++ b/include/urcu/arch/ppc.h
@@ -48,7 +48,9 @@ extern "C" {
  * order cacheable and non-cacheable memory operations separately---i.e.
  * not the latter against the former.
  */
+#ifndef cmm_mb
 #define cmm_mb()         __asm__ __volatile__ ("sync":::"memory")
+#endif
 
 /*
  * lwsync orders loads in cacheable memory with respect to other loads,
@@ -56,8 +58,12 @@ extern "C" {
  * Therefore, use it for barriers ordering accesses to cacheable memory
  * only.
  */
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()    __asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
+#endif
+#ifndef cmm_smp_wmb
 #define cmm_smp_wmb()    __asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
+#endif
 
 #define mftbl()						\
 	__extension__					\
diff --git a/include/urcu/arch/s390.h b/include/urcu/arch/s390.h
index 67461b4..2733873 100644
--- a/include/urcu/arch/s390.h
+++ b/include/urcu/arch/s390.h
@@ -39,7 +39,9 @@ extern "C" {
 
 #define CAA_CACHE_LINE_SIZE	128
 
+#ifndef cmm_mb
 #define cmm_mb()    __asm__ __volatile__("bcr 15,0" : : : "memory")
+#endif
 
 #define HAS_CAA_GET_CYCLES
 
diff --git a/include/urcu/arch/sparc64.h b/include/urcu/arch/sparc64.h
index 1ff40f5..32a6b0e 100644
--- a/include/urcu/arch/sparc64.h
+++ b/include/urcu/arch/sparc64.h
@@ -49,9 +49,15 @@ __asm__ __volatile__("ba,pt %%xcc, 1f\n\t"	\
 		     "1:\n"			\
 		     : : : "memory")
 
+#ifndef cmm_mb
 #define cmm_mb()	membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()	membar_safe("#LoadLoad")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()	membar_safe("#StoreStore")
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/x86.h b/include/urcu/arch/x86.h
index 744f9f9..6be9d38 100644
--- a/include/urcu/arch/x86.h
+++ b/include/urcu/arch/x86.h
@@ -46,15 +46,23 @@ extern "C" {
 /* For backwards compat */
 #define CONFIG_RCU_HAVE_FENCE 1
 
+#ifndef cmm_mb
 #define cmm_mb()    __asm__ __volatile__ ("mfence":::"memory")
+#endif
 
 /*
  * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when
  * using SSE or working with I/O areas.  cmm_smp_rmb/cmm_smp_wmb are
  * only compiler barriers, which is enough for general use.
  */
+#ifndef cmm_rmb
 #define cmm_rmb()     __asm__ __volatile__ ("lfence":::"memory")
+#endif
+
+#ifndef cmm_wmb
 #define cmm_wmb()     __asm__ __volatile__ ("sfence"::: "memory")
+#endif
+
 #define cmm_smp_rmb() cmm_barrier()
 #define cmm_smp_wmb() cmm_barrier()
 
@@ -72,15 +80,27 @@ extern "C" {
  * under our feet; cmm_smp_wmb() ceases to be a nop for these processors.
  */
 #if (CAA_BITS_PER_LONG == 32)
+#ifndef cmm_mb
 #define cmm_mb()    __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()    __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()    __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory")
+#endif
 #else
+#ifndef cmm_mb
 #define cmm_mb()    __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()    __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()    __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory")
 #endif
 #endif
+#endif
 
 #define caa_cpu_relax()	__asm__ __volatile__ ("rep; nop" : : : "memory")
 
diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
index 9e46a57..3f116f3 100644
--- a/include/urcu/static/pointer.h
+++ b/include/urcu/static/pointer.h
@@ -38,6 +38,8 @@
 extern "C" {
 #endif
 
+#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME)
+
 /**
  * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable
  * into a RCU read-side critical section. The pointer can later be safely
@@ -49,14 +51,6 @@ extern "C" {
  * Inserts memory barriers on architectures that require them (currently only
  * Alpha) and documents which pointers are protected by RCU.
  *
- * With C standards prior to C11/C++11, the compiler memory barrier in
- * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g.
- * VSS: Value Speculation Scheduling) does not perform the data read
- * before the pointer read by speculating the value of the pointer.
- * Correct ordering is ensured because the pointer is read as a volatile
- * access. This acts as a global side-effect operation, which forbids
- * reordering of dependent memory operations.
- *
  * With C standards C11/C++11, concerns about dependency-breaking
  * optimizations are taken care of by the "memory_order_consume" atomic
  * load.
@@ -65,10 +59,6 @@ extern "C" {
  * explicit because the pointer used as input argument is a pointer,
  * not an _Atomic type as required by C11/C++11.
  *
- * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of
- * volatile access to implement rcu_dereference rather than
- * memory_order_consume load from the C11/C++11 standards.
- *
  * This may improve performance on weakly-ordered architectures where
  * the compiler implements memory_order_consume as a
  * memory_order_acquire, which is stricter than required by the
@@ -83,35 +73,7 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-
-#if !defined (URCU_DEREFERENCE_USE_VOLATILE) &&		\
-	((defined (__cplusplus) && __cplusplus >= 201103L) ||	\
-	(defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L))
-# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-#endif
-
-/*
- * If p is const (the pointer itself, not what it points to), using
- * __typeof__(p) would declare a const variable, leading to
- * -Wincompatible-pointer-types errors.  Using the statement expression
- * makes it an rvalue and gets rid of the const-ness.
- */
-#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(__extension__ ({				\
-					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
-					_________p0;					\
-				})) _________p1;					\
-				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
-				(_________p1);						\
-			})
-#else
-# define _rcu_dereference(p) __extension__ ({						\
-				__typeof__(p) _________p1 = CMM_LOAD_SHARED(p);		\
-				cmm_smp_read_barrier_depends();				\
-				(_________p1);						\
-			})
-#endif
+#define _rcu_dereference(p) _rcu_get_pointer(&(p))
 
 /**
  * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
@@ -126,12 +88,12 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_cmpxchg_pointer(p, old, _new)				\
-	__extension__							\
-	({								\
-		__typeof__(*p) _________pold = (old);			\
-		__typeof__(*p) _________pnew = (_new);			\
-		uatomic_cmpxchg(p, _________pold, _________pnew);	\
+#define _rcu_cmpxchg_pointer(p, old, _new)						\
+	({										\
+		__typeof__(*(p)) __old = old;						\
+		__atomic_compare_exchange_n(p, &__old, _new, 0,				\
+					    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);	\
+		__old;									\
 	})
 
 /**
@@ -145,22 +107,11 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_xchg_pointer(p, v)				\
-	__extension__					\
-	({						\
-		__typeof__(*p) _________pv = (v);	\
-		uatomic_xchg(p, _________pv);		\
-	})
-
+#define _rcu_xchg_pointer(p, v) \
+	__atomic_exchange_n(p, v, __ATOMIC_ACQ_REL)
 
-#define _rcu_set_pointer(p, v)				\
-	do {						\
-		__typeof__(*p) _________pv = (v);	\
-		if (!__builtin_constant_p(v) || 	\
-		    ((v) != NULL))			\
-			cmm_wmb();				\
-		uatomic_set(p, _________pv);		\
-	} while (0)
+#define _rcu_set_pointer(p, v) \
+	__atomic_store_n(p, v, __ATOMIC_RELEASE)
 
 /**
  * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure
@@ -178,7 +129,7 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_assign_pointer(p, v)	_rcu_set_pointer(&(p), v)
+#define _rcu_assign_pointer(p, v)	_rcu_set_pointer(&(p), v)
 
 #ifdef __cplusplus
 }
-- 
2.39.2
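[Editorial note: a minimal sketch of the publish/dereference pairing this patch relies on. `struct foo`, publish() and reader() are made-up names, not the urcu API. The release store pairs with the consume load, so a reader that observes the new pointer also observes the initialized contents it points to.]

```c
#include <assert.h>
#include <stddef.h>

struct foo { int a; };

static struct foo *shared_foo;

/* Writer: initialize the object, then publish the pointer with release. */
static void publish(struct foo *f)
{
	f->a = 1;	/* initialization must precede publication */
	__atomic_store_n(&shared_foo, f, __ATOMIC_RELEASE);
}

/* Reader: consume load orders dependent accesses after the pointer read. */
static struct foo *reader(void)
{
	return __atomic_load_n(&shared_foo, __ATOMIC_CONSUME);
}
```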


* [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
                   ` (3 preceding siblings ...)
  2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:28   ` Mathieu Desnoyers via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 6/7] Fix: uatomic_or() needs retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion() Ondřej Surý via lttng-dev
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 include/urcu/system.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..99e7443 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)	       CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)	       __atomic_load_n(&(p), __ATOMIC_RELAXED)
 
 /*
  * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)	__extension__ ({ CMM_ACCESS_ONCE(x) = (v); })
+#define _CMM_STORE_SHARED(x, v)	__atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
 
 /*
  * Store v into x, where x is located in shared memory. Performs the
@@ -51,9 +51,8 @@
 #define CMM_STORE_SHARED(x, v)						\
 	__extension__							\
 	({								\
-		__typeof__(x) _v = _CMM_STORE_SHARED(x, v);		\
+		_CMM_STORE_SHARED(x, v);				\
 		cmm_smp_wmc();						\
-		_v = _v;	/* Work around clang "unused result" */	\
 	})
 
 #endif /* _URCU_SYSTEM_H */
-- 
2.39.2
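[Editorial note: a sketch, with illustrative names, of what the relaxed accesses provide. Relaxed __atomic loads/stores give the tear-free (single, atomic) access that the volatile CMM_ACCESS_ONCE() provided, without imposing ordering; any required ordering still comes from the separate cmm_smp_*mc() barrier macros, exactly as before.]

```c
#include <assert.h>

static unsigned long shared_counter;

/* Atomic (tear-free) store with no ordering constraints. */
static void store_shared(unsigned long v)
{
	__atomic_store_n(&shared_counter, v, __ATOMIC_RELAXED);
}

/* Atomic (tear-free) load with no ordering constraints. */
static unsigned long load_shared(void)
{
	return __atomic_load_n(&shared_counter, __ATOMIC_RELAXED);
}
```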


* [lttng-dev] [PATCH 6/7] Fix: uatomic_or() needs retyping to uintptr_t in rculfhash.c
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
                   ` (4 preceding siblings ...)
  2023-03-17 21:37 ` [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:31   ` Mathieu Desnoyers via lttng-dev
  2023-03-17 21:37 ` [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion() Ondřej Surý via lttng-dev
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, cast the pointer operand to uintptr_t: the __atomic
bitwise builtins accept only integer operands, so the cast fixes the
compiler error.

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	 * Knowing which wins the race will be known after the garbage
 	 * collection phase, stay tuned!
 	 */
-	uatomic_or(&node->next, REMOVED_FLAG);
+	uatomic_or((uintptr_t *)&node->next, REMOVED_FLAG);
 	/* We performed the (logical) deletion. */
 
 	/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
 		dbg_printf("remove entry: order %lu index %lu hash %lu\n",
 			   i, j, j);
 		/* Set the REMOVED_FLAG to freeze the ->next for gc */
-		uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+		uatomic_or((uintptr_t *)&fini_bucket->next, REMOVED_FLAG);
 		_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
 	}
 	ht->flavor->read_unlock();
-- 
2.39.2
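[Editorial note: an illustration of why the cast is needed, using stand-in names (`struct node`, NODE_REMOVED_FLAG), not the rculfhash types. __atomic_or_fetch() accepts only integer operands, so the pointer-typed ->next field must be accessed through a uintptr_t lvalue to set the low tag bit, exactly as the patch does.]

```c
#include <assert.h>
#include <stdint.h>

#define NODE_REMOVED_FLAG ((uintptr_t)1)

struct node { struct node *next; };

/* Atomically OR the tag bit into the pointer, viewed as an integer. */
static void mark_removed(struct node *n)
{
	__atomic_or_fetch((uintptr_t *)&n->next, NODE_REMOVED_FLAG,
			  __ATOMIC_RELAXED);
}

static int is_removed(const struct node *n)
{
	return ((uintptr_t)n->next & NODE_REMOVED_FLAG) != 0;
}
```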


* [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
                   ` (5 preceding siblings ...)
  2023-03-17 21:37 ` [lttng-dev] [PATCH 6/7] Fix: uatomic_or() needs retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev
@ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev
  2023-03-20 18:37   ` Mathieu Desnoyers via lttng-dev
  6 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw)
  To: lttng-dev

FIXME: This is an experiment that adds an explicit reference-count
assertion in free_completion() in workqueue.c, so ThreadSanitizer
knows it is OK to free the resources.

Signed-off-by: Ondřej Surý <ondrej@sury.org>
---
 src/workqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
 	struct urcu_workqueue_completion *completion;
 
 	completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
+	assert(!urcu_ref_get_unless_zero(&completion->ref));
 	free(completion);
 }
 
-- 
2.39.2
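[Editorial note: for context, the usual reason a sanitizer needs convincing here is the release/acquire pairing around the final reference drop. A generic sketch of that pattern follows; the names are illustrative and this is not the urcu_ref API.]

```c
#include <assert.h>
#include <stdlib.h>

struct obj { long refcount; };

/*
 * Drop one reference.  The release semantics on the decrement order all
 * prior uses of *o before the acquire fence taken by the thread that
 * performs the final free, so the free cannot race with earlier users.
 */
static void obj_put(struct obj *o)
{
	if (__atomic_sub_fetch(&o->refcount, 1, __ATOMIC_RELEASE) == 0) {
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		free(o);
	}
}
```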


* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation
  2023-03-17 21:37 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev
@ 2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:13     ` Mathieu Desnoyers via lttng-dev
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:03 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> Replace the custom assembly code in include/urcu/uatomic/ with __atomic
> builtins provided by C11-compatible compiler.
> 
[...]
> +#define UATOMIC_HAS_ATOMIC_BYTE
> +#define UATOMIC_HAS_ATOMIC_SHORT
> +
> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
> +
> +#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
> +
> +#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
> +
> +#define uatomic_cmpxchg(addr, old, new) \
> +	({									\
> +		__typeof__(*(addr)) __old = old;				\
> +		__atomic_compare_exchange_n(addr, &__old, new, 0,		\
> +					    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this
sequence of operations atomically: check if `addr` contains `old`.
If true, then replace the content of `addr` by `new`. Return the
value previously contained by `addr`. This function implies a full
memory barrier before and after the atomic operation."

This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
rather than "__ATOMIC_CONSUME".

> +		__old;								\
> +	})
> +
> +#define uatomic_add_return(addr, v) \
> +	__atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
> +
> +#define uatomic_add(addr, v) \
> +	(void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
> +
> +#define uatomic_sub_return(addr, v) \
> +	__atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
> +
> +#define uatomic_sub(addr, v) \
> +	(void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
> +
> +#define uatomic_and(addr, mask) \
> +	(void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
> +
> +#define uatomic_or(addr, mask)						\
> +	(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
> +
> +#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
> +#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
> +
> +#define cmm_smp_mb__before_uatomic_and()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__after_uatomic_and()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__before_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__after_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__before_uatomic_add()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__after_uatomic_add()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
> +#define cmm_smp_mb__before_uatomic_sub()	cmm_smp_mb__before_uatomic_add()
> +#define cmm_smp_mb__after_uatomic_sub()		cmm_smp_mb__after_uatomic_add()
> +#define cmm_smp_mb__before_uatomic_inc()	cmm_smp_mb__before_uatomic_add()
> +#define cmm_smp_mb__after_uatomic_inc()		cmm_smp_mb__after_uatomic_add()
> +#define cmm_smp_mb__before_uatomic_dec()	cmm_smp_mb__before_uatomic_add()
> +#define cmm_smp_mb__after_uatomic_dec()		cmm_smp_mb__after_uatomic_add()
> +
> +#define cmm_smp_mb()				cmm_mb()

While OK for the general case, I would recommend that we immediately 
implement something more efficient on x86 32/64 which takes into account 
that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK 
prefixed atomic ops, which imply the barrier already, leaving the 
before/after_uatomic_*() as no-ops.
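[Editor's note: a minimal sketch of the x86 specialization suggested above. The macro names mirror the patch, but the bodies are illustrative, not merged code: a SEQ_CST read-modify-write compiles to a LOCK-prefixed instruction on x86, which already implies a full barrier, so the surrounding fence macros can become no-ops.]

```c
#include <assert.h>

/* Hypothetical x86 specialization: the RMW itself is the barrier. */
#define uatomic_add(addr, v) \
	(void) __atomic_add_fetch((addr), (v), __ATOMIC_SEQ_CST)
#define cmm_smp_mb__before_uatomic_add()	do { } while (0)
#define cmm_smp_mb__after_uatomic_add()		do { } while (0)

long demo_add(void)
{
	long counter = 41;

	cmm_smp_mb__before_uatomic_add();
	uatomic_add(&counter, 1);	/* compiles to LOCK ADD on x86-64 */
	cmm_smp_mb__after_uatomic_add();
	return counter;
}
```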

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()
  2023-03-17 21:37 ` [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier() Ondřej Surý via lttng-dev
@ 2023-03-20 18:06   ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:14     ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:06 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
> ThreadSanitizer can understand the memory synchronization.

You should update the patch subject and commit message to replace 
"thread" by "signal".

> 
> FIXME: What should be the correct memory ordering here?

ACQ_REL is what we want here, I think this is fine. We want to prevent
the compiler from reordering loads/stores across the fence, but don't
want any barrier instructions issued.
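[Editor's note: a minimal sketch of the point above, using SEQ_CST as later suggested in this thread. `__atomic_signal_fence()` emits no machine instruction; it only forbids the compiler from moving loads/stores across it, which is exactly the cmm_barrier() semantic.]

```c
#include <assert.h>

#define cmm_barrier()	__atomic_signal_fence(__ATOMIC_SEQ_CST)

static int shared_flag;

int demo_barrier(void)
{
	shared_flag = 1;
	cmm_barrier();	/* compiler-only fence: store may not sink below */
	return shared_flag;
}
```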

Thanks,

Mathieu

> 
> Signed-off-by: Ondřej Surý <ondrej@sury.org>
> ---
>   include/urcu/compiler.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
> index 2f32b38..ede909f 100644
> --- a/include/urcu/compiler.h
> +++ b/include/urcu/compiler.h
> @@ -28,7 +28,8 @@
>   #define caa_likely(x)	__builtin_expect(!!(x), 1)
>   #define caa_unlikely(x)	__builtin_expect(!!(x), 0)
>   
> -#define	cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")
> +/* FIXME: What would be a correct memory ordering here? */
> +#define	cmm_barrier()	__atomic_signal_fence(__ATOMIC_ACQ_REL)
>   
>   /*
>    * Instruct the compiler to perform only a single access to a variable

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation
  2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
@ 2023-03-20 18:13     ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:28     ` Ondřej Surý via lttng-dev
  2023-03-20 19:38     ` Duncan Sands via lttng-dev
  2 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:13 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-20 14:03, Mathieu Desnoyers via lttng-dev wrote:
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> Replace the custom assembly code in include/urcu/uatomic/ with __atomic
>> builtins provided by C11-compatible compiler.
>>
> [...]
>> +#define UATOMIC_HAS_ATOMIC_BYTE
>> +#define UATOMIC_HAS_ATOMIC_SHORT
>> +
>> +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
>> +
>> +#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
>> +
>> +#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
>> +
>> +#define uatomic_cmpxchg(addr, old, new) \
>> +    ({                                    \
>> +        __typeof__(*(addr)) __old = old;                \
>> +        __atomic_compare_exchange_n(addr, &__old, new, 0,        \
>> +                        __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\
> 

Actually, I suspect we'd want to change __ATOMIC_ACQ_REL to 
__ATOMIC_SEQ_CST everywhere, because we want total order.

Thanks,

Mathieu

> In doc/uatomic-api.md, we document:
> 
> "```c
> type uatomic_cmpxchg(type *addr, type old, type new);
> ```
> 
> An atomic read-modify-write operation that performs this
> sequence of operations atomically: check if `addr` contains `old`.
> If true, then replace the content of `addr` by `new`. Return the
> value previously contained by `addr`. This function implies a full
> memory barrier before and after the atomic operation."
> 
> This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
> rather than __ATOMIC_CONSUME".
> 
>> +        __old;                                \
>> +    })
>> +
>> +#define uatomic_add_return(addr, v) \
>> +    __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
>> +
>> +#define uatomic_add(addr, v) \
>> +    (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
>> +
>> +#define uatomic_sub_return(addr, v) \
>> +    __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
>> +
>> +#define uatomic_sub(addr, v) \
>> +    (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
>> +
>> +#define uatomic_and(addr, mask) \
>> +    (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
>> +
>> +#define uatomic_or(addr, mask)                        \
>> +    (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
>> +
>> +#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
>> +#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
>> +
>> +#define cmm_smp_mb__before_uatomic_and()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__after_uatomic_and()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__before_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__after_uatomic_or()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__before_uatomic_add()	__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__after_uatomic_add()		__atomic_thread_fence(__ATOMIC_ACQ_REL)
>> +#define cmm_smp_mb__before_uatomic_sub()	cmm_smp_mb__before_uatomic_add()
>> +#define cmm_smp_mb__after_uatomic_sub()		cmm_smp_mb__after_uatomic_add()
>> +#define cmm_smp_mb__before_uatomic_inc()	cmm_smp_mb__before_uatomic_add()
>> +#define cmm_smp_mb__after_uatomic_inc()		cmm_smp_mb__after_uatomic_add()
>> +#define cmm_smp_mb__before_uatomic_dec()	cmm_smp_mb__before_uatomic_add()
>> +#define cmm_smp_mb__after_uatomic_dec()		cmm_smp_mb__after_uatomic_add()
>> +
>> +#define cmm_smp_mb()                cmm_mb()
> 
> While OK for the general case, I would recommend that we immediately 
> implement something more efficient on x86 32/64 which takes into account 
> that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK 
> prefixed atomic ops, which imply the barrier already, leaving the 
> before/after_uatomic_*() as no-ops.
> 
> Thanks,
> 
> Mathieu
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()
  2023-03-20 18:06   ` Mathieu Desnoyers via lttng-dev
@ 2023-03-20 18:14     ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:14 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-20 14:06, Mathieu Desnoyers via lttng-dev wrote:
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
>> ThreadSanitizer can understand the memory synchronization.
> 
> You should update the patch subject and commit message to replace 
> "thread" by "signal".
> 
>>
>> FIXME: What should be the correct memory ordering here?
> 
> ACQ_REL is what we want here, I think this is fine. We want to prevent
> the compiler from reordering loads/stores across the fence, but don't
> want any barrier instructions issued.

We should probably make it SEQ_CST here as well, even though I doubt it 
changes anything in this very particular case of atomic_signal_fence.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>>
>> Signed-off-by: Ondřej Surý <ondrej@sury.org>
>> ---
>>   include/urcu/compiler.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
>> index 2f32b38..ede909f 100644
>> --- a/include/urcu/compiler.h
>> +++ b/include/urcu/compiler.h
>> @@ -28,7 +28,8 @@
>>   #define caa_likely(x)    __builtin_expect(!!(x), 1)
>>   #define caa_unlikely(x)    __builtin_expect(!!(x), 0)
>> -#define    cmm_barrier()    __asm__ __volatile__ ("" : : : "memory")
>> +/* FIXME: What would be a correct memory ordering here? */
>> +#define    cmm_barrier()    __atomic_signal_fence(__ATOMIC_ACQ_REL)
>>   /*
>>    * Instruct the compiler to perform only a single access to a variable
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins
  2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev
@ 2023-03-20 18:25   ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:25 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> Instead of custom code, use the __atomic builtins to implement the
> rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
> rcu_assign_pointer().

This also changes the cmm_mb() family of functions, but not everywhere. 
This should be documented.

I'm also unsure why architecture code has #ifndef cmm_mb when we would 
expect the generic arch implementation to be conditional (the other way 
around).

> 
> Signed-off-by: Ondřej Surý <ondrej@sury.org>
> ---
>   include/urcu/arch.h           | 20 +++++++++
>   include/urcu/arch/alpha.h     |  6 +++
>   include/urcu/arch/arm.h       | 12 ++++++
>   include/urcu/arch/mips.h      |  2 +
>   include/urcu/arch/nios2.h     |  2 +
>   include/urcu/arch/ppc.h       |  6 +++
>   include/urcu/arch/s390.h      |  2 +
>   include/urcu/arch/sparc64.h   |  6 +++
>   include/urcu/arch/x86.h       | 20 +++++++++
>   include/urcu/static/pointer.h | 77 +++++++----------------------------
>   10 files changed, 90 insertions(+), 63 deletions(-)
> 
> diff --git a/include/urcu/arch.h b/include/urcu/arch.h
> index d3914da..aec6fa1 100644
> --- a/include/urcu/arch.h
> +++ b/include/urcu/arch.h
> @@ -21,6 +21,26 @@
>   #ifndef _URCU_ARCH_H
>   #define _URCU_ARCH_H
>   
> +#if !defined(__has_feature)
> +#define __has_feature(x) 0
> +#endif /* if !defined(__has_feature) */
> +
> +/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */
> +#if __has_feature(address_sanitizer)
> +#define __SANITIZE_ADDRESS__ 1
> +#endif /* if __has_feature(address_sanitizer) */
> +
> +#ifdef __SANITIZE_THREAD__
> +/* FIXME: Somebody who understands the barriers should look into this */
> +#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)

This really needs to be __ATOMIC_SEQ_CST.

> +#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
> +#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

I am really unsure that rmb/wmb semantics map to acq/rel. Paul, can you 
confirm ?

> +#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)

SEQ_CST.

> +#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
> +#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

Unsure (see above).

> +#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL)

This would map to __ATOMIC_CONSUME, but AFAIK the current implementation 
of this semantic is done with __ATOMIC_ACQUIRE which is stronger than 
what is really needed here. So we can expect a slowdown on some 
architectures if we go that way.

Should we favor code simplicity and long-term maintainability at the 
expense of performance in the short-term ? Or should we keep 
arch-specific implementations until the toolchains end up implementing a 
proper consume semantic ?
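[Editor's note: a sketch of the trade-off discussed above. Current GCC/Clang promote `__ATOMIC_CONSUME` to `__ATOMIC_ACQUIRE`, so the generic form below may be stricter (and slower on weakly-ordered CPUs) than true dependency ordering. The macro and function names are illustrative, not the library API.]

```c
#include <stddef.h>

#define my_rcu_dereference(p) __atomic_load_n(&(p), __ATOMIC_CONSUME)

static int payload = 7;
static int *gp;

int demo_dereference(void)
{
	int *p;

	__atomic_store_n(&gp, &payload, __ATOMIC_RELEASE);	/* publish */
	p = my_rcu_dereference(gp);	/* likely emitted as an ACQUIRE load */
	return p != NULL ? *p : -1;
}
```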

> +#endif
> +
>   /*
>    * Architecture detection using compiler defines.
>    *
> diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
> index dc33e28..84526ef 100644
> --- a/include/urcu/arch/alpha.h
> +++ b/include/urcu/arch/alpha.h
> @@ -29,9 +29,15 @@
>   extern "C" {
>   #endif
>   
> +#ifndef cmm_mb
>   #define cmm_mb()			__asm__ __volatile__ ("mb":::"memory")
> +#endif
> +#ifndef cmm_wmb
>   #define cmm_wmb()			__asm__ __volatile__ ("wmb":::"memory")
> +#endif
> +#ifndef cmm_read_barrier_depends
>   #define cmm_read_barrier_depends()	__asm__ __volatile__ ("mb":::"memory")
> +#endif
>   
[...]
> diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
> index 9e46a57..3f116f3 100644
> --- a/include/urcu/static/pointer.h
> +++ b/include/urcu/static/pointer.h
> @@ -38,6 +38,8 @@
>   extern "C" {
>   #endif
>   
> +#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME)
> +
>   /**
>    * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable
>    * into a RCU read-side critical section. The pointer can later be safely
> @@ -49,14 +51,6 @@ extern "C" {
>    * Inserts memory barriers on architectures that require them (currently only
>    * Alpha) and documents which pointers are protected by RCU.
>    *
> - * With C standards prior to C11/C++11, the compiler memory barrier in
> - * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g.
> - * VSS: Value Speculation Scheduling) does not perform the data read
> - * before the pointer read by speculating the value of the pointer.
> - * Correct ordering is ensured because the pointer is read as a volatile
> - * access. This acts as a global side-effect operation, which forbids
> - * reordering of dependent memory operations.

We should document that we end up relying on CONSUME for rcu_dereference 
in the patch commit message.

> - *
>    * With C standards C11/C++11, concerns about dependency-breaking
>    * optimizations are taken care of by the "memory_order_consume" atomic
>    * load.
> @@ -65,10 +59,6 @@ extern "C" {
>    * explicit because the pointer used as input argument is a pointer,
>    * not an _Atomic type as required by C11/C++11.
>    *
> - * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of
> - * volatile access to implement rcu_dereference rather than
> - * memory_order_consume load from the C11/C++11 standards.
> - *
>    * This may improve performance on weakly-ordered architectures where
>    * the compiler implements memory_order_consume as a
>    * memory_order_acquire, which is stricter than required by the
> @@ -83,35 +73,7 @@ extern "C" {
>    * meets the 10-line criterion in LGPL, allowing this function to be
>    * expanded directly in non-LGPL code.
>    */
> -
> -#if !defined (URCU_DEREFERENCE_USE_VOLATILE) &&		\
> -	((defined (__cplusplus) && __cplusplus >= 201103L) ||	\
> -	(defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L))
> -# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
> -#endif
> -
> -/*
> - * If p is const (the pointer itself, not what it points to), using
> - * __typeof__(p) would declare a const variable, leading to
> - * -Wincompatible-pointer-types errors.  Using the statement expression
> - * makes it an rvalue and gets rid of the const-ness.
> - */
> -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME
> -# define _rcu_dereference(p) __extension__ ({						\
> -				__typeof__(__extension__ ({				\
> -					__typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \
> -					_________p0;					\
> -				})) _________p1;					\
> -				__atomic_load(&(p), &_________p1, __ATOMIC_CONSUME);	\
> -				(_________p1);						\
> -			})
> -#else
> -# define _rcu_dereference(p) __extension__ ({						\
> -				__typeof__(p) _________p1 = CMM_LOAD_SHARED(p);		\
> -				cmm_smp_read_barrier_depends();				\
> -				(_________p1);						\
> -			})
> -#endif
> +#define _rcu_dereference(p) _rcu_get_pointer(&(p))
>   
>   /**
>    * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer
> @@ -126,12 +88,12 @@ extern "C" {
>    * meets the 10-line criterion in LGPL, allowing this function to be
>    * expanded directly in non-LGPL code.
>    */
> -#define _rcu_cmpxchg_pointer(p, old, _new)				\
> -	__extension__							\
> -	({								\
> -		__typeof__(*p) _________pold = (old);			\
> -		__typeof__(*p) _________pnew = (_new);			\
> -		uatomic_cmpxchg(p, _________pold, _________pnew);	\
> +#define _rcu_cmpxchg_pointer(p, old, _new)						\
> +	({										\
> +		__typeof__(*(p)) __old = old;						\
> +		__atomic_compare_exchange_n(p, &__old, _new, 0,				\
> +					    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);	\

__ATOMIC_SEQ_CST on both success and failure.

> +		__old;									\
>   	})
>   
>   /**
> @@ -145,22 +107,11 @@ extern "C" {
>    * meets the 10-line criterion in LGPL, allowing this function to be
>    * expanded directly in non-LGPL code.
>    */
> -#define _rcu_xchg_pointer(p, v)				\
> -	__extension__					\
> -	({						\
> -		__typeof__(*p) _________pv = (v);	\
> -		uatomic_xchg(p, _________pv);		\
> -	})
> -
> +#define _rcu_xchg_pointer(p, v) \
> +	__atomic_exchange_n(p, v, __ATOMIC_ACQ_REL)

__ATOMIC_SEQ_CST.

>   
> -#define _rcu_set_pointer(p, v)				\
> -	do {						\
> -		__typeof__(*p) _________pv = (v);	\
> -		if (!__builtin_constant_p(v) || 	\
> -		    ((v) != NULL))			\
> -			cmm_wmb();				\
> -		uatomic_set(p, _________pv);		\
> -	} while (0)
> +#define _rcu_set_pointer(p, v) \
> +	__atomic_store_n(p, v, __ATOMIC_RELEASE)

OK.

Thanks,

Mathieu

>   
>   /**
>    * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure
> @@ -178,7 +129,7 @@ extern "C" {
>    * meets the 10-line criterion in LGPL, allowing this function to be
>    * expanded directly in non-LGPL code.
>    */
> -#define _rcu_assign_pointer(p, v)	_rcu_set_pointer(&(p), v)
> +#define _rcu_assign_pointer(p, v) rcu_set_pointer(&(p), v)
>   
>   #ifdef __cplusplus
>   }

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED
  2023-03-17 21:37 ` [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev
@ 2023-03-20 18:28   ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:28 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
> builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
> CMM_STORE_SHARED().
> 
> Signed-off-by: Ondřej Surý <ondrej@sury.org>
> ---
>   include/urcu/system.h | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/urcu/system.h b/include/urcu/system.h
> index faae390..99e7443 100644
> --- a/include/urcu/system.h
> +++ b/include/urcu/system.h
> @@ -26,7 +26,7 @@
>    * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
>    * before the load.
>    */
> -#define _CMM_LOAD_SHARED(p)	       CMM_ACCESS_ONCE(p)
> +#define _CMM_LOAD_SHARED(p)	       __atomic_load_n(&(p), __ATOMIC_RELAXED)
>   
>   /*
>    * Load a data from shared memory, doing a cache flush if required.
> @@ -42,7 +42,7 @@
>    * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
>    * follow the store.
>    */
> -#define _CMM_STORE_SHARED(x, v)	__extension__ ({ CMM_ACCESS_ONCE(x) = (v); })
> +#define _CMM_STORE_SHARED(x, v)	__atomic_store_n(&(x), (v), __ATOMIC_RELAXED)

__atomic_store_n() is void. _CMM_STORE_SHARED() should evaluate to (v) 
(unless we decide to change the semantic, which I would rather avoid).
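[Editor's note: one way to keep the documented semantic is shown below. Since `__atomic_store_n()` is void, a GNU statement expression can perform the relaxed store while still evaluating to the stored value. This is a sketch, not necessarily the fix that was merged.]

```c
#define _CMM_STORE_SHARED(x, v)					\
	__extension__						\
	({							\
		__typeof__(x) _v = (v);				\
		__atomic_store_n(&(x), _v, __ATOMIC_RELAXED);	\
		_v;	/* macro evaluates to the stored value */ \
	})

static long shared_word;

long demo_store(void)
{
	return _CMM_STORE_SHARED(shared_word, 42);	/* evaluates to 42 */
}
```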

Thanks,

Mathieu

>   
>   /*
>    * Store v into x, where x is located in shared memory. Performs the
> @@ -51,9 +51,8 @@
>   #define CMM_STORE_SHARED(x, v)						\
>   	__extension__							\
>   	({								\
> -		__typeof__(x) _v = _CMM_STORE_SHARED(x, v);		\
> +		_CMM_STORE_SHARED(x, v);				\
>   		cmm_smp_wmc();						\
> -		_v = _v;	/* Work around clang "unused result" */	\
>   	})
>   
>   #endif /* _URCU_SYSTEM_H */

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation
  2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:13     ` Mathieu Desnoyers via lttng-dev
@ 2023-03-20 18:28     ` Ondřej Surý via lttng-dev
  2023-03-20 18:38       ` Mathieu Desnoyers via lttng-dev
  2023-03-20 19:38     ` Duncan Sands via lttng-dev
  2 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-20 18:28 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, paulmck


> On 20. 3. 2023, at 19:03, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> In doc/uatomic-api.md, we document:
> 
> "```c
> type uatomic_cmpxchg(type *addr, type old, type new);
> ```
> 
> An atomic read-modify-write operation that performs this
> sequence of operations atomically: check if `addr` contains `old`.
> If true, then replace the content of `addr` by `new`. Return the
> value previously contained by `addr`. This function implies a full
> memory barrier before and after the atomic operation."
> 
> This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
> rather than __ATOMIC_CONSUME".


From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

> If desired is written into *ptr then true is returned and memory is affected according to the memory order specified by success_memorder. There are no restrictions on what memory order can be used here.
> 
> Otherwise, false is returned and memory is affected according to failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor __ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by success_memorder.

I think it makes sense that the failure_memorder has the same memorder as uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - it's the same as with __atomic_load_n: only the following are permitted:

> The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.
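[Editor's note: a compile-checkable sketch of the constraint quoted above, using SEQ_CST for both orders (always a valid failure order, and matching the documented full-barrier semantic). The wrapper name is mine; this is not the patch as posted.]

```c
static long word = 1;

long my_cmpxchg(long *addr, long old, long new_val)
{
	long expected = old;

	/* success order: SEQ_CST (full barrier, as documented);
	 * failure order: must not be RELEASE/ACQ_REL; SEQ_CST is valid */
	__atomic_compare_exchange_n(addr, &expected, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;	/* previous value of *addr, like uatomic_cmpxchg() */
}
```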

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() needs retyping to uintptr_t in rculfhash.c
  2023-03-17 21:37 ` [lttng-dev] [PATCH 6/7] Fix: uatomic_or() needs retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev
@ 2023-03-20 18:31   ` Mathieu Desnoyers via lttng-dev
  2023-03-21 10:15     ` Ondřej Surý via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:31 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> When adding REMOVED_FLAG to the pointers in the rculfhash
> implementation, retype the generic pointer to uintptr_t to fix the
> compiler error.

What is the compiler error ? I'm wondering whether the expected choice
to match the rest of this file's content would be to use "uintptr_t *" 
or "unsigned long *" ?

Thanks,

Mathieu

> 
> Signed-off-by: Ondřej Surý <ondrej@sury.org>
> ---
>   src/rculfhash.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/rculfhash.c b/src/rculfhash.c
> index b456415..863387e 100644
> --- a/src/rculfhash.c
> +++ b/src/rculfhash.c
> @@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
>   	 * Knowing which wins the race will be known after the garbage
>   	 * collection phase, stay tuned!
>   	 */
> -	uatomic_or(&node->next, REMOVED_FLAG);
> +	uatomic_or((uintptr_t *)&node->next, REMOVED_FLAG);
>   	/* We performed the (logical) deletion. */
>   
>   	/*
> @@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
>   		dbg_printf("remove entry: order %lu index %lu hash %lu\n",
>   			   i, j, j);
>   		/* Set the REMOVED_FLAG to freeze the ->next for gc */
> -		uatomic_or(&fini_bucket->next, REMOVED_FLAG);
> +		uatomic_or((uintptr_t *)&fini_bucket->next, REMOVED_FLAG);
>   		_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
>   	}
>   	ht->flavor->read_unlock();

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-17 21:37 ` [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion() Ondřej Surý via lttng-dev
@ 2023-03-20 18:37   ` Mathieu Desnoyers via lttng-dev
  2023-03-21 10:21     ` Ondřej Surý via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:37 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
> FIXME: This is experiment that adds explicit memory barrier in the
> free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
> free the resources.
> 
> Signed-off-by: Ondřej Surý <ondrej@sury.org>
> ---
>   src/workqueue.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/src/workqueue.c b/src/workqueue.c
> index 1039d72..f21907f 100644
> --- a/src/workqueue.c
> +++ b/src/workqueue.c
> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>   	struct urcu_workqueue_completion *completion;
>   
>   	completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
> +	assert(!urcu_ref_get_unless_zero(&completion->ref));

Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() 
of some sort ?

Thanks,

Mathieu

>   	free(completion);
>   }
>   

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation
  2023-03-20 18:28     ` Ondřej Surý via lttng-dev
@ 2023-03-20 18:38       ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:41         ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:38 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, paulmck

On 2023-03-20 14:28, Ondřej Surý wrote:
> 
>> On 20. 3. 2023, at 19:03, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> In doc/uatomic-api.md, we document:
>>
>> "```c
>> type uatomic_cmpxchg(type *addr, type old, type new);
>> ```
>>
>> An atomic read-modify-write operation that performs this
>> sequence of operations atomically: check if `addr` contains `old`.
>> If true, then replace the content of `addr` by `new`. Return the
>> value previously contained by `addr`. This function implies a full
>> memory barrier before and after the atomic operation."
>>
>> This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
>> rather than __ATOMIC_CONSUME".
> 
> 
> From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> 
>> If desired is written into *ptr then true is returned and memory is affected according to the memory order specified by success_memorder. There are no restrictions on what memory order can be used here.
>>
>> Otherwise, false is returned and memory is affected according to failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor __ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by success_memorder.
> 
> I think it makes sense that the failure_memorder has the same memorder as uatomic_read(), but it definitelly cannot be __ATOMIC_ACQ_REL - it's same as with __atomic_load_n, only following are permitted:
> 
>> The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.

Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.

Thanks,

Mathieu

> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h> implementation
  2023-03-20 18:38       ` Mathieu Desnoyers via lttng-dev
@ 2023-03-20 18:41         ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:41 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, paulmck

On 2023-03-20 14:38, Mathieu Desnoyers via lttng-dev wrote:
> On 2023-03-20 14:28, Ondřej Surý wrote:
>>
>>> On 20. 3. 2023, at 19:03, Mathieu Desnoyers 
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>
>>> In doc/uatomic-api.md, we document:
>>>
>>> "```c
>>> type uatomic_cmpxchg(type *addr, type old, type new);
>>> ```
>>>
>>> An atomic read-modify-write operation that performs this
>>> sequence of operations atomically: check if `addr` contains `old`.
>>> If true, then replace the content of `addr` by `new`. Return the
>>> value previously contained by `addr`. This function implies a full
>>> memory barrier before and after the atomic operation."
>>>
>>> This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
>>> rather than "__ATOMIC_CONSUME".
>>
>>
>> From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
>>
>>> If desired is written into *ptr then true is returned and memory is 
>>> affected according to the memory order specified by success_memorder. 
>>> There are no restrictions on what memory order can be used here.
>>>
>>> Otherwise, false is returned and memory is affected according to 
>>> failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor 
>>> __ATOMIC_ACQ_REL. It also cannot be a stronger order than that 
>>> specified by success_memorder.
>>
>> I think it makes sense that the failure_memorder has the same memorder 
>> as uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - 
>> it's the same as with __atomic_load_n; only the following are permitted:
>>
>>> The valid memory order variants are __ATOMIC_RELAXED, 
>>> __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.
> 
> Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.

And it _would_ make sense to use the same memorder on cmpxchg failure as 
uatomic_read if we were exposing a new API, but we are modifying an 
already-exposed, documented API, so I would stick to SEQ_CST for both 
cmpxchg success and failure.

If we want to expose a new cmpxchg_relaxed_failure with a relaxed 
memorder on failure that would be fine, but we cannot change the 
semantic that is already documented.
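
As a concrete illustration of the point above, here is a minimal, hypothetical sketch (not the actual urcu implementation, and shown for unsigned long only where the real macro is type-generic) of how uatomic_cmpxchg() could be expressed with the __atomic builtins while preserving the documented full-barrier semantic, using __ATOMIC_SEQ_CST for both the success and the failure order:

```c
#include <assert.h>

/* Hypothetical mapping of uatomic_cmpxchg() onto __atomic builtins.
 * SEQ_CST on both orders keeps the documented "full memory barrier
 * before and after" behaviour on success and failure alike. */
static unsigned long
sketch_uatomic_cmpxchg(unsigned long *addr, unsigned long old,
		       unsigned long newval)
{
	/* On failure, __atomic_compare_exchange_n stores the observed
	 * value into 'old'; on success, 'old' is left untouched.  Either
	 * way, 'old' ends up holding the value previously contained at
	 * *addr, which is what uatomic_cmpxchg() is documented to return. */
	(void) __atomic_compare_exchange_n(addr, &old, newval,
			0 /* strong variant */,
			__ATOMIC_SEQ_CST,	/* success order */
			__ATOMIC_SEQ_CST);	/* failure order */
	return old;
}
```

A caller can then rely on the returned value to detect whether the exchange happened, exactly as with the documented API.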

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>>
>> Ondrej
>> -- 
>> Ondřej Surý (He/Him)
>> ondrej@sury.org
>>
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation
  2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:13     ` Mathieu Desnoyers via lttng-dev
  2023-03-20 18:28     ` Ondřej Surý via lttng-dev
@ 2023-03-20 19:38     ` Duncan Sands via lttng-dev
  2023-03-21 20:26       ` Mathieu Desnoyers via lttng-dev
  2 siblings, 1 reply; 30+ messages in thread
From: Duncan Sands via lttng-dev @ 2023-03-20 19:38 UTC (permalink / raw)
  To: lttng-dev

Hi Mathieu,

> While OK for the general case, I would recommend that we immediately implement 
> something more efficient on x86 32/64 which takes into account that 
> __ATOMIC_ACQ_REL atomic operations are implemented with LOCK prefixed atomic 
> ops, which imply the barrier already, leaving the before/after_uatomic_*() as 
> no-ops.

Maybe first check whether the GCC optimizers merge them.  I believe some 
optimizations of atomic primitives are allowed and implemented, but I couldn't 
say which ones.

Best wishes, Duncan.

* Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
  2023-03-20 18:31   ` Mathieu Desnoyers via lttng-dev
@ 2023-03-21 10:15     ` Ondřej Surý via lttng-dev
  2023-03-21 14:44       ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-21 10:15 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev


> On 20. 3. 2023, at 19:31, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> When adding REMOVED_FLAG to the pointers in the rculfhash
>> implementation, retype the generic pointer to uintptr_t to fix the
>> compiler error.
> 
> What is the compiler error ? I'm wondering whether the expected choice
> to match the rest of this file's content would be to use "uintptr_t *" or "unsigned long *" ?

This is the error:

rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
        uatomic_or(&node->next, REMOVED_FLAG);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
        (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
              ^                 ~~~~~~
rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
                uatomic_or(&fini_bucket->next, REMOVED_FLAG);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
        (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
              ^                 ~~~~~~

uintptr_t is defined as "unsigned integer type capable of holding a pointer to void", while unsigned long is only guaranteed to be at least 32-bit.

I guess that works in practice, but using unsigned long to retype the pointers might blow up (thinking of x32, which I know
little about, but it's kind of a hybrid architecture, isn't it?)

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org


* Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-20 18:37   ` Mathieu Desnoyers via lttng-dev
@ 2023-03-21 10:21     ` Ondřej Surý via lttng-dev
  2023-03-21 14:46       ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-21 10:21 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, paulmck

> On 20. 3. 2023, at 19:37, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>> FIXME: This is an experiment that adds an explicit memory barrier in
>> free_completion() in workqueue.c, so ThreadSanitizer knows it's OK to
>> free the resources.
>> Signed-off-by: Ondřej Surý <ondrej@sury.org>
>> ---
>>  src/workqueue.c | 1 +
>>  1 file changed, 1 insertion(+)
>> diff --git a/src/workqueue.c b/src/workqueue.c
>> index 1039d72..f21907f 100644
>> --- a/src/workqueue.c
>> +++ b/src/workqueue.c
>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>   struct urcu_workqueue_completion *completion;
>>     completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
>> + assert(!urcu_ref_get_unless_zero(&completion->ref));
> 
> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ?

I guess?

My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release
semantics and you do:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
   /* __ATOMIC_ACQUIRE fence needed here */
   free(obj);
}

we end up using the following code in BIND 9:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
   free(obj);
}

So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
this patch should no longer be needed.
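
For reference, a minimal runnable sketch of the refcount pattern being discussed, using the stronger order the thread converged on (SEQ_CST); the struct and function names are illustrative, not urcu's actual refcounting API:

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
	int ref;
	/* ... payload ... */
};

/* Drop one reference; free on the last one.  With SEQ_CST (or ACQ_REL)
 * on the decrement, the zero path needs no separate acquire fence:
 * every other thread's prior accesses to *obj are ordered before the
 * free(). */
static void
obj_unref(struct obj *obj)
{
	if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_SEQ_CST) == 0)
		free(obj);
}
```

With a release-only decrement, an acquire fence would be needed on the zero path instead; folding the acquire into the decrement is what makes ThreadSanitizer see the synchronization.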

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org


* Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
  2023-03-21 10:15     ` Ondřej Surý via lttng-dev
@ 2023-03-21 14:44       ` Mathieu Desnoyers via lttng-dev
  2023-03-21 14:45         ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 14:44 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev

On 2023-03-21 06:15, Ondřej Surý wrote:
> 
>> On 20. 3. 2023, at 19:31, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>> When adding REMOVED_FLAG to the pointers in the rculfhash
>>> implementation, retype the generic pointer to uintptr_t to fix the
>>> compiler error.
>>
>> What is the compiler error ? I'm wondering whether the expected choice
>> to match the rest of this file's content would be to use "uintptr_t *" or "unsigned long *" ?
> 
> This is the error:
> 
> rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
>          uatomic_or(&node->next, REMOVED_FLAG);
>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
>          (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
>                ^                 ~~~~~~
> rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
>                  uatomic_or(&fini_bucket->next, REMOVED_FLAG);
>                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
>          (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
>                ^                 ~~~~~~
> 
> uintptr_t is defined as "unsigned integer type capable of holding a pointer to void", while unsigned long is only guaranteed to be at least 32-bit.
> 
> I guess that works in practice, but using unsigned long to retype the pointers might blow up (thinking of x32, which I know
> little about, but it's kind of a hybrid architecture, isn't it?)

x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even 
that architecture is OK with casting pointer to unsigned long.

I agree with you that uintptr_t is the semantically correct type, but it 
should come as a separate change across the urcu code base: currently 
there are many places where void * is cast to unsigned long to do 
bitwise operations.

I therefore recommend using unsigned long here to stay consistent with the 
rest of the code base, and keeping the transition from unsigned long to 
uintptr_t for the future, as it is not an immediate issue we have to 
address.
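
A minimal sketch of what the discussed fix could look like; the names node, next and REMOVED_FLAG mirror rculfhash.c, but the struct here is a stand-in, not the real cds_lfht_node:

```c
#include <assert.h>

#define REMOVED_FLAG (1UL << 0)	/* low tag bit, as in rculfhash.c */

struct node {
	struct node *next;
};

/* Set the REMOVED flag in the low bit of the next pointer.  Casting
 * the pointer field to unsigned long * satisfies __atomic_or_fetch(),
 * which requires a pointer to an integer type, and matches the rest
 * of the urcu code base, where void * is routinely cast to unsigned
 * long for bitwise operations. */
static void
set_removed(struct node *node)
{
	(void) __atomic_or_fetch((unsigned long *) &node->next,
				 REMOVED_FLAG, __ATOMIC_RELAXED);
}
```

This relies on unsigned long and void * having the same size, which holds on all targets urcu supports (including x32, where both are 4 bytes).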

Thanks,

Mathieu


> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
  2023-03-21 14:44       ` Mathieu Desnoyers via lttng-dev
@ 2023-03-21 14:45         ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 14:45 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev

On 2023-03-21 10:44, Mathieu Desnoyers wrote:
> On 2023-03-21 06:15, Ondřej Surý wrote:
>>
>>> On 20. 3. 2023, at 19:31, Mathieu Desnoyers 
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>
>>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>>> When adding REMOVED_FLAG to the pointers in the rculfhash
>>>> implementation, retype the generic pointer to uintptr_t to fix the
>>>> compiler error.
>>>
>>> What is the compiler error ? I'm wondering whether the expected choice
>>> to match the rest of this file's content would be to use "uintptr_t 
>>> *" or "unsigned long *" ?
>>
>> This is the error:
>>
>> rculfhash.c:1201:2: error: address argument to atomic operation must 
>> be a pointer to integer ('struct cds_lfht_node **' invalid)
>>          uatomic_or(&node->next, REMOVED_FLAG);
>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
>>          (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
>>                ^                 ~~~~~~
>> rculfhash.c:1444:3: error: address argument to atomic operation must 
>> be a pointer to integer ('struct cds_lfht_node **' invalid)
>>                  uatomic_or(&fini_bucket->next, REMOVED_FLAG);
>>                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
>>          (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
>>                ^                 ~~~~~~
>>
>> uintptr_t is defined as "unsigned integer type capable of holding a 
>> pointer to void", while unsigned long is only guaranteed to be at least 32-bit.
>>
>> I guess that works in practice, but using unsigned long to retype 
>> the pointers might blow up (thinking of x32, which I know
>> little about, but it's kind of a hybrid architecture, isn't it?)
> 
> x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even 
> that architecture is OK with casting pointer to unsigned long.
> 
> I agree with you that uintptr_t is the semantically correct type, but it 
> should come as a separate change across the urcu code base: currently 
> there are many places where void * is cast to unsigned long to do 
> bitwise operations.
> 
> I therefore recommend to use unsigned long here to stay similar to the 
> rest of the code base, and keep the transition from unsigned long to 
> uintptr_t for the future, as it is not an immediate issue we have to 
> address.

I forgot to mention: you should add the compiler error to the commit 
message.

You should also explain why this was not an issue until now. It's 
probably related to the introduced use of __atomic builtins.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> 
>>
>> Ondrej
>> -- 
>> Ondřej Surý (He/Him)
>> ondrej@sury.org
>>
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-21 10:21     ` Ondřej Surý via lttng-dev
@ 2023-03-21 14:46       ` Mathieu Desnoyers via lttng-dev
  2023-03-21 14:48         ` Ondřej Surý via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 14:46 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, paulmck

On 2023-03-21 06:21, Ondřej Surý wrote:
>> On 20. 3. 2023, at 19:37, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>> FIXME: This is an experiment that adds an explicit memory barrier in
>>> free_completion() in workqueue.c, so ThreadSanitizer knows it's OK to
>>> free the resources.
>>> Signed-off-by: Ondřej Surý <ondrej@sury.org>
>>> ---
>>>   src/workqueue.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>> diff --git a/src/workqueue.c b/src/workqueue.c
>>> index 1039d72..f21907f 100644
>>> --- a/src/workqueue.c
>>> +++ b/src/workqueue.c
>>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>>    struct urcu_workqueue_completion *completion;
>>>      completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
>>> + assert(!urcu_ref_get_unless_zero(&completion->ref));
>>
>> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ?
> 
> I guess?
> 
> My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release
> semantics and you do:
> 
> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
>    /* __ATOMIC_ACQUIRE fence needed here */
>     free(obj);
> }
> 
> we end up using the following code in BIND 9:
> 
> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
>     free(obj);
> }
> 
> So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
> this patch should no longer be needed.

Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.

Thanks,

Mathieu

> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-21 14:46       ` Mathieu Desnoyers via lttng-dev
@ 2023-03-21 14:48         ` Ondřej Surý via lttng-dev
  2023-03-21 14:49           ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-21 14:48 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, paulmck

> On 21. 3. 2023, at 15:46, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> On 2023-03-21 06:21, Ondřej Surý wrote:
>>> On 20. 3. 2023, at 19:37, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>> 
>>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>>> FIXME: This is an experiment that adds an explicit memory barrier in
>>>> free_completion() in workqueue.c, so ThreadSanitizer knows it's OK to
>>>> free the resources.
>>>> Signed-off-by: Ondřej Surý <ondrej@sury.org>
>>>> ---
>>>>  src/workqueue.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>> diff --git a/src/workqueue.c b/src/workqueue.c
>>>> index 1039d72..f21907f 100644
>>>> --- a/src/workqueue.c
>>>> +++ b/src/workqueue.c
>>>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>>>   struct urcu_workqueue_completion *completion;
>>>>     completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
>>>> + assert(!urcu_ref_get_unless_zero(&completion->ref));
>>> 
>>> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ?
>> I guess?
>> My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release
>> semantics and you do:
>> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
>>   /* __ATOMIC_ACQUIRE fence needed here */
>>    free(obj);
>> }
>> we end up using the following code in BIND 9:
>> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
>>    free(obj);
>> }
>> So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
>> this patch should no longer be needed.
> 
> Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.

Yeah, I think I already did that, but wrote the email before that. Nevertheless, my main
point was that it should not be needed anymore.

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org


* Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
  2023-03-21 14:48         ` Ondřej Surý via lttng-dev
@ 2023-03-21 14:49           ` Mathieu Desnoyers via lttng-dev
  2023-03-21 14:59             ` [lttng-dev] TSAN and the tests Ondřej Surý via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 14:49 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, paulmck

On 2023-03-21 10:48, Ondřej Surý wrote:
>> On 21. 3. 2023, at 15:46, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2023-03-21 06:21, Ondřej Surý wrote:
>>>> On 20. 3. 2023, at 19:37, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:
>>>>> FIXME: This is an experiment that adds an explicit memory barrier in
>>>>> free_completion() in workqueue.c, so ThreadSanitizer knows it's OK to
>>>>> free the resources.
>>>>> Signed-off-by: Ondřej Surý <ondrej@sury.org>
>>>>> ---
>>>>>   src/workqueue.c | 1 +
>>>>>   1 file changed, 1 insertion(+)
>>>>> diff --git a/src/workqueue.c b/src/workqueue.c
>>>>> index 1039d72..f21907f 100644
>>>>> --- a/src/workqueue.c
>>>>> +++ b/src/workqueue.c
>>>>> @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
>>>>>    struct urcu_workqueue_completion *completion;
>>>>>      completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
>>>>> + assert(!urcu_ref_get_unless_zero(&completion->ref));
>>>>
>>>> Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ?
>>> I guess?
>>> My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release
>>> semantics and you do:
>>> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
>>>    /* __ATOMIC_ACQUIRE fence needed here */
>>>     free(obj);
>>> }
>>> we end up using the following code in BIND 9:
>>> if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
>>>     free(obj);
>>> }
>>> So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
>>> this patch should no longer be needed.
>>
>> Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.
> 
> Yeah, I think I already did that, but wrote the email before that. Nevertheless, my main
> point was that it should not be needed anymore.

Agreed, let's see how it holds up to testing under TSAN. :)

Thanks,

Mathieu

> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* [lttng-dev] TSAN and the tests
  2023-03-21 14:49           ` Mathieu Desnoyers via lttng-dev
@ 2023-03-21 14:59             ` Ondřej Surý via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-21 14:59 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, paulmck

> On 21. 3. 2023, at 15:49, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> Agreed, let's see how it holds up to testing under TSAN. :)

The make check is OK, but any of make regtest, short_bench or long_bench produces
a lot of TSAN warnings like the ones below.

Looks like they mostly come from the tests themselves, and I will start chopping them off when I
have a little time, if nobody beats me to it in the meantime.

Meanwhile, I've isolated the use of cds_lfht for a small hashtable of bad "names" used
as a SERVFAIL cache, which uses neither reference counting nor LRU, and there was
yet another custom hashtable in BIND 9, so it was perfect for testing.  I'll report what
TSAN likes or doesn't like in a real-world app soon(ish).
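
For the first class of reports below (plain reads and writes of the global goflag), a hedged sketch of the usual TSAN-friendly fix is relaxed atomic accesses on both sides; the names mirror the report, but the code is illustrative, not the actual rcutorture.h:

```c
#include <assert.h>

/* Illustrative stand-ins for rcutorture's run-control values. */
enum { GOFLAG_INIT, GOFLAG_RUN, GOFLAG_STOP };

static int goflag = GOFLAG_INIT;

/* Writer side (the main thread in perftestrun()). */
static void set_goflag(int val)
{
	__atomic_store_n(&goflag, val, __ATOMIC_RELAXED);
}

/* Reader side (the spin loop in rcu_read_perf_test()).  Relaxed is
 * enough to silence the data-race report for a pure start/stop flag;
 * any ordering the test needs must come from elsewhere. */
static int read_goflag(void)
{
	return __atomic_load_n(&goflag, __ATOMIC_RELAXED);
}
```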

Ondrej


==================
WARNING: ThreadSanitizer: data race (pid=1110834)
  Write of size 4 at 0x5612e63ef380 by main thread:
    #0 perftestrun /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:217:9 (rcutorture_urcu_bp+0xd6278) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:255:9 (rcutorture_urcu_bp+0xd55f5) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous read of size 4 at 0x5612e63ef380 by thread T24:
    #0 rcu_read_perf_test /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:148:9 (rcutorture_urcu_bp+0xd5f61) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global 'goflag' of size 4 at 0x5612e63ef380 (rcutorture_urcu_bp+0x147f380)

  Thread T24 (tid=1110872, running) created by main thread at:
    #0 pthread_create <null> (rcutorture_urcu_bp+0x4e5bb) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 (rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:217:9 in perftestrun
==================
==================
WARNING: ThreadSanitizer: data race (pid=1110834)
  Read of size 8 at 0x5612e63ef408 by thread T3:
    #0 __smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 (rcutorture_urcu_bp+0xd69d8) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:167:10 (rcutorture_urcu_bp+0xd67be) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 rcu_read_perf_test /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:161:2 (rcutorture_urcu_bp+0xd6007) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous write of size 8 at 0x5612e63ef408 by main thread:
    #0 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:192:21 (rcutorture_urcu_bp+0xd5eb7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 (rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global '__thread_id_map' of size 32768 at 0x5612e63ef3f0 (rcutorture_urcu_bp+0x147f408)

  Thread T3 (tid=1110851, running) created by main thread at:
    #0 pthread_create <null> (rcutorture_urcu_bp+0x4e5bb) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 (rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 in __smp_thread_id
==================
==================
WARNING: ThreadSanitizer: data race (pid=1110834)
  Write of size 8 at 0x5612e63ef3f8 by main thread:
    #0 wait_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:214:21 (rcutorture_urcu_bp+0xd6cc6) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 wait_all_threads /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:227:10 (rcutorture_urcu_bp+0xd6ba1) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftestrun /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:223:2 (rcutorture_urcu_bp+0xd62c9) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:255:9 (rcutorture_urcu_bp+0xd55f5) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #4 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous read of size 8 at 0x5612e63ef3f8 by thread T11:
    #0 __smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 (rcutorture_urcu_bp+0xd69d8) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:167:10 (rcutorture_urcu_bp+0xd67be) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 rcu_read_perf_test /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:161:2 (rcutorture_urcu_bp+0xd6007) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  As if synchronized via sleep:
    #0 sleep <null> (rcutorture_urcu_bp+0x4b206) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 perftestrun /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:219:2 (rcutorture_urcu_bp+0xd6297) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:255:9 (rcutorture_urcu_bp+0xd55f5) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global '__thread_id_map' of size 32768 at 0x5612e63ef3f0 (rcutorture_urcu_bp+0x147f3f8)

  Thread T11 (tid=1110859, finished) created by main thread at:
    #0 pthread_create <null> (rcutorture_urcu_bp+0x4e5bb) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:251:3 (rcutorture_urcu_bp+0xd55b7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:214:21 in wait_thread
==================
==================
WARNING: ThreadSanitizer: data race (pid=1110834)
  Read of size 8 at 0x5612e63ef4f8 by thread T33:
    #0 __smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 (rcutorture_urcu_bp+0xd69d8) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:167:10 (rcutorture_urcu_bp+0xd67be) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 rcu_update_perf_test /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:189:2 (rcutorture_urcu_bp+0xd6173) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous write of size 8 at 0x5612e63ef4f8 by main thread:
    #0 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:192:21 (rcutorture_urcu_bp+0xd5eb7) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:254:2 (rcutorture_urcu_bp+0xd55e2) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global '__thread_id_map' of size 32768 at 0x5612e63ef3f0 (rcutorture_urcu_bp+0x147f4f8)

  Thread T33 (tid=1110881, running) created by main thread at:
    #0 pthread_create <null> (rcutorture_urcu_bp+0x4e5bb) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:254:2 (rcutorture_urcu_bp+0xd55e2) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 in __smp_thread_id
==================
==================
WARNING: ThreadSanitizer: data race (pid=1110834)
  Write of size 8 at 0x5612e63ef458 by main thread:
    #0 wait_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:214:21 (rcutorture_urcu_bp+0xd6cc6) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 wait_all_threads /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:227:10 (rcutorture_urcu_bp+0xd6ba1) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftestrun /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:223:2 (rcutorture_urcu_bp+0xd62c9) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:255:9 (rcutorture_urcu_bp+0xd55f5) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #4 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Previous read of size 8 at 0x5612e63ef458 by thread T33:
    #0 __smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:138:7 (rcutorture_urcu_bp+0xd69d8) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 smp_thread_id /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:167:10 (rcutorture_urcu_bp+0xd67be) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 rcu_update_perf_test /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:189:2 (rcutorture_urcu_bp+0xd6173) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

  Location is global '__thread_id_map' of size 32768 at 0x5612e63ef3f0 (rcutorture_urcu_bp+0x147f458)

  Thread T33 (tid=1110881, finished) created by main thread at:
    #0 pthread_create <null> (rcutorture_urcu_bp+0x4e5bb) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #1 create_thread /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:188:6 (rcutorture_urcu_bp+0xd5e6c) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #2 perftest /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:254:2 (rcutorture_urcu_bp+0xd55e2) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)
    #3 main /home/ondrej/Projects/userspace-rcu/tests/regression/./rcutorture.h:646:4 (rcutorture_urcu_bp+0xd512d) (BuildId: 7d612c532f25f675ca5eca76f2a54e3bca37fd4b)

SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/userspace-rcu/tests/regression/../../tests/common/api.h:214:21 in wait_thread
==================
ThreadSanitizer: reported 5 warnings

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org


* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation
  2023-03-20 19:38     ` Duncan Sands via lttng-dev
@ 2023-03-21 20:26       ` Mathieu Desnoyers via lttng-dev
  2023-03-22  8:24         ` Duncan Sands via lttng-dev
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 20:26 UTC (permalink / raw)
  To: Duncan Sands, lttng-dev

On 2023-03-20 15:38, Duncan Sands via lttng-dev wrote:
> Hi Mathieu,
> 
>> While OK for the general case, I would recommend that we immediately 
>> implement something more efficient on x86 32/64 which takes into 
>> account that __ATOMIC_ACQ_REL atomic operations are implemented with 
>> LOCK prefixed atomic ops, which imply the barrier already, leaving the 
>> before/after_uatomic_*() as no-ops.
> 
> maybe first check whether the GCC optimizers merge them.  I believe some 
> optimizations of atomic primitives are allowed and implemented, but I 
> couldn't say which ones.
> 
> Best wishes, Duncan.

Tested on godbolt.org with:

int a;

void fct(void)
{
     (void) __atomic_add_fetch(&a, 1, __ATOMIC_RELAXED);
     __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

x86-64 gcc 12.2 -O2 -std=c11:

fct:
         lock add        DWORD PTR a[rip], 1
         lock or QWORD PTR [rsp], 0
         ret
a:
         .zero   4

x86-64 clang 16.0.0 -O2 -std=c11:

fct:                                    # @fct
         lock            inc     dword ptr [rip + a]
         mfence
         ret
a:
         .long   0

So neither gcc nor clang optimizes this today, hence the need for an
x86-specific implementation.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation
  2023-03-21 20:26       ` Mathieu Desnoyers via lttng-dev
@ 2023-03-22  8:24         ` Duncan Sands via lttng-dev
  0 siblings, 0 replies; 30+ messages in thread
From: Duncan Sands via lttng-dev @ 2023-03-22  8:24 UTC (permalink / raw)
  To: Mathieu Desnoyers, lttng-dev

Hi Mathieu,

> Tested on godbolt.org with:
> 
> int a;
> 
> void fct(void)
> {
>      (void) __atomic_add_fetch(&a, 1, __ATOMIC_RELAXED);
>      __atomic_thread_fence(__ATOMIC_SEQ_CST);
> }
> 
> x86-64 gcc 12.2 -O2 -std=c11:
> 
> fct:
>          lock add        DWORD PTR a[rip], 1
>          lock or QWORD PTR [rsp], 0
>          ret
> a:
>          .zero   4

that's disappointing.  It's the same if both use __ATOMIC_SEQ_CST:

int a;

void fct(void)
{
     (void) __atomic_add_fetch(&a, 1, __ATOMIC_SEQ_CST);
     __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

->

fct():
         lock add        DWORD PTR a[rip], 1
         lock or QWORD PTR [rsp], 0
         ret
a:
         .zero   4

on x86-64 gcc 12.2 -O2 -std=c11.  Clang also doesn't optimize the fence away.

Best wishes, Duncan.

end of thread, other threads:[~2023-03-22  8:25 UTC | newest]

Thread overview: 30+ messages
-- links below jump to the message on this page --
2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev
2023-03-20 18:03   ` Mathieu Desnoyers via lttng-dev
2023-03-20 18:13     ` Mathieu Desnoyers via lttng-dev
2023-03-20 18:28     ` Ondřej Surý via lttng-dev
2023-03-20 18:38       ` Mathieu Desnoyers via lttng-dev
2023-03-20 18:41         ` Mathieu Desnoyers via lttng-dev
2023-03-20 19:38     ` Duncan Sands via lttng-dev
2023-03-21 20:26       ` Mathieu Desnoyers via lttng-dev
2023-03-22  8:24         ` Duncan Sands via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier() Ondřej Surý via lttng-dev
2023-03-20 18:06   ` Mathieu Desnoyers via lttng-dev
2023-03-20 18:14     ` Mathieu Desnoyers via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev
2023-03-20 18:25   ` Mathieu Desnoyers via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev
2023-03-20 18:28   ` Mathieu Desnoyers via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev
2023-03-20 18:31   ` Mathieu Desnoyers via lttng-dev
2023-03-21 10:15     ` Ondřej Surý via lttng-dev
2023-03-21 14:44       ` Mathieu Desnoyers via lttng-dev
2023-03-21 14:45         ` Mathieu Desnoyers via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion() Ondřej Surý via lttng-dev
2023-03-20 18:37   ` Mathieu Desnoyers via lttng-dev
2023-03-21 10:21     ` Ondřej Surý via lttng-dev
2023-03-21 14:46       ` Mathieu Desnoyers via lttng-dev
2023-03-21 14:48         ` Ondřej Surý via lttng-dev
2023-03-21 14:49           ` Mathieu Desnoyers via lttng-dev
2023-03-21 14:59             ` [lttng-dev] TSAN and the tests Ondřej Surý via lttng-dev
