* [lttng-dev] (no subject)
@ 2023-03-21 13:30 Ondřej Surý via lttng-dev
  2023-03-21 13:30 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
  ` (6 more replies)

From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC (permalink / raw)
To: lttng-dev

This is the second round of the patches, after implementing the
requested changes from the first round.

Ondrej

[PATCH 1/7] Require __atomic builtins to build
 - no changes

[PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h>
 - the non-return macros now use __ATOMIC_RELAXED
 - the return macros now use __ATOMIC_SEQ_CST
 - the memory barriers now use __atomic_thread_fence(__ATOMIC_SEQ_CST)

[PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier()
 - this now uses __atomic_signal_fence() instead of __atomic_thread_fence()

[PATCH 4/7] Replace the internal pointer manipulation with __atomic
 - changed the memory ordering to __ATOMIC_SEQ_CST for xchg and cmpxchg

[PATCH 5/7] Replace the arch-specific memory barriers with __atomic
 - dropped the changes to urcu/arch.h
 - removed all custom cmm_*() macros from urcu/arch/*.h
 - added the generic __atomic implementation to urcu/arch/generic.h
   This way it's still possible to override the generics with
   arch-specific macros.

[PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD,STORE}_SHARED
 - _CMM_STORE_SHARED and CMM_STORE_SHARED now return the stored value

[PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in
 - no changes

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
* [lttng-dev] [PATCH 1/7] Require __atomic builtins to build 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev @ 2023-03-21 13:30 ` Ondřej Surý via lttng-dev 2023-03-21 19:26 ` Mathieu Desnoyers via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev ` (5 subsequent siblings) 6 siblings, 1 reply; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC (permalink / raw) To: lttng-dev Add autoconf checks for all __atomic builtins that urcu require, and adjust the gcc and clang versions in the README.md. Signed-off-by: Ondřej Surý <ondrej@sury.org> --- README.md | 33 +++++++++------------------------ configure.ac | 15 +++++++++++++++ 2 files changed, 24 insertions(+), 24 deletions(-) diff --git a/README.md b/README.md index ba5bb08..a65a07a 100644 --- a/README.md +++ b/README.md @@ -68,30 +68,15 @@ Should also work on: (more testing needed before claiming support for these OS). -Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or -better. - -The C compiler used needs to support at least C99. The C++ compiler used -needs to support at least C++11. - -The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are -supported, with the following exceptions: - - - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile - accesses to offsets in a TLS structure on 32-bit x86. These versions are - therefore not compatible with `liburcu` on x86 32-bit - (i386, i486, i586, i686). - The problem has been reported to the GCC community: - <http://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg281255.html> - - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build. - See <http://kerneltrap.org/node/7507> - - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins - support. For ARM this was introduced with GCC 4.4: - <http://gcc.gnu.org/gcc-4.4/changes.html>. 
- - Linux aarch64 depends on GCC 5.1 or better because prior versions - perform unsafe access to deallocated stack. - -Clang version 3.0 (based on LLVM 3.0) is supported. +Linux ARM depends on running a Linux kernel 2.6.15 or better. + +The C compiler used needs to support at least C99 and __atomic +builtins. The C++ compiler used needs to support at least C++11 +and __atomic builtins. + +The GCC compiler versions 4.7 or better are supported. + +Clang version 3.1 (based on LLVM 3.1) is supported. Glibc >= 2.4 should work but the older version we test against is currently 2.17. diff --git a/configure.ac b/configure.ac index 909cf1d..cb7ba18 100644 --- a/configure.ac +++ b/configure.ac @@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [ AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is detected.]) ]) +# Require __atomic builtins +AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM( + [[int x, y;]], + [[__atomic_store_n(&x, 0, __ATOMIC_RELEASE); + __atomic_load_n(&x, __ATOMIC_CONSUME); + y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL); + __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, __ATOMIC_CONSUME); + __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL); + __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL); + __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL); + __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL); + __atomic_thread_fence(__ATOMIC_ACQ_REL)]])], + [], + [AC_MSG_ERROR([The compiler does not support __atomic builtins])]) ## ## ## Optional features selection ## -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [lttng-dev] [PATCH 1/7] Require __atomic builtins to build 2023-03-21 13:30 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev @ 2023-03-21 19:26 ` Mathieu Desnoyers via lttng-dev 0 siblings, 0 replies; 13+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 19:26 UTC (permalink / raw) To: Ondřej Surý, lttng-dev On 2023-03-21 09:30, Ondřej Surý via lttng-dev wrote: > Add autoconf checks for all __atomic builtins that urcu require, and > adjust the gcc and clang versions in the README.md. > > Signed-off-by: Ondřej Surý <ondrej@sury.org> > --- > README.md | 33 +++++++++------------------------ > configure.ac | 15 +++++++++++++++ > 2 files changed, 24 insertions(+), 24 deletions(-) > > diff --git a/README.md b/README.md > index ba5bb08..a65a07a 100644 > --- a/README.md > +++ b/README.md > @@ -68,30 +68,15 @@ Should also work on: > > (more testing needed before claiming support for these OS). > > -Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or > -better. > - > -The C compiler used needs to support at least C99. The C++ compiler used > -needs to support at least C++11. > - > -The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are > -supported, with the following exceptions: > - > - - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile > - accesses to offsets in a TLS structure on 32-bit x86. These versions are > - therefore not compatible with `liburcu` on x86 32-bit > - (i386, i486, i586, i686). > - The problem has been reported to the GCC community: > - <http://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg281255.html> > - - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build. > - See <http://kerneltrap.org/node/7507> > - - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins > - support. For ARM this was introduced with GCC 4.4: > - <http://gcc.gnu.org/gcc-4.4/changes.html>. 
> - - Linux aarch64 depends on GCC 5.1 or better because prior versions > - perform unsafe access to deallocated stack. > - > -Clang version 3.0 (based on LLVM 3.0) is supported. > +Linux ARM depends on running a Linux kernel 2.6.15 or better. > + > +The C compiler used needs to support at least C99 and __atomic > +builtins. The C++ compiler used needs to support at least C++11 > +and __atomic builtins. > + > +The GCC compiler versions 4.7 or better are supported. > + > +Clang version 3.1 (based on LLVM 3.1) is supported. > > Glibc >= 2.4 should work but the older version we test against is > currently 2.17. > diff --git a/configure.ac b/configure.ac > index 909cf1d..cb7ba18 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [ > AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is detected.]) > ]) > > +# Require __atomic builtins > +AC_COMPILE_IFELSE( > + [AC_LANG_PROGRAM( > + [[int x, y;]], > + [[__atomic_store_n(&x, 0, __ATOMIC_RELEASE); > + __atomic_load_n(&x, __ATOMIC_CONSUME); > + y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL); > + __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, __ATOMIC_CONSUME); > + __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL); > + __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL); > + __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL); > + __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL); > + __atomic_thread_fence(__ATOMIC_ACQ_REL)]])], I think we also want to test for __atomic_signal_fence here. Thanks, Mathieu > + [], > + [AC_MSG_ERROR([The compiler does not support __atomic builtins])]) > > ## ## > ## Optional features selection ## -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 13+ messages in thread
* [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev @ 2023-03-21 13:30 ` Ondřej Surý via lttng-dev 2023-03-21 20:03 ` Mathieu Desnoyers via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier() Ondřej Surý via lttng-dev ` (4 subsequent siblings) 6 siblings, 1 reply; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC (permalink / raw) To: lttng-dev Replace the custom assembly code in include/urcu/uatomic/ with __atomic builtins provided by C11-compatible compiler. Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/Makefile.am | 16 - include/urcu/uatomic.h | 84 +++-- include/urcu/uatomic/aarch64.h | 41 --- include/urcu/uatomic/alpha.h | 32 -- include/urcu/uatomic/arm.h | 57 --- include/urcu/uatomic/gcc.h | 46 --- include/urcu/uatomic/generic.h | 613 ------------------------------- include/urcu/uatomic/hppa.h | 10 - include/urcu/uatomic/ia64.h | 41 --- include/urcu/uatomic/m68k.h | 44 --- include/urcu/uatomic/mips.h | 32 -- include/urcu/uatomic/nios2.h | 32 -- include/urcu/uatomic/ppc.h | 237 ------------ include/urcu/uatomic/riscv.h | 44 --- include/urcu/uatomic/s390.h | 170 --------- include/urcu/uatomic/sparc64.h | 81 ----- include/urcu/uatomic/tile.h | 41 --- include/urcu/uatomic/x86.h | 646 --------------------------------- 18 files changed, 53 insertions(+), 2214 deletions(-) delete mode 100644 include/urcu/uatomic/aarch64.h delete mode 100644 include/urcu/uatomic/alpha.h delete mode 100644 include/urcu/uatomic/arm.h delete mode 100644 include/urcu/uatomic/gcc.h delete mode 100644 include/urcu/uatomic/generic.h delete mode 100644 include/urcu/uatomic/hppa.h delete mode 100644 include/urcu/uatomic/ia64.h delete mode 100644 include/urcu/uatomic/m68k.h delete mode 100644 
include/urcu/uatomic/mips.h delete mode 100644 include/urcu/uatomic/nios2.h delete mode 100644 include/urcu/uatomic/ppc.h delete mode 100644 include/urcu/uatomic/riscv.h delete mode 100644 include/urcu/uatomic/s390.h delete mode 100644 include/urcu/uatomic/sparc64.h delete mode 100644 include/urcu/uatomic/tile.h delete mode 100644 include/urcu/uatomic/x86.h diff --git a/include/Makefile.am b/include/Makefile.am index ba1fe60..53a28fd 100644 --- a/include/Makefile.am +++ b/include/Makefile.am @@ -59,24 +59,8 @@ nobase_include_HEADERS = \ urcu/syscall-compat.h \ urcu/system.h \ urcu/tls-compat.h \ - urcu/uatomic/aarch64.h \ - urcu/uatomic/alpha.h \ urcu/uatomic_arch.h \ - urcu/uatomic/arm.h \ - urcu/uatomic/gcc.h \ - urcu/uatomic/generic.h \ urcu/uatomic.h \ - urcu/uatomic/hppa.h \ - urcu/uatomic/ia64.h \ - urcu/uatomic/m68k.h \ - urcu/uatomic/mips.h \ - urcu/uatomic/nios2.h \ - urcu/uatomic/ppc.h \ - urcu/uatomic/riscv.h \ - urcu/uatomic/s390.h \ - urcu/uatomic/sparc64.h \ - urcu/uatomic/tile.h \ - urcu/uatomic/x86.h \ urcu/urcu-bp.h \ urcu/urcu-futex.h \ urcu/urcu.h \ diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h index 2fb5fd4..0327810 100644 --- a/include/urcu/uatomic.h +++ b/include/urcu/uatomic.h @@ -22,37 +22,59 @@ #define _URCU_UATOMIC_H #include <urcu/arch.h> +#include <urcu/system.h> -#if defined(URCU_ARCH_X86) -#include <urcu/uatomic/x86.h> -#elif defined(URCU_ARCH_PPC) -#include <urcu/uatomic/ppc.h> -#elif defined(URCU_ARCH_S390) -#include <urcu/uatomic/s390.h> -#elif defined(URCU_ARCH_SPARC64) -#include <urcu/uatomic/sparc64.h> -#elif defined(URCU_ARCH_ALPHA) -#include <urcu/uatomic/alpha.h> -#elif defined(URCU_ARCH_IA64) -#include <urcu/uatomic/ia64.h> -#elif defined(URCU_ARCH_ARM) -#include <urcu/uatomic/arm.h> -#elif defined(URCU_ARCH_AARCH64) -#include <urcu/uatomic/aarch64.h> -#elif defined(URCU_ARCH_MIPS) -#include <urcu/uatomic/mips.h> -#elif defined(URCU_ARCH_NIOS2) -#include <urcu/uatomic/nios2.h> -#elif defined(URCU_ARCH_TILE) 
-#include <urcu/uatomic/tile.h> -#elif defined(URCU_ARCH_HPPA) -#include <urcu/uatomic/hppa.h> -#elif defined(URCU_ARCH_M68K) -#include <urcu/uatomic/m68k.h> -#elif defined(URCU_ARCH_RISCV) -#include <urcu/uatomic/riscv.h> -#else -#error "Cannot build: unrecognized architecture, see <urcu/arch.h>." -#endif +#define UATOMIC_HAS_ATOMIC_BYTE +#define UATOMIC_HAS_ATOMIC_SHORT + +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE) + +#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME) + +#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_SEQ_CST) + +#define uatomic_cmpxchg(addr, old, new) \ + ({ \ + __typeof__(*(addr)) __old = old; \ + __atomic_compare_exchange_n(addr, &__old, new, 0, \ + __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \ + __old; \ + }) + +#define uatomic_add_return(addr, v) \ + __atomic_add_fetch((addr), (v), __ATOMIC_SEQ_CST) + +#define uatomic_add(addr, v) \ + (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED) + +#define uatomic_sub_return(addr, v) \ + __atomic_sub_fetch((addr), (v), __ATOMIC_SEQ_CST) + +#define uatomic_sub(addr, v) \ + (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED) + +#define uatomic_and(addr, mask) \ + (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED) + +#define uatomic_or(addr, mask) \ + (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) + +#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED) +#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED) + +#define cmm_smp_mb__before_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_add() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_add() 
__atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add() +#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add() +#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add() + +#define cmm_smp_mb() cmm_mb() #endif /* _URCU_UATOMIC_H */ diff --git a/include/urcu/uatomic/aarch64.h b/include/urcu/uatomic/aarch64.h deleted file mode 100644 index 58698ce..0000000 --- a/include/urcu/uatomic/aarch64.h +++ /dev/null @@ -1,41 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_AARCH64_H -#define _URCU_ARCH_UATOMIC_AARCH64_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009-2015 Mathieu Desnoyers - * Copyright (c) 2010 Paul E. McKenney, IBM Corporation - * (Adapted from uatomic_arch_ppc.h) - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. 
- */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_AARCH64_H */ diff --git a/include/urcu/uatomic/alpha.h b/include/urcu/uatomic/alpha.h deleted file mode 100644 index 5dceb90..0000000 --- a/include/urcu/uatomic/alpha.h +++ /dev/null @@ -1,32 +0,0 @@ -#ifndef _URCU_UATOMIC_ARCH_ALPHA_H -#define _URCU_UATOMIC_ARCH_ALPHA_H - -/* - * Atomic exchange operations for the Alpha architecture. Let GCC do it. - * - * Copyright (c) 2010 Paolo Bonzini <pbonzini@redhat.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. 
- */ - -#include <urcu/compiler.h> -#include <urcu/system.h> -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_UATOMIC_ARCH_ALPHA_H */ diff --git a/include/urcu/uatomic/arm.h b/include/urcu/uatomic/arm.h deleted file mode 100644 index 95f32f3..0000000 --- a/include/urcu/uatomic/arm.h +++ /dev/null @@ -1,57 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_ARM_H -#define _URCU_ARCH_UATOMIC_ARM_H - -/* - * Atomics for ARM. This approach is usable on kernels back to 2.6.15. - * - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009 Mathieu Desnoyers - * Copyright (c) 2010 Paul E. McKenney, IBM Corporation - * (Adapted from uatomic_arch_ppc.h) - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> -#include <urcu/arch.h> - -#ifdef __cplusplus -extern "C" { -#endif - -/* xchg */ - -/* - * Based on [1], __sync_lock_test_and_set() is not a full barrier, but - * instead only an acquire barrier. Given that uatomic_xchg() acts as - * both release and acquire barriers, we therefore need to have our own - * release barrier before this operation. 
- * - * [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html - */ -#define uatomic_xchg(addr, v) \ - ({ \ - cmm_smp_mb(); \ - __sync_lock_test_and_set(addr, v); \ - }) - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_ARM_H */ diff --git a/include/urcu/uatomic/gcc.h b/include/urcu/uatomic/gcc.h deleted file mode 100644 index 438e039..0000000 --- a/include/urcu/uatomic/gcc.h +++ /dev/null @@ -1,46 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_GCC_H -#define _URCU_ARCH_UATOMIC_GCC_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009 Mathieu Desnoyers - * Copyright (c) 2010 Paul E. McKenney, IBM Corporation - * (Adapted from uatomic_arch_ppc.h) - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -/* - * If your platform doesn't have a full set of atomics, you will need - * a separate uatomic_arch_*.h file for your architecture. Otherwise, - * just rely on the definitions in uatomic/generic.h. 
- */ -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_GCC_H */ diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h deleted file mode 100644 index c3762b0..0000000 --- a/include/urcu/uatomic/generic.h +++ /dev/null @@ -1,613 +0,0 @@ -#ifndef _URCU_UATOMIC_GENERIC_H -#define _URCU_UATOMIC_GENERIC_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009 Mathieu Desnoyers - * Copyright (c) 2010 Paolo Bonzini - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <stdint.h> -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#ifndef uatomic_set -#define uatomic_set(addr, v) ((void) CMM_STORE_SHARED(*(addr), (v))) -#endif - -#ifndef uatomic_read -#define uatomic_read(addr) CMM_LOAD_SHARED(*(addr)) -#endif - -#if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR -static inline __attribute__((always_inline, __noreturn__)) -void _uatomic_link_error(void) -{ -#ifdef ILLEGAL_INSTR - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__(ILLEGAL_INSTR); -#else - __builtin_trap(); -#endif -} - -#else /* #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR */ -extern void _uatomic_link_error(void); -#endif /* #else #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR */ - -/* cmpxchg */ - -#ifndef uatomic_cmpxchg -static inline __attribute__((always_inline)) -unsigned long _uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - return __sync_val_compare_and_swap_1((uint8_t *) addr, old, - _new); -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - return __sync_val_compare_and_swap_2((uint16_t *) addr, old, - _new); -#endif - case 4: - return __sync_val_compare_and_swap_4((uint32_t *) addr, old, - _new); -#if (CAA_BITS_PER_LONG == 64) - case 8: - return __sync_val_compare_and_swap_8((uint64_t *) addr, old, - _new); -#endif - } - _uatomic_link_error(); - return 0; -} - - -#define uatomic_cmpxchg(addr, old, _new) \ - ((__typeof__(*(addr))) _uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new),\ - sizeof(*(addr)))) - - -/* uatomic_and */ - -#ifndef uatomic_and -static inline __attribute__((always_inline)) -void _uatomic_and(void *addr, unsigned long val, - int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - __sync_and_and_fetch_1((uint8_t *) addr, val); - return; -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - __sync_and_and_fetch_2((uint16_t *) addr, val); - return; -#endif - case 4: - __sync_and_and_fetch_4((uint32_t *) addr, val); - return; -#if (CAA_BITS_PER_LONG == 64) - case 8: - __sync_and_and_fetch_8((uint64_t *) addr, val); - return; -#endif - } - _uatomic_link_error(); -} - -#define uatomic_and(addr, v) \ - (_uatomic_and((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#define cmm_smp_mb__before_uatomic_and() cmm_barrier() -#define cmm_smp_mb__after_uatomic_and() cmm_barrier() - -#endif - 
-/* uatomic_or */ - -#ifndef uatomic_or -static inline __attribute__((always_inline)) -void _uatomic_or(void *addr, unsigned long val, - int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - __sync_or_and_fetch_1((uint8_t *) addr, val); - return; -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - __sync_or_and_fetch_2((uint16_t *) addr, val); - return; -#endif - case 4: - __sync_or_and_fetch_4((uint32_t *) addr, val); - return; -#if (CAA_BITS_PER_LONG == 64) - case 8: - __sync_or_and_fetch_8((uint64_t *) addr, val); - return; -#endif - } - _uatomic_link_error(); - return; -} - -#define uatomic_or(addr, v) \ - (_uatomic_or((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#define cmm_smp_mb__before_uatomic_or() cmm_barrier() -#define cmm_smp_mb__after_uatomic_or() cmm_barrier() - -#endif - - -/* uatomic_add_return */ - -#ifndef uatomic_add_return -static inline __attribute__((always_inline)) -unsigned long _uatomic_add_return(void *addr, unsigned long val, - int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - return __sync_add_and_fetch_1((uint8_t *) addr, val); -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - return __sync_add_and_fetch_2((uint16_t *) addr, val); -#endif - case 4: - return __sync_add_and_fetch_4((uint32_t *) addr, val); -#if (CAA_BITS_PER_LONG == 64) - case 8: - return __sync_add_and_fetch_8((uint64_t *) addr, val); -#endif - } - _uatomic_link_error(); - return 0; -} - - -#define uatomic_add_return(addr, v) \ - ((__typeof__(*(addr))) _uatomic_add_return((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#endif /* #ifndef uatomic_add_return */ - -#ifndef uatomic_xchg -/* xchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_exchange(void *addr, unsigned long val, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - { - uint8_t old; - - do { - old = uatomic_read((uint8_t *) addr); - } while (!__sync_bool_compare_and_swap_1((uint8_t *) 
addr, - old, val)); - - return old; - } -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - { - uint16_t old; - - do { - old = uatomic_read((uint16_t *) addr); - } while (!__sync_bool_compare_and_swap_2((uint16_t *) addr, - old, val)); - - return old; - } -#endif - case 4: - { - uint32_t old; - - do { - old = uatomic_read((uint32_t *) addr); - } while (!__sync_bool_compare_and_swap_4((uint32_t *) addr, - old, val)); - - return old; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - uint64_t old; - - do { - old = uatomic_read((uint64_t *) addr); - } while (!__sync_bool_compare_and_swap_8((uint64_t *) addr, - old, val)); - - return old; - } -#endif - } - _uatomic_link_error(); - return 0; -} - -#define uatomic_xchg(addr, v) \ - ((__typeof__(*(addr))) _uatomic_exchange((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#endif /* #ifndef uatomic_xchg */ - -#else /* #ifndef uatomic_cmpxchg */ - -#ifndef uatomic_and -/* uatomic_and */ - -static inline __attribute__((always_inline)) -void _uatomic_and(void *addr, unsigned long val, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - { - uint8_t old, oldt; - - oldt = uatomic_read((uint8_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old & val, 1); - } while (oldt != old); - - return; - } -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - { - uint16_t old, oldt; - - oldt = uatomic_read((uint16_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old & val, 2); - } while (oldt != old); - } -#endif - case 4: - { - uint32_t old, oldt; - - oldt = uatomic_read((uint32_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old & val, 4); - } while (oldt != old); - - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - uint64_t old, oldt; - - oldt = uatomic_read((uint64_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old & val, 8); - } while (oldt != old); - - return; - } -#endif - } - _uatomic_link_error(); -} - 
-#define uatomic_and(addr, v) \ - (_uatomic_and((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#define cmm_smp_mb__before_uatomic_and() cmm_barrier() -#define cmm_smp_mb__after_uatomic_and() cmm_barrier() - -#endif /* #ifndef uatomic_and */ - -#ifndef uatomic_or -/* uatomic_or */ - -static inline __attribute__((always_inline)) -void _uatomic_or(void *addr, unsigned long val, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - { - uint8_t old, oldt; - - oldt = uatomic_read((uint8_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old | val, 1); - } while (oldt != old); - - return; - } -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - { - uint16_t old, oldt; - - oldt = uatomic_read((uint16_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old | val, 2); - } while (oldt != old); - - return; - } -#endif - case 4: - { - uint32_t old, oldt; - - oldt = uatomic_read((uint32_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old | val, 4); - } while (oldt != old); - - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - uint64_t old, oldt; - - oldt = uatomic_read((uint64_t *) addr); - do { - old = oldt; - oldt = _uatomic_cmpxchg(addr, old, old | val, 8); - } while (oldt != old); - - return; - } -#endif - } - _uatomic_link_error(); -} - -#define uatomic_or(addr, v) \ - (_uatomic_or((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#define cmm_smp_mb__before_uatomic_or() cmm_barrier() -#define cmm_smp_mb__after_uatomic_or() cmm_barrier() - -#endif /* #ifndef uatomic_or */ - -#ifndef uatomic_add_return -/* uatomic_add_return */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_add_return(void *addr, unsigned long val, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - { - uint8_t old, oldt; - - oldt = uatomic_read((uint8_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint8_t *) addr, - old, old + val); - } 
while (oldt != old); - - return old + val; - } -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - { - uint16_t old, oldt; - - oldt = uatomic_read((uint16_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint16_t *) addr, - old, old + val); - } while (oldt != old); - - return old + val; - } -#endif - case 4: - { - uint32_t old, oldt; - - oldt = uatomic_read((uint32_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint32_t *) addr, - old, old + val); - } while (oldt != old); - - return old + val; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - uint64_t old, oldt; - - oldt = uatomic_read((uint64_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint64_t *) addr, - old, old + val); - } while (oldt != old); - - return old + val; - } -#endif - } - _uatomic_link_error(); - return 0; -} - -#define uatomic_add_return(addr, v) \ - ((__typeof__(*(addr))) _uatomic_add_return((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#endif /* #ifndef uatomic_add_return */ - -#ifndef uatomic_xchg -/* xchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_exchange(void *addr, unsigned long val, int len) -{ - switch (len) { -#ifdef UATOMIC_HAS_ATOMIC_BYTE - case 1: - { - uint8_t old, oldt; - - oldt = uatomic_read((uint8_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint8_t *) addr, - old, val); - } while (oldt != old); - - return old; - } -#endif -#ifdef UATOMIC_HAS_ATOMIC_SHORT - case 2: - { - uint16_t old, oldt; - - oldt = uatomic_read((uint16_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint16_t *) addr, - old, val); - } while (oldt != old); - - return old; - } -#endif - case 4: - { - uint32_t old, oldt; - - oldt = uatomic_read((uint32_t *) addr); - do { - old = oldt; - oldt = uatomic_cmpxchg((uint32_t *) addr, - old, val); - } while (oldt != old); - - return old; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - uint64_t old, oldt; - - oldt = uatomic_read((uint64_t *) addr); - do { - 
old = oldt; - oldt = uatomic_cmpxchg((uint64_t *) addr, - old, val); - } while (oldt != old); - - return old; - } -#endif - } - _uatomic_link_error(); - return 0; -} - -#define uatomic_xchg(addr, v) \ - ((__typeof__(*(addr))) _uatomic_exchange((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -#endif /* #ifndef uatomic_xchg */ - -#endif /* #else #ifndef uatomic_cmpxchg */ - -/* uatomic_sub_return, uatomic_add, uatomic_sub, uatomic_inc, uatomic_dec */ - -#ifndef uatomic_add -#define uatomic_add(addr, v) (void)uatomic_add_return((addr), (v)) -#define cmm_smp_mb__before_uatomic_add() cmm_barrier() -#define cmm_smp_mb__after_uatomic_add() cmm_barrier() -#endif - -#define uatomic_sub_return(addr, v) \ - uatomic_add_return((addr), -(caa_cast_long_keep_sign(v))) -#define uatomic_sub(addr, v) \ - uatomic_add((addr), -(caa_cast_long_keep_sign(v))) -#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add() -#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add() - -#ifndef uatomic_inc -#define uatomic_inc(addr) uatomic_add((addr), 1) -#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add() -#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add() -#endif - -#ifndef uatomic_dec -#define uatomic_dec(addr) uatomic_add((addr), -1) -#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add() -#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add() -#endif - -#ifdef __cplusplus -} -#endif - -#endif /* _URCU_UATOMIC_GENERIC_H */ diff --git a/include/urcu/uatomic/hppa.h b/include/urcu/uatomic/hppa.h deleted file mode 100644 index 2102153..0000000 --- a/include/urcu/uatomic/hppa.h +++ /dev/null @@ -1,10 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_HPPA_H -#define _URCU_ARCH_UATOMIC_HPPA_H - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#define UATOMIC_HAS_ATOMIC_SHORT -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_HPPA_H */ diff --git 
a/include/urcu/uatomic/ia64.h b/include/urcu/uatomic/ia64.h deleted file mode 100644 index b5db8cc..0000000 --- a/include/urcu/uatomic/ia64.h +++ /dev/null @@ -1,41 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_IA64_H -#define _URCU_ARCH_UATOMIC_IA64_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009-2015 Mathieu Desnoyers - * Copyright (c) 2010 Paul E. McKenney, IBM Corporation - * (Adapted from uatomic_arch_ppc.h) - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_IA64_H */ diff --git a/include/urcu/uatomic/m68k.h b/include/urcu/uatomic/m68k.h deleted file mode 100644 index 60b01c7..0000000 --- a/include/urcu/uatomic/m68k.h +++ /dev/null @@ -1,44 +0,0 @@ -/* - * Atomic exchange operations for the m68k architecture. Let GCC do it. 
- * - * Copyright (c) 2017 Michael Jeanson <mjeanson@efficios.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. - */ - -#ifndef _URCU_ARCH_UATOMIC_M68K_H -#define _URCU_ARCH_UATOMIC_M68K_H - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_M68K_H */ diff --git a/include/urcu/uatomic/mips.h b/include/urcu/uatomic/mips.h deleted file mode 100644 index bd7ca7f..0000000 --- a/include/urcu/uatomic/mips.h +++ /dev/null @@ -1,32 +0,0 @@ -#ifndef _URCU_UATOMIC_ARCH_MIPS_H -#define _URCU_UATOMIC_ARCH_MIPS_H - -/* - * Atomic exchange operations for the MIPS architecture. Let GCC do it. 
- * - * Copyright (c) 2010 Paolo Bonzini <pbonzini@redhat.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_UATOMIC_ARCH_MIPS_H */ diff --git a/include/urcu/uatomic/nios2.h b/include/urcu/uatomic/nios2.h deleted file mode 100644 index 5b3c303..0000000 --- a/include/urcu/uatomic/nios2.h +++ /dev/null @@ -1,32 +0,0 @@ -#ifndef _URCU_UATOMIC_ARCH_NIOS2_H -#define _URCU_UATOMIC_ARCH_NIOS2_H - -/* - * Atomic exchange operations for the NIOS2 architecture. Let GCC do it. 
- * - * Copyright (c) 2016 Marek Vasut <marex@denx.de> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_UATOMIC_ARCH_NIOS2_H */ diff --git a/include/urcu/uatomic/ppc.h b/include/urcu/uatomic/ppc.h deleted file mode 100644 index 0e672f5..0000000 --- a/include/urcu/uatomic/ppc.h +++ /dev/null @@ -1,237 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_PPC_H -#define _URCU_ARCH_UATOMIC_PPC_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009 Mathieu Desnoyers - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. 
- * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define ILLEGAL_INSTR ".long 0xd00d00" - -/* - * Providing sequential consistency semantic with respect to other - * instructions for cmpxchg and add_return family of atomic primitives. - * - * This is achieved with: - * lwsync (prior loads can be reordered after following load) - * lwarx - * stwcx. - * test if success (retry) - * sync - * - * Explanation of the sequential consistency provided by this scheme - * from Paul E. McKenney: - * - * The reason we can get away with the lwsync before is that if a prior - * store reorders with the lwarx, then you have to store to the atomic - * variable from some other CPU to detect it. - * - * And if you do that, the lwarx will lose its reservation, so the stwcx - * will fail. The atomic operation will retry, so that the caller won't be - * able to see the misordering. - */ - -/* xchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_exchange(void *addr, unsigned long val, int len) -{ - switch (len) { - case 4: - { - unsigned int result; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "lwarx %0,0,%1\n" /* load and reserve */ - "stwcx. 
%2,0,%1\n" /* else store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - : "=&r"(result) - : "r"(addr), "r"(val) - : "memory", "cc"); - - return result; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long result; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "ldarx %0,0,%1\n" /* load and reserve */ - "stdcx. %2,0,%1\n" /* else store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - : "=&r"(result) - : "r"(addr), "r"(val) - : "memory", "cc"); - - return result; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. - */ - __asm__ __volatile__(ILLEGAL_INSTR); - return 0; -} - -#define uatomic_xchg(addr, v) \ - ((__typeof__(*(addr))) _uatomic_exchange((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) -/* cmpxchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len) -{ - switch (len) { - case 4: - { - unsigned int old_val; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "lwarx %0,0,%1\n" /* load and reserve */ - "cmpw %0,%3\n" /* if load is not equal to */ - "bne 2f\n" /* old, fail */ - "stwcx. %2,0,%1\n" /* else store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - "2:\n" - : "=&r"(old_val) - : "r"(addr), "r"((unsigned int)_new), - "r"((unsigned int)old) - : "memory", "cc"); - - return old_val; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long old_val; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "ldarx %0,0,%1\n" /* load and reserve */ - "cmpd %0,%3\n" /* if load is not equal to */ - "bne 2f\n" /* old, fail */ - "stdcx. 
%2,0,%1\n" /* else store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - "2:\n" - : "=&r"(old_val) - : "r"(addr), "r"((unsigned long)_new), - "r"((unsigned long)old) - : "memory", "cc"); - - return old_val; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. - */ - __asm__ __volatile__(ILLEGAL_INSTR); - return 0; -} - - -#define uatomic_cmpxchg(addr, old, _new) \ - ((__typeof__(*(addr))) _uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new),\ - sizeof(*(addr)))) - -/* uatomic_add_return */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_add_return(void *addr, unsigned long val, - int len) -{ - switch (len) { - case 4: - { - unsigned int result; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "lwarx %0,0,%1\n" /* load and reserve */ - "add %0,%2,%0\n" /* add val to value loaded */ - "stwcx. %0,0,%1\n" /* store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - : "=&r"(result) - : "r"(addr), "r"(val) - : "memory", "cc"); - - return result; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long result; - - __asm__ __volatile__( - LWSYNC_OPCODE - "1:\t" "ldarx %0,0,%1\n" /* load and reserve */ - "add %0,%2,%0\n" /* add val to value loaded */ - "stdcx. %0,0,%1\n" /* store conditional */ - "bne- 1b\n" /* retry if lost reservation */ - "sync\n" - : "=&r"(result) - : "r"(addr), "r"(val) - : "memory", "cc"); - - return result; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__(ILLEGAL_INSTR); - return 0; -} - - -#define uatomic_add_return(addr, v) \ - ((__typeof__(*(addr))) _uatomic_add_return((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_PPC_H */ diff --git a/include/urcu/uatomic/riscv.h b/include/urcu/uatomic/riscv.h deleted file mode 100644 index a6700e1..0000000 --- a/include/urcu/uatomic/riscv.h +++ /dev/null @@ -1,44 +0,0 @@ -/* - * Atomic exchange operations for the RISC-V architecture. Let GCC do it. - * - * Copyright (c) 2018 Michael Jeanson <mjeanson@efficios.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. 
- */ - -#ifndef _URCU_ARCH_UATOMIC_RISCV_H -#define _URCU_ARCH_UATOMIC_RISCV_H - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_RISCV_H */ diff --git a/include/urcu/uatomic/s390.h b/include/urcu/uatomic/s390.h deleted file mode 100644 index 42f23e7..0000000 --- a/include/urcu/uatomic/s390.h +++ /dev/null @@ -1,170 +0,0 @@ -#ifndef _URCU_UATOMIC_ARCH_S390_H -#define _URCU_UATOMIC_ARCH_S390_H - -/* - * Atomic exchange operations for the S390 architecture. Based on information - * taken from the Principles of Operation Appendix A "Conditional Swapping - * Instructions (CS, CDS)". - * - * Copyright (c) 2009 Novell, Inc. - * Author: Jan Blunck <jblunck@suse.de> - * Copyright (c) 2009 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 2) -#define COMPILER_HAVE_SHORT_MEM_OPERAND -#endif - -/* - * MEMOP assembler operand rules: - * - op refer to MEMOP_IN operand - * - MEMOP_IN can expand to more than a single operand. Use it at the end of - * operand list only. - */ - -#ifdef COMPILER_HAVE_SHORT_MEM_OPERAND - -#define MEMOP_OUT(addr) "=Q" (*(addr)) -#define MEMOP_IN(addr) "Q" (*(addr)) -#define MEMOP_REF(op) #op /* op refer to MEMOP_IN operand */ - -#else /* !COMPILER_HAVE_SHORT_MEM_OPERAND */ - -#define MEMOP_OUT(addr) "=m" (*(addr)) -#define MEMOP_IN(addr) "a" (addr), "m" (*(addr)) -#define MEMOP_REF(op) "0(" #op ")" /* op refer to MEMOP_IN operand */ - -#endif /* !COMPILER_HAVE_SHORT_MEM_OPERAND */ - -/* - * The __hp() macro casts the void pointer @x to a pointer to a structure - * containing an array of char of the specified size. This allows passing the - * @addr arguments of the following inline functions as "m" and "+m" operands - * to the assembly. The @size parameter should be a constant to support - * compilers such as clang which do not support VLA. Create typedefs because - * C++ does not allow types be defined in casts. 
- */ - -typedef struct { char v[4]; } __hp_4; -typedef struct { char v[8]; } __hp_8; - -#define __hp(size, x) ((__hp_##size *)(x)) - -/* xchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_exchange(volatile void *addr, unsigned long val, int len) -{ - switch (len) { - case 4: - { - unsigned int old_val; - - __asm__ __volatile__( - "0: cs %0,%2," MEMOP_REF(%3) "\n" - " brc 4,0b\n" - : "=&r" (old_val), MEMOP_OUT (__hp(4, addr)) - : "r" (val), MEMOP_IN (__hp(4, addr)) - : "memory", "cc"); - return old_val; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long old_val; - - __asm__ __volatile__( - "0: csg %0,%2," MEMOP_REF(%3) "\n" - " brc 4,0b\n" - : "=&r" (old_val), MEMOP_OUT (__hp(8, addr)) - : "r" (val), MEMOP_IN (__hp(8, addr)) - : "memory", "cc"); - return old_val; - } -#endif - default: - __asm__ __volatile__(".long 0xd00d00"); - } - - return 0; -} - -#define uatomic_xchg(addr, v) \ - (__typeof__(*(addr))) _uatomic_exchange((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr))) - -/* cmpxchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len) -{ - switch (len) { - case 4: - { - unsigned int old_val = (unsigned int)old; - - __asm__ __volatile__( - " cs %0,%2," MEMOP_REF(%3) "\n" - : "+r" (old_val), MEMOP_OUT (__hp(4, addr)) - : "r" (_new), MEMOP_IN (__hp(4, addr)) - : "memory", "cc"); - return old_val; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - " csg %0,%2," MEMOP_REF(%3) "\n" - : "+r" (old), MEMOP_OUT (__hp(8, addr)) - : "r" (_new), MEMOP_IN (__hp(8, addr)) - : "memory", "cc"); - return old; - } -#endif - default: - __asm__ __volatile__(".long 0xd00d00"); - } - - return 0; -} - -#define uatomic_cmpxchg(addr, old, _new) \ - (__typeof__(*(addr))) _uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new),\ - sizeof(*(addr))) - -#ifdef __cplusplus -} -#endif - 
-#include <urcu/uatomic/generic.h> - -#endif /* _URCU_UATOMIC_ARCH_S390_H */ diff --git a/include/urcu/uatomic/sparc64.h b/include/urcu/uatomic/sparc64.h deleted file mode 100644 index a9f2795..0000000 --- a/include/urcu/uatomic/sparc64.h +++ /dev/null @@ -1,81 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_SPARC64_H -#define _URCU_ARCH_UATOMIC_SPARC64_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2003 by Hewlett-Packard Company. All rights reserved. - * Copyright (c) 2009 Mathieu Desnoyers - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. 
- */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -/* cmpxchg */ - -static inline __attribute__((always_inline)) -unsigned long _uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len) -{ - switch (len) { - case 4: - { - __asm__ __volatile__ ( - "membar #StoreLoad | #LoadLoad\n\t" - "cas [%1],%2,%0\n\t" - "membar #StoreLoad | #StoreStore\n\t" - : "+&r" (_new) - : "r" (addr), "r" (old) - : "memory"); - - return _new; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__ ( - "membar #StoreLoad | #LoadLoad\n\t" - "casx [%1],%2,%0\n\t" - "membar #StoreLoad | #StoreStore\n\t" - : "+&r" (_new) - : "r" (addr), "r" (old) - : "memory"); - - return _new; - } -#endif - } - __builtin_trap(); - return 0; -} - - -#define uatomic_cmpxchg(addr, old, _new) \ - ((__typeof__(*(addr))) _uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new), \ - sizeof(*(addr)))) - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_PPC_H */ diff --git a/include/urcu/uatomic/tile.h b/include/urcu/uatomic/tile.h deleted file mode 100644 index 830f260..0000000 --- a/include/urcu/uatomic/tile.h +++ /dev/null @@ -1,41 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_TILE_H -#define _URCU_ARCH_UATOMIC_TILE_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009-2015 Mathieu Desnoyers - * Copyright (c) 2010 Paul E. McKenney, IBM Corporation - * (Adapted from uatomic_arch_ppc.h) - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. 
- * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. - */ - -#include <urcu/compiler.h> -#include <urcu/system.h> - -#ifdef __cplusplus -extern "C" { -#endif - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_TILE_H */ diff --git a/include/urcu/uatomic/x86.h b/include/urcu/uatomic/x86.h deleted file mode 100644 index d416963..0000000 --- a/include/urcu/uatomic/x86.h +++ /dev/null @@ -1,646 +0,0 @@ -#ifndef _URCU_ARCH_UATOMIC_X86_H -#define _URCU_ARCH_UATOMIC_X86_H - -/* - * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved. - * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved. - * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P. - * Copyright (c) 2009 Mathieu Desnoyers - * - * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED - * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. - * - * Permission is hereby granted to use or copy this program - * for any purpose, provided the above notices are retained on all copies. - * Permission to modify the code and to distribute modified code is granted, - * provided the above notices are retained, and a notice that the code was - * modified is included with the above copyright notice. - * - * Code inspired from libuatomic_ops-1.2, inherited in part from the - * Boehm-Demers-Weiser conservative garbage collector. 
- */ - -#include <urcu/arch.h> -#include <urcu/config.h> -#include <urcu/compiler.h> -#include <urcu/system.h> - -#define UATOMIC_HAS_ATOMIC_BYTE -#define UATOMIC_HAS_ATOMIC_SHORT - -#ifdef __cplusplus -extern "C" { -#endif - -/* - * Derived from AO_compare_and_swap() and AO_test_and_set_full(). - */ - -/* - * The __hp() macro casts the void pointer @x to a pointer to a structure - * containing an array of char of the specified size. This allows passing the - * @addr arguments of the following inline functions as "m" and "+m" operands - * to the assembly. The @size parameter should be a constant to support - * compilers such as clang which do not support VLA. Create typedefs because - * C++ does not allow types be defined in casts. - */ - -typedef struct { char v[1]; } __hp_1; -typedef struct { char v[2]; } __hp_2; -typedef struct { char v[4]; } __hp_4; -typedef struct { char v[8]; } __hp_8; - -#define __hp(size, x) ((__hp_##size *)(x)) - -#define _uatomic_set(addr, v) ((void) CMM_STORE_SHARED(*(addr), (v))) - -/* cmpxchg */ - -static inline __attribute__((always_inline)) -unsigned long __uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len) -{ - switch (len) { - case 1: - { - unsigned char result = old; - - __asm__ __volatile__( - "lock; cmpxchgb %2, %1" - : "+a"(result), "+m"(*__hp(1, addr)) - : "q"((unsigned char)_new) - : "memory"); - return result; - } - case 2: - { - unsigned short result = old; - - __asm__ __volatile__( - "lock; cmpxchgw %2, %1" - : "+a"(result), "+m"(*__hp(2, addr)) - : "r"((unsigned short)_new) - : "memory"); - return result; - } - case 4: - { - unsigned int result = old; - - __asm__ __volatile__( - "lock; cmpxchgl %2, %1" - : "+a"(result), "+m"(*__hp(4, addr)) - : "r"((unsigned int)_new) - : "memory"); - return result; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long result = old; - - __asm__ __volatile__( - "lock; cmpxchgq %2, %1" - : "+a"(result), "+m"(*__hp(8, addr)) - : "r"((unsigned long)_new) 
- : "memory"); - return result; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. - */ - __asm__ __volatile__("ud2"); - return 0; -} - -#define _uatomic_cmpxchg(addr, old, _new) \ - ((__typeof__(*(addr))) __uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new),\ - sizeof(*(addr)))) - -/* xchg */ - -static inline __attribute__((always_inline)) -unsigned long __uatomic_exchange(void *addr, unsigned long val, int len) -{ - /* Note: the "xchg" instruction does not need a "lock" prefix. */ - switch (len) { - case 1: - { - unsigned char result; - __asm__ __volatile__( - "xchgb %0, %1" - : "=q"(result), "+m"(*__hp(1, addr)) - : "0" ((unsigned char)val) - : "memory"); - return result; - } - case 2: - { - unsigned short result; - __asm__ __volatile__( - "xchgw %0, %1" - : "=r"(result), "+m"(*__hp(2, addr)) - : "0" ((unsigned short)val) - : "memory"); - return result; - } - case 4: - { - unsigned int result; - __asm__ __volatile__( - "xchgl %0, %1" - : "=r"(result), "+m"(*__hp(4, addr)) - : "0" ((unsigned int)val) - : "memory"); - return result; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long result; - __asm__ __volatile__( - "xchgq %0, %1" - : "=r"(result), "+m"(*__hp(8, addr)) - : "0" ((unsigned long)val) - : "memory"); - return result; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__("ud2"); - return 0; -} - -#define _uatomic_xchg(addr, v) \ - ((__typeof__(*(addr))) __uatomic_exchange((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -/* uatomic_add_return */ - -static inline __attribute__((always_inline)) -unsigned long __uatomic_add_return(void *addr, unsigned long val, - int len) -{ - switch (len) { - case 1: - { - unsigned char result = val; - - __asm__ __volatile__( - "lock; xaddb %1, %0" - : "+m"(*__hp(1, addr)), "+q" (result) - : - : "memory"); - return result + (unsigned char)val; - } - case 2: - { - unsigned short result = val; - - __asm__ __volatile__( - "lock; xaddw %1, %0" - : "+m"(*__hp(2, addr)), "+r" (result) - : - : "memory"); - return result + (unsigned short)val; - } - case 4: - { - unsigned int result = val; - - __asm__ __volatile__( - "lock; xaddl %1, %0" - : "+m"(*__hp(4, addr)), "+r" (result) - : - : "memory"); - return result + (unsigned int)val; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - unsigned long result = val; - - __asm__ __volatile__( - "lock; xaddq %1, %0" - : "+m"(*__hp(8, addr)), "+r" (result) - : - : "memory"); - return result + (unsigned long)val; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__("ud2"); - return 0; -} - -#define _uatomic_add_return(addr, v) \ - ((__typeof__(*(addr))) __uatomic_add_return((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -/* uatomic_and */ - -static inline __attribute__((always_inline)) -void __uatomic_and(void *addr, unsigned long val, int len) -{ - switch (len) { - case 1: - { - __asm__ __volatile__( - "lock; andb %1, %0" - : "=m"(*__hp(1, addr)) - : "iq" ((unsigned char)val) - : "memory"); - return; - } - case 2: - { - __asm__ __volatile__( - "lock; andw %1, %0" - : "=m"(*__hp(2, addr)) - : "ir" ((unsigned short)val) - : "memory"); - return; - } - case 4: - { - __asm__ __volatile__( - "lock; andl %1, %0" - : "=m"(*__hp(4, addr)) - : "ir" ((unsigned int)val) - : "memory"); - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - "lock; andq %1, %0" - : "=m"(*__hp(8, addr)) - : "er" ((unsigned long)val) - : "memory"); - return; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__("ud2"); - return; -} - -#define _uatomic_and(addr, v) \ - (__uatomic_and((addr), caa_cast_long_keep_sign(v), sizeof(*(addr)))) - -/* uatomic_or */ - -static inline __attribute__((always_inline)) -void __uatomic_or(void *addr, unsigned long val, int len) -{ - switch (len) { - case 1: - { - __asm__ __volatile__( - "lock; orb %1, %0" - : "=m"(*__hp(1, addr)) - : "iq" ((unsigned char)val) - : "memory"); - return; - } - case 2: - { - __asm__ __volatile__( - "lock; orw %1, %0" - : "=m"(*__hp(2, addr)) - : "ir" ((unsigned short)val) - : "memory"); - return; - } - case 4: - { - __asm__ __volatile__( - "lock; orl %1, %0" - : "=m"(*__hp(4, addr)) - : "ir" ((unsigned int)val) - : "memory"); - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - "lock; orq %1, %0" - : "=m"(*__hp(8, addr)) - : "er" ((unsigned long)val) - : "memory"); - return; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. - */ - __asm__ __volatile__("ud2"); - return; -} - -#define _uatomic_or(addr, v) \ - (__uatomic_or((addr), caa_cast_long_keep_sign(v), sizeof(*(addr)))) - -/* uatomic_add */ - -static inline __attribute__((always_inline)) -void __uatomic_add(void *addr, unsigned long val, int len) -{ - switch (len) { - case 1: - { - __asm__ __volatile__( - "lock; addb %1, %0" - : "=m"(*__hp(1, addr)) - : "iq" ((unsigned char)val) - : "memory"); - return; - } - case 2: - { - __asm__ __volatile__( - "lock; addw %1, %0" - : "=m"(*__hp(2, addr)) - : "ir" ((unsigned short)val) - : "memory"); - return; - } - case 4: - { - __asm__ __volatile__( - "lock; addl %1, %0" - : "=m"(*__hp(4, addr)) - : "ir" ((unsigned int)val) - : "memory"); - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - "lock; addq %1, %0" - : "=m"(*__hp(8, addr)) - : "er" ((unsigned long)val) - : "memory"); - return; - } -#endif - } - /* - * generate an illegal instruction. 
Cannot catch this with - * linker tricks when optimizations are disabled. - */ - __asm__ __volatile__("ud2"); - return; -} - -#define _uatomic_add(addr, v) \ - (__uatomic_add((addr), caa_cast_long_keep_sign(v), sizeof(*(addr)))) - - -/* uatomic_inc */ - -static inline __attribute__((always_inline)) -void __uatomic_inc(void *addr, int len) -{ - switch (len) { - case 1: - { - __asm__ __volatile__( - "lock; incb %0" - : "=m"(*__hp(1, addr)) - : - : "memory"); - return; - } - case 2: - { - __asm__ __volatile__( - "lock; incw %0" - : "=m"(*__hp(2, addr)) - : - : "memory"); - return; - } - case 4: - { - __asm__ __volatile__( - "lock; incl %0" - : "=m"(*__hp(4, addr)) - : - : "memory"); - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - "lock; incq %0" - : "=m"(*__hp(8, addr)) - : - : "memory"); - return; - } -#endif - } - /* generate an illegal instruction. Cannot catch this with linker tricks - * when optimizations are disabled. */ - __asm__ __volatile__("ud2"); - return; -} - -#define _uatomic_inc(addr) (__uatomic_inc((addr), sizeof(*(addr)))) - -/* uatomic_dec */ - -static inline __attribute__((always_inline)) -void __uatomic_dec(void *addr, int len) -{ - switch (len) { - case 1: - { - __asm__ __volatile__( - "lock; decb %0" - : "=m"(*__hp(1, addr)) - : - : "memory"); - return; - } - case 2: - { - __asm__ __volatile__( - "lock; decw %0" - : "=m"(*__hp(2, addr)) - : - : "memory"); - return; - } - case 4: - { - __asm__ __volatile__( - "lock; decl %0" - : "=m"(*__hp(4, addr)) - : - : "memory"); - return; - } -#if (CAA_BITS_PER_LONG == 64) - case 8: - { - __asm__ __volatile__( - "lock; decq %0" - : "=m"(*__hp(8, addr)) - : - : "memory"); - return; - } -#endif - } - /* - * generate an illegal instruction. Cannot catch this with - * linker tricks when optimizations are disabled. 
- */ - __asm__ __volatile__("ud2"); - return; -} - -#define _uatomic_dec(addr) (__uatomic_dec((addr), sizeof(*(addr)))) - -#ifdef URCU_ARCH_X86_NO_CAS - -/* For backwards compat */ -#define CONFIG_RCU_COMPAT_ARCH 1 - -extern int __rcu_cas_avail; -extern int __rcu_cas_init(void); - -#define UATOMIC_COMPAT(insn) \ - ((caa_likely(__rcu_cas_avail > 0)) \ - ? (_uatomic_##insn) \ - : ((caa_unlikely(__rcu_cas_avail < 0) \ - ? ((__rcu_cas_init() > 0) \ - ? (_uatomic_##insn) \ - : (compat_uatomic_##insn)) \ - : (compat_uatomic_##insn)))) - -/* - * We leave the return value so we don't break the ABI, but remove the - * return value from the API. - */ -extern unsigned long _compat_uatomic_set(void *addr, - unsigned long _new, int len); -#define compat_uatomic_set(addr, _new) \ - ((void) _compat_uatomic_set((addr), \ - caa_cast_long_keep_sign(_new), \ - sizeof(*(addr)))) - - -extern unsigned long _compat_uatomic_xchg(void *addr, - unsigned long _new, int len); -#define compat_uatomic_xchg(addr, _new) \ - ((__typeof__(*(addr))) _compat_uatomic_xchg((addr), \ - caa_cast_long_keep_sign(_new), \ - sizeof(*(addr)))) - -extern unsigned long _compat_uatomic_cmpxchg(void *addr, unsigned long old, - unsigned long _new, int len); -#define compat_uatomic_cmpxchg(addr, old, _new) \ - ((__typeof__(*(addr))) _compat_uatomic_cmpxchg((addr), \ - caa_cast_long_keep_sign(old), \ - caa_cast_long_keep_sign(_new), \ - sizeof(*(addr)))) - -extern void _compat_uatomic_and(void *addr, unsigned long _new, int len); -#define compat_uatomic_and(addr, v) \ - (_compat_uatomic_and((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -extern void _compat_uatomic_or(void *addr, unsigned long _new, int len); -#define compat_uatomic_or(addr, v) \ - (_compat_uatomic_or((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -extern unsigned long _compat_uatomic_add_return(void *addr, - unsigned long _new, int len); -#define compat_uatomic_add_return(addr, v) \ - ((__typeof__(*(addr))) 
_compat_uatomic_add_return((addr), \ - caa_cast_long_keep_sign(v), \ - sizeof(*(addr)))) - -#define compat_uatomic_add(addr, v) \ - ((void)compat_uatomic_add_return((addr), (v))) -#define compat_uatomic_inc(addr) \ - (compat_uatomic_add((addr), 1)) -#define compat_uatomic_dec(addr) \ - (compat_uatomic_add((addr), -1)) - -#else -#define UATOMIC_COMPAT(insn) (_uatomic_##insn) -#endif - -/* Read is atomic even in compat mode */ -#define uatomic_set(addr, v) \ - UATOMIC_COMPAT(set(addr, v)) - -#define uatomic_cmpxchg(addr, old, _new) \ - UATOMIC_COMPAT(cmpxchg(addr, old, _new)) -#define uatomic_xchg(addr, v) \ - UATOMIC_COMPAT(xchg(addr, v)) - -#define uatomic_and(addr, v) \ - UATOMIC_COMPAT(and(addr, v)) -#define cmm_smp_mb__before_uatomic_and() cmm_barrier() -#define cmm_smp_mb__after_uatomic_and() cmm_barrier() - -#define uatomic_or(addr, v) \ - UATOMIC_COMPAT(or(addr, v)) -#define cmm_smp_mb__before_uatomic_or() cmm_barrier() -#define cmm_smp_mb__after_uatomic_or() cmm_barrier() - -#define uatomic_add_return(addr, v) \ - UATOMIC_COMPAT(add_return(addr, v)) - -#define uatomic_add(addr, v) UATOMIC_COMPAT(add(addr, v)) -#define cmm_smp_mb__before_uatomic_add() cmm_barrier() -#define cmm_smp_mb__after_uatomic_add() cmm_barrier() - -#define uatomic_inc(addr) UATOMIC_COMPAT(inc(addr)) -#define cmm_smp_mb__before_uatomic_inc() cmm_barrier() -#define cmm_smp_mb__after_uatomic_inc() cmm_barrier() - -#define uatomic_dec(addr) UATOMIC_COMPAT(dec(addr)) -#define cmm_smp_mb__before_uatomic_dec() cmm_barrier() -#define cmm_smp_mb__after_uatomic_dec() cmm_barrier() - -#ifdef __cplusplus -} -#endif - -#include <urcu/uatomic/generic.h> - -#endif /* _URCU_ARCH_UATOMIC_X86_H */ -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
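For reference, the inline-asm primitives deleted in the patch above map onto the C11-style `__atomic` builtins roughly as follows. This is a sketch with illustrative names, not the liburcu API: each hand-written `lock; cmpxchg`/`xchg`/`lock; xadd` sequence becomes a single builtin with an explicit memory order.

```c
#include <assert.h>

/*
 * Sketch only: rough __atomic equivalents of the removed x86 asm.
 * Names are hypothetical, not part of <urcu/uatomic.h>.
 */
static inline unsigned long
sketch_cmpxchg(unsigned long *addr, unsigned long old, unsigned long newv)
{
	/*
	 * lock; cmpxchg -> __atomic_compare_exchange_n.  On failure the
	 * builtin writes the value it actually found into 'old', so
	 * returning 'old' reproduces the asm version's "return the
	 * previous value" contract whether or not the CAS succeeded.
	 */
	__atomic_compare_exchange_n(addr, &old, newv, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline unsigned long
sketch_xchg(unsigned long *addr, unsigned long val)
{
	/* xchg (implicitly locked on x86) -> __atomic_exchange_n */
	return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

static inline unsigned long
sketch_add_return(unsigned long *addr, unsigned long val)
{
	/* lock; xadd followed by re-adding val -> one __atomic_add_fetch */
	return __atomic_add_fetch(addr, val, __ATOMIC_SEQ_CST);
}
```

The builtins also remove the need for the per-size `switch (len)` dispatch and the `ud2` trap for unsupported sizes: the compiler selects the right-width instruction from the pointed-to type.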
* Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation 2023-03-21 13:30 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev @ 2023-03-21 20:03 ` Mathieu Desnoyers via lttng-dev 0 siblings, 0 replies; 13+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 20:03 UTC (permalink / raw) To: Ondřej Surý, lttng-dev On 2023-03-21 09:30, Ondřej Surý via lttng-dev wrote: > Replace the custom assembly code in include/urcu/uatomic/ with __atomic > builtins provided by C11-compatible compiler. > > Signed-off-by: Ondřej Surý <ondrej@sury.org> > --- > include/Makefile.am | 16 - > include/urcu/uatomic.h | 84 +++-- > include/urcu/uatomic/aarch64.h | 41 --- > include/urcu/uatomic/alpha.h | 32 -- > include/urcu/uatomic/arm.h | 57 --- > include/urcu/uatomic/gcc.h | 46 --- > include/urcu/uatomic/generic.h | 613 ------------------------------- > include/urcu/uatomic/hppa.h | 10 - > include/urcu/uatomic/ia64.h | 41 --- > include/urcu/uatomic/m68k.h | 44 --- > include/urcu/uatomic/mips.h | 32 -- > include/urcu/uatomic/nios2.h | 32 -- > include/urcu/uatomic/ppc.h | 237 ------------ > include/urcu/uatomic/riscv.h | 44 --- > include/urcu/uatomic/s390.h | 170 --------- > include/urcu/uatomic/sparc64.h | 81 ----- > include/urcu/uatomic/tile.h | 41 --- > include/urcu/uatomic/x86.h | 646 --------------------------------- > 18 files changed, 53 insertions(+), 2214 deletions(-) > delete mode 100644 include/urcu/uatomic/aarch64.h > delete mode 100644 include/urcu/uatomic/alpha.h > delete mode 100644 include/urcu/uatomic/arm.h > delete mode 100644 include/urcu/uatomic/gcc.h > delete mode 100644 include/urcu/uatomic/generic.h > delete mode 100644 include/urcu/uatomic/hppa.h > delete mode 100644 include/urcu/uatomic/ia64.h > delete mode 100644 include/urcu/uatomic/m68k.h > delete mode 100644 include/urcu/uatomic/mips.h > delete mode 100644 include/urcu/uatomic/nios2.h > delete mode 
100644 include/urcu/uatomic/ppc.h > delete mode 100644 include/urcu/uatomic/riscv.h > delete mode 100644 include/urcu/uatomic/s390.h > delete mode 100644 include/urcu/uatomic/sparc64.h > delete mode 100644 include/urcu/uatomic/tile.h > delete mode 100644 include/urcu/uatomic/x86.h > > diff --git a/include/Makefile.am b/include/Makefile.am > index ba1fe60..53a28fd 100644 > --- a/include/Makefile.am > +++ b/include/Makefile.am > @@ -59,24 +59,8 @@ nobase_include_HEADERS = \ > urcu/syscall-compat.h \ > urcu/system.h \ > urcu/tls-compat.h \ > - urcu/uatomic/aarch64.h \ > - urcu/uatomic/alpha.h \ > urcu/uatomic_arch.h \ > - urcu/uatomic/arm.h \ > - urcu/uatomic/gcc.h \ > - urcu/uatomic/generic.h \ > urcu/uatomic.h \ > - urcu/uatomic/hppa.h \ > - urcu/uatomic/ia64.h \ > - urcu/uatomic/m68k.h \ > - urcu/uatomic/mips.h \ > - urcu/uatomic/nios2.h \ > - urcu/uatomic/ppc.h \ > - urcu/uatomic/riscv.h \ > - urcu/uatomic/s390.h \ > - urcu/uatomic/sparc64.h \ > - urcu/uatomic/tile.h \ > - urcu/uatomic/x86.h \ > urcu/urcu-bp.h \ > urcu/urcu-futex.h \ > urcu/urcu.h \ > diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h > index 2fb5fd4..0327810 100644 > --- a/include/urcu/uatomic.h > +++ b/include/urcu/uatomic.h > @@ -22,37 +22,59 @@ > #define _URCU_UATOMIC_H > > #include <urcu/arch.h> > +#include <urcu/system.h> > > -#if defined(URCU_ARCH_X86) > -#include <urcu/uatomic/x86.h> > -#elif defined(URCU_ARCH_PPC) > -#include <urcu/uatomic/ppc.h> > -#elif defined(URCU_ARCH_S390) > -#include <urcu/uatomic/s390.h> > -#elif defined(URCU_ARCH_SPARC64) > -#include <urcu/uatomic/sparc64.h> > -#elif defined(URCU_ARCH_ALPHA) > -#include <urcu/uatomic/alpha.h> > -#elif defined(URCU_ARCH_IA64) > -#include <urcu/uatomic/ia64.h> > -#elif defined(URCU_ARCH_ARM) > -#include <urcu/uatomic/arm.h> > -#elif defined(URCU_ARCH_AARCH64) > -#include <urcu/uatomic/aarch64.h> > -#elif defined(URCU_ARCH_MIPS) > -#include <urcu/uatomic/mips.h> > -#elif defined(URCU_ARCH_NIOS2) > -#include 
<urcu/uatomic/nios2.h> > -#elif defined(URCU_ARCH_TILE) > -#include <urcu/uatomic/tile.h> > -#elif defined(URCU_ARCH_HPPA) > -#include <urcu/uatomic/hppa.h> > -#elif defined(URCU_ARCH_M68K) > -#include <urcu/uatomic/m68k.h> > -#elif defined(URCU_ARCH_RISCV) > -#include <urcu/uatomic/riscv.h> > -#else > -#error "Cannot build: unrecognized architecture, see <urcu/arch.h>." > -#endif > +#define UATOMIC_HAS_ATOMIC_BYTE > +#define UATOMIC_HAS_ATOMIC_SHORT > + > +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE) > + > +#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME) > + > +#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_SEQ_CST) > + > +#define uatomic_cmpxchg(addr, old, new) \ > + ({ \ > + __typeof__(*(addr)) __old = old; \ > + __atomic_compare_exchange_n(addr, &__old, new, 0, \ > + __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \ > + __old; \ > + }) > + > +#define uatomic_add_return(addr, v) \ > + __atomic_add_fetch((addr), (v), __ATOMIC_SEQ_CST) The extra parentheses around "addr" and "v" here are not needed due to operator priority of comma ",". Likewise elsewhere in this patch. Also, as mentioned earlier, please special-case the x86 implementation to include the __ATOMIC_SEQ_CST into atomic operations. 
Thanks, Mathieu > + > +#define uatomic_add(addr, v) \ > + (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED) > + > +#define uatomic_sub_return(addr, v) \ > + __atomic_sub_fetch((addr), (v), __ATOMIC_SEQ_CST) > + > +#define uatomic_sub(addr, v) \ > + (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED) > + > +#define uatomic_and(addr, mask) \ > + (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED) > + > +#define uatomic_or(addr, mask) \ > + (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) > + > +#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED) > +#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED) > + > +#define cmm_smp_mb__before_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__after_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__before_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__after_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__before_uatomic_add() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__after_uatomic_add() __atomic_thread_fence(__ATOMIC_SEQ_CST) > +#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add() > +#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add() > +#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add() > +#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add() > +#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add() > +#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add() > + > +#define cmm_smp_mb() cmm_mb() > > #endif /* _URCU_UATOMIC_H */ [...] Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. 
https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 13+ messages in thread
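The `uatomic_cmpxchg` macro discussed in the review above leans on a detail of `__atomic_compare_exchange_n`: the builtin writes the value it actually observed back through its second argument. A minimal standalone sketch of that statement-expression idiom (illustrative names, not the patched header):

```c
#include <assert.h>

/*
 * Sketch: why the macro yields the previously stored value on both the
 * success and failure paths.  GNU statement expressions are assumed,
 * as in the patch itself.
 */
#define sketch_uatomic_cmpxchg(addr, old, newv)				\
	__extension__ ({						\
		__typeof__(*(addr)) __old = (old);			\
		__atomic_compare_exchange_n(addr, &__old, newv, 0,	\
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);		\
		__old;	/* updated to the witnessed value on failure */	\
	})

static int sketch_demo(void)
{
	int x = 10;
	int seen1 = sketch_uatomic_cmpxchg(&x, 10, 20); /* succeeds */
	int seen2 = sketch_uatomic_cmpxchg(&x, 10, 30); /* fails */
	return seen1 == 10 && seen2 == 20 && x == 20;
}
```

On success `__old` is left untouched (it already equaled `*addr`); on failure the builtin overwrites it with the current contents, so the expression's value is the pre-CAS contents either way.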
* [lttng-dev] [PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier() 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev @ 2023-03-21 13:30 ` Ondřej Surý via lttng-dev 2023-03-21 13:30 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev ` (3 subsequent siblings) 6 siblings, 0 replies; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC (permalink / raw) To: lttng-dev Use __atomic_signal_fence(__ATOMIC_SEQ_CST) for cmm_barrier(), so ThreadSanitizer can understand the memory synchronization. Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/urcu/compiler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h index 2f32b38..5763378 100644 --- a/include/urcu/compiler.h +++ b/include/urcu/compiler.h @@ -28,7 +28,7 @@ #define caa_likely(x) __builtin_expect(!!(x), 1) #define caa_unlikely(x) __builtin_expect(!!(x), 0) -#define cmm_barrier() __asm__ __volatile__ ("" : : : "memory") +#define cmm_barrier() __atomic_signal_fence(__ATOMIC_SEQ_CST) /* * Instruct the compiler to perform only a single access to a variable -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
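The distinction patch 3 relies on: `__atomic_signal_fence()` constrains only the compiler, like the `asm ("" ::: "memory")` it replaces, and emits no instruction, whereas `__atomic_thread_fence()` may emit a hardware fence. A compiler-only fence is sufficient for `cmm_barrier()`, whose only job has ever been to forbid compiler reordering. A hedged sketch (hypothetical name):

```c
#include <assert.h>

/*
 * Sketch: a cmm_barrier()-style compiler fence.  Emits no machine
 * instruction; it only tells the optimizer not to move memory
 * accesses across this point (and ToolSanitizers such as TSan can
 * model it, unlike a raw asm clobber).
 */
#define sketch_cmm_barrier() __atomic_signal_fence(__ATOMIC_SEQ_CST)

static int sketch_barrier_demo(void)
{
	int before = 1;
	sketch_cmm_barrier();	/* compiler-only ordering point */
	int after = before + 1;
	return after;
}
```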
* [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev ` (2 preceding siblings ...) 2023-03-21 13:30 ` [lttng-dev] [PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier() Ondřej Surý via lttng-dev @ 2023-03-21 13:30 ` Ondřej Surý via lttng-dev 2023-03-21 13:31 ` [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers " Ondřej Surý via lttng-dev ` (2 subsequent siblings) 6 siblings, 0 replies; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC (permalink / raw) To: lttng-dev Instead of custom code, use the __atomic builtins to implement the rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and rcu_assign_pointer(). The rcu_dereference() now relies on CONSUME memory order. Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/urcu/static/pointer.h | 77 +++++++---------------------------- 1 file changed, 14 insertions(+), 63 deletions(-) diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h index 9e46a57..a549483 100644 --- a/include/urcu/static/pointer.h +++ b/include/urcu/static/pointer.h @@ -38,6 +38,8 @@ extern "C" { #endif +#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME) + /** * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable * into a RCU read-side critical section. The pointer can later be safely @@ -49,14 +51,6 @@ extern "C" { * Inserts memory barriers on architectures that require them (currently only * Alpha) and documents which pointers are protected by RCU. * - * With C standards prior to C11/C++11, the compiler memory barrier in - * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g. - * VSS: Value Speculation Scheduling) does not perform the data read - * before the pointer read by speculating the value of the pointer. - * Correct ordering is ensured because the pointer is read as a volatile - * access. 
This acts as a global side-effect operation, which forbids - * reordering of dependent memory operations. - * * With C standards C11/C++11, concerns about dependency-breaking * optimizations are taken care of by the "memory_order_consume" atomic * load. @@ -65,10 +59,6 @@ extern "C" { * explicit because the pointer used as input argument is a pointer, * not an _Atomic type as required by C11/C++11. * - * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of - * volatile access to implement rcu_dereference rather than - * memory_order_consume load from the C11/C++11 standards. - * * This may improve performance on weakly-ordered architectures where * the compiler implements memory_order_consume as a * memory_order_acquire, which is stricter than required by the @@ -83,35 +73,7 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ - -#if !defined (URCU_DEREFERENCE_USE_VOLATILE) && \ - ((defined (__cplusplus) && __cplusplus >= 201103L) || \ - (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)) -# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -#endif - -/* - * If p is const (the pointer itself, not what it points to), using - * __typeof__(p) would declare a const variable, leading to - * -Wincompatible-pointer-types errors. Using the statement expression - * makes it an rvalue and gets rid of the const-ness. 
- */ -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(__extension__ ({ \ - __typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \ - _________p0; \ - })) _________p1; \ - __atomic_load(&(p), &_________p1, __ATOMIC_CONSUME); \ - (_________p1); \ - }) -#else -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(p) _________p1 = CMM_LOAD_SHARED(p); \ - cmm_smp_read_barrier_depends(); \ - (_________p1); \ - }) -#endif +#define _rcu_dereference(p) _rcu_get_pointer(&(p)) /** * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer @@ -126,12 +88,12 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ -#define _rcu_cmpxchg_pointer(p, old, _new) \ - __extension__ \ - ({ \ - __typeof__(*p) _________pold = (old); \ - __typeof__(*p) _________pnew = (_new); \ - uatomic_cmpxchg(p, _________pold, _________pnew); \ +#define _rcu_cmpxchg_pointer(p, old, _new) \ + ({ \ + __typeof__(*(p)) __old = old; \ + __atomic_compare_exchange_n(p, &__old, _new, 0, \ + __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \ + __old; \ }) /** @@ -145,22 +107,11 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. 
*/ -#define _rcu_xchg_pointer(p, v) \ - __extension__ \ - ({ \ - __typeof__(*p) _________pv = (v); \ - uatomic_xchg(p, _________pv); \ - }) - +#define _rcu_xchg_pointer(p, v) \ + __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST) -#define _rcu_set_pointer(p, v) \ - do { \ - __typeof__(*p) _________pv = (v); \ - if (!__builtin_constant_p(v) || \ - ((v) != NULL)) \ - cmm_wmb(); \ - uatomic_set(p, _________pv); \ - } while (0) +#define _rcu_set_pointer(p, v) \ + __atomic_store_n(p, v, __ATOMIC_RELEASE) /** * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure @@ -178,7 +129,7 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ -#define _rcu_assign_pointer(p, v) _rcu_set_pointer(&(p), v) +#define _rcu_assign_pointer(p, v) rcu_set_pointer(&(p), v) #ifdef __cplusplus } -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
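Patch 4 converges on the classic RCU publish/read pairing: the writer publishes with a RELEASE store, the reader picks the pointer up with a CONSUME load, so initialization of the pointed-to data is ordered before the pointer becomes visible. A standalone, single-threaded sketch of that pairing (identifiers are illustrative, not the liburcu API):

```c
#include <assert.h>

struct sketch_node { int payload; };

static struct sketch_node *sketch_head;	/* starts NULL */

static void sketch_publish(struct sketch_node *n)
{
	n->payload = 42;				/* initialize first ... */
	__atomic_store_n(&sketch_head, n, __ATOMIC_RELEASE); /* ... then publish */
}

static int sketch_read(void)
{
	/* rcu_dereference()-style load: CONSUME ordering carries the
	 * dependency from the pointer load to the payload load. */
	struct sketch_node *n =
		__atomic_load_n(&sketch_head, __ATOMIC_CONSUME);
	return n ? n->payload : -1;
}
```

As the patch notes, most compilers currently promote `__ATOMIC_CONSUME` to `__ATOMIC_ACQUIRE`, which is stricter than required but correct.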
* [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers with __atomic builtins 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev ` (3 preceding siblings ...) 2023-03-21 13:30 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev @ 2023-03-21 13:31 ` Ondřej Surý via lttng-dev 2023-03-21 20:19 ` Mathieu Desnoyers via lttng-dev 2023-03-21 13:31 ` [lttng-dev] [PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev 2023-03-21 13:31 ` [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev 6 siblings, 1 reply; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:31 UTC (permalink / raw) To: lttng-dev Instead of a custom code, use the __atomic_thread_fence() builtin to implement the cmm_mb(), cmm_rmb(), cmm_wmb(), cmm_smp_mb(), cmm_smp_rmb(), and cmm_smp_wmb() on all architectures, and cmm_read_barrier_depends() on alpha (otherwise it's still no-op). 
family of functions Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/urcu/arch/alpha.h | 6 +++--- include/urcu/arch/arm.h | 14 ------------- include/urcu/arch/generic.h | 6 +++--- include/urcu/arch/mips.h | 6 ------ include/urcu/arch/nios2.h | 2 -- include/urcu/arch/ppc.h | 25 ---------------------- include/urcu/arch/s390.h | 2 -- include/urcu/arch/sparc64.h | 13 ------------ include/urcu/arch/x86.h | 42 +++---------------------------------- 9 files changed, 9 insertions(+), 107 deletions(-) diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h index dc33e28..61687c7 100644 --- a/include/urcu/arch/alpha.h +++ b/include/urcu/arch/alpha.h @@ -29,9 +29,9 @@ extern "C" { #endif -#define cmm_mb() __asm__ __volatile__ ("mb":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("wmb":::"memory") -#define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory") +#ifndef cmm_read_barrier_depends +#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_CONSUME) +#endif /* * On Linux, define the membarrier system call number if not yet available in diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h index 54ca4fa..b3671dc 100644 --- a/include/urcu/arch/arm.h +++ b/include/urcu/arch/arm.h @@ -39,20 +39,6 @@ extern "C" { /* For backwards compat. */ #define CONFIG_RCU_ARM_HAVE_DMB 1 -/* - * Issues full system DMB operation. - */ -#define cmm_mb() __asm__ __volatile__ ("dmb sy":::"memory") -#define cmm_rmb() __asm__ __volatile__ ("dmb sy":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("dmb sy":::"memory") - -/* - * Issues DMB operation only to the inner shareable domain. 
- */ -#define cmm_smp_mb() __asm__ __volatile__ ("dmb ish":::"memory") -#define cmm_smp_rmb() __asm__ __volatile__ ("dmb ish":::"memory") -#define cmm_smp_wmb() __asm__ __volatile__ ("dmb ish":::"memory") - #endif /* URCU_ARCH_ARMV7 */ #include <stdlib.h> diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h index be6e41e..2715162 100644 --- a/include/urcu/arch/generic.h +++ b/include/urcu/arch/generic.h @@ -44,15 +44,15 @@ extern "C" { */ #ifndef cmm_mb -#define cmm_mb() __sync_synchronize() +#define cmm_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST) #endif #ifndef cmm_rmb -#define cmm_rmb() cmm_mb() +#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) #endif #ifndef cmm_wmb -#define cmm_wmb() cmm_mb() +#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) #endif #define cmm_mc() cmm_barrier() diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h index ea5b7e9..ffe65c0 100644 --- a/include/urcu/arch/mips.h +++ b/include/urcu/arch/mips.h @@ -30,12 +30,6 @@ extern "C" { #endif -#define cmm_mb() __asm__ __volatile__ ( \ - " .set mips2 \n" \ - " sync \n" \ - " .set mips0 \n" \ - :::"memory") - #ifdef __cplusplus } #endif diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h index b4f3e50..cd6bdb8 100644 --- a/include/urcu/arch/nios2.h +++ b/include/urcu/arch/nios2.h @@ -29,8 +29,6 @@ extern "C" { #endif -#define cmm_mb() cmm_barrier() - #ifdef __cplusplus } #endif diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h index 791529e..618f79c 100644 --- a/include/urcu/arch/ppc.h +++ b/include/urcu/arch/ppc.h @@ -34,31 +34,6 @@ extern "C" { /* Include size of POWER5+ L3 cache lines: 256 bytes */ #define CAA_CACHE_LINE_SIZE 256 -#ifdef __NO_LWSYNC__ -#define LWSYNC_OPCODE "sync\n" -#else -#define LWSYNC_OPCODE "lwsync\n" -#endif - -/* - * Use sync for all cmm_mb/rmb/wmb barriers because lwsync does not - * preserve ordering of cacheable vs. 
non-cacheable accesses, so it - * should not be used to order with respect to MMIO operations. An - * eieio+lwsync pair is also not enough for cmm_rmb, because it will - * order cacheable and non-cacheable memory operations separately---i.e. - * not the latter against the former. - */ -#define cmm_mb() __asm__ __volatile__ ("sync":::"memory") - -/* - * lwsync orders loads in cacheable memory with respect to other loads, - * and stores in cacheable memory with respect to other stores. - * Therefore, use it for barriers ordering accesses to cacheable memory - * only. - */ -#define cmm_smp_rmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") -#define cmm_smp_wmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") - #define mftbl() \ __extension__ \ ({ \ diff --git a/include/urcu/arch/s390.h b/include/urcu/arch/s390.h index 67461b4..c22fdf9 100644 --- a/include/urcu/arch/s390.h +++ b/include/urcu/arch/s390.h @@ -39,8 +39,6 @@ extern "C" { #define CAA_CACHE_LINE_SIZE 128 -#define cmm_mb() __asm__ __volatile__("bcr 15,0" : : : "memory") - #define HAS_CAA_GET_CYCLES typedef uint64_t caa_cycles_t; diff --git a/include/urcu/arch/sparc64.h b/include/urcu/arch/sparc64.h index 1ff40f5..b4e25ca 100644 --- a/include/urcu/arch/sparc64.h +++ b/include/urcu/arch/sparc64.h @@ -40,19 +40,6 @@ extern "C" { #define CAA_CACHE_LINE_SIZE 256 -/* - * Inspired from the Linux kernel. Workaround Spitfire bug #51. 
- */ -#define membar_safe(type) \ -__asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ - "membar " type "\n" \ - "1:\n" \ - : : : "memory") - -#define cmm_mb() membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad") -#define cmm_rmb() membar_safe("#LoadLoad") -#define cmm_wmb() membar_safe("#StoreStore") - #ifdef __cplusplus } #endif diff --git a/include/urcu/arch/x86.h b/include/urcu/arch/x86.h index 744f9f9..af4487d 100644 --- a/include/urcu/arch/x86.h +++ b/include/urcu/arch/x86.h @@ -46,44 +46,8 @@ extern "C" { /* For backwards compat */ #define CONFIG_RCU_HAVE_FENCE 1 -#define cmm_mb() __asm__ __volatile__ ("mfence":::"memory") - -/* - * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when - * using SSE or working with I/O areas. cmm_smp_rmb/cmm_smp_wmb are - * only compiler barriers, which is enough for general use. - */ -#define cmm_rmb() __asm__ __volatile__ ("lfence":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("sfence"::: "memory") -#define cmm_smp_rmb() cmm_barrier() -#define cmm_smp_wmb() cmm_barrier() - -#else - -/* - * We leave smp_rmb/smp_wmb as full barriers for processors that do not have - * fence instructions. - * - * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor - * systems, due to an erratum. The Linux kernel says that "Even distro - * kernels should think twice before enabling this", but for now let's - * be conservative and leave the full barrier on 32-bit processors. Also, - * IDT WinChip supports weak store ordering, and the kernel may enable it - * under our feet; cmm_smp_wmb() ceases to be a nop for these processors. 
- */ -#if (CAA_BITS_PER_LONG == 32) -#define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#else -#define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#endif #endif -#define caa_cpu_relax() __asm__ __volatile__ ("rep; nop" : : : "memory") - #define HAS_CAA_GET_CYCLES #define rdtscll(val) \ @@ -98,10 +62,10 @@ typedef uint64_t caa_cycles_t; static inline caa_cycles_t caa_get_cycles(void) { - caa_cycles_t ret = 0; + caa_cycles_t ret = 0; - rdtscll(ret); - return ret; + rdtscll(ret); + return ret; } /* -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
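The `cmm_wmb()`/`cmm_rmb()` pair that patch 5 rewrites exists for the message-passing idiom: order a data store before a flag store on the producer side, and a flag load before a data load on the consumer side. A sketch with illustrative names; conservative `__ATOMIC_SEQ_CST` fences are used here rather than the patch's acquire/release mapping, whose adequacy is questioned in the review that follows:

```c
#include <assert.h>

static int sketch_data, sketch_ready;

static void sketch_producer(void)
{
	sketch_data = 123;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* cmm_wmb() stand-in */
	sketch_ready = 1;
}

static int sketch_consumer(void)
{
	if (!sketch_ready)
		return -1;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* cmm_rmb() stand-in */
	return sketch_data;
}
```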
* Re: [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers with __atomic builtins 2023-03-21 13:31 ` [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers " Ondřej Surý via lttng-dev @ 2023-03-21 20:19 ` Mathieu Desnoyers via lttng-dev 0 siblings, 0 replies; 13+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-03-21 20:19 UTC (permalink / raw) To: Ondřej Surý, lttng-dev On 2023-03-21 09:31, Ondřej Surý via lttng-dev wrote: > Instead of custom code, use the __atomic_thread_fence() builtin to > implement the cmm_mb(), cmm_rmb(), cmm_wmb(), cmm_smp_mb(), > cmm_smp_rmb(), and cmm_smp_wmb() family of functions on all architectures, > and cmm_read_barrier_depends() on alpha (otherwise it's still a no-op). > > Signed-off-by: Ondřej Surý <ondrej@sury.org> > --- > include/urcu/arch/alpha.h | 6 +++--- > include/urcu/arch/arm.h | 14 ------------- > include/urcu/arch/generic.h | 6 +++--- > include/urcu/arch/mips.h | 6 ------ > include/urcu/arch/nios2.h | 2 -- > include/urcu/arch/ppc.h | 25 ---------------------- > include/urcu/arch/s390.h | 2 -- > include/urcu/arch/sparc64.h | 13 ------------ > include/urcu/arch/x86.h | 42 +++---------------------------------- > 9 files changed, 9 insertions(+), 107 deletions(-) > > diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h > index dc33e28..61687c7 100644 > --- a/include/urcu/arch/alpha.h > +++ b/include/urcu/arch/alpha.h > @@ -29,9 +29,9 @@ > extern "C" { > #endif > > -#define cmm_mb() __asm__ __volatile__ ("mb":::"memory") > -#define cmm_wmb() __asm__ __volatile__ ("wmb":::"memory") > -#define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory") > +#ifndef cmm_read_barrier_depends > +#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_CONSUME) > +#endif I don't expect a #ifndef in arch-specific code. I would expect the ifndef in the generic code. [...] 
> diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h > index be6e41e..2715162 100644 > --- a/include/urcu/arch/generic.h > +++ b/include/urcu/arch/generic.h > @@ -44,15 +44,15 @@ extern "C" { > */ > > #ifndef cmm_mb > -#define cmm_mb() __sync_synchronize() > +#define cmm_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST) > #endif > > #ifndef cmm_rmb > -#define cmm_rmb() cmm_mb() > +#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) > #endif > > #ifndef cmm_wmb > -#define cmm_wmb() cmm_mb() > +#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) I don't think rmb/wmb map to ACQUIRE/RELEASE semantic. This is incorrect AFAIU. ACQUIRE/RELEASE are semi-permeable barriers preventing code motion in one direction or the other, whereas rmb/wmb are barriers that only affect code motion of either loads or stores (but in both directions). In the generic case, rmb/wmb could map to __atomic_thread_fence(__ATOMIC_SEQ_CST). > #endif > > #define cmm_mc() cmm_barrier() [...] > diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h > index 791529e..618f79c 100644 > --- a/include/urcu/arch/ppc.h > +++ b/include/urcu/arch/ppc.h > @@ -34,31 +34,6 @@ extern "C" { > /* Include size of POWER5+ L3 cache lines: 256 bytes */ > #define CAA_CACHE_LINE_SIZE 256 > > -#ifdef __NO_LWSYNC__ > -#define LWSYNC_OPCODE "sync\n" > -#else > -#define LWSYNC_OPCODE "lwsync\n" > -#endif > - > -/* > - * Use sync for all cmm_mb/rmb/wmb barriers because lwsync does not > - * preserve ordering of cacheable vs. non-cacheable accesses, so it > - * should not be used to order with respect to MMIO operations. An > - * eieio+lwsync pair is also not enough for cmm_rmb, because it will > - * order cacheable and non-cacheable memory operations separately---i.e. > - * not the latter against the former. > - */ > -#define cmm_mb() __asm__ __volatile__ ("sync":::"memory") I agree that we will want to use the generic implementation for smp_mb. 
> - > -/* > - * lwsync orders loads in cacheable memory with respect to other loads, > - * and stores in cacheable memory with respect to other stores. > - * Therefore, use it for barriers ordering accesses to cacheable memory > - * only. > - */ > -#define cmm_smp_rmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") > -#define cmm_smp_wmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") I suspect that using the generic implementation will be slower than lwsync. I am tempted to keep a custom implementation for rmb/wmb on ppc. We could have a build mode specific for TSAN which overrides those to use smp_mb instead. > - > #define mftbl() \ > __extension__ \ > ({ \ [...] > diff --git a/include/urcu/arch/sparc64.h b/include/urcu/arch/sparc64.h > index 1ff40f5..b4e25ca 100644 > --- a/include/urcu/arch/sparc64.h > +++ b/include/urcu/arch/sparc64.h > @@ -40,19 +40,6 @@ extern "C" { > > #define CAA_CACHE_LINE_SIZE 256 > > -/* > - * Inspired from the Linux kernel. Workaround Spitfire bug #51. > - */ > -#define membar_safe(type) \ > -__asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ > - "membar " type "\n" \ > - "1:\n" \ > - : : : "memory") > - > -#define cmm_mb() membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad") > -#define cmm_rmb() membar_safe("#LoadLoad") > -#define cmm_wmb() membar_safe("#StoreStore") Same comment as for ppc. > - > #ifdef __cplusplus > } > #endif > diff --git a/include/urcu/arch/x86.h b/include/urcu/arch/x86.h > index 744f9f9..af4487d 100644 > --- a/include/urcu/arch/x86.h > +++ b/include/urcu/arch/x86.h > @@ -46,44 +46,8 @@ extern "C" { > /* For backwards compat */ > #define CONFIG_RCU_HAVE_FENCE 1 > > -#define cmm_mb() __asm__ __volatile__ ("mfence":::"memory") > - > -/* > - * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when > - * using SSE or working with I/O areas. cmm_smp_rmb/cmm_smp_wmb are > - * only compiler barriers, which is enough for general use. 
> - */ > -#define cmm_rmb() __asm__ __volatile__ ("lfence":::"memory") > -#define cmm_wmb() __asm__ __volatile__ ("sfence"::: "memory") > -#define cmm_smp_rmb() cmm_barrier() > -#define cmm_smp_wmb() cmm_barrier() Relying on the generic barrier for rmb and wmb would slow things down on x86, we may want to do like I suggest for ppc. > - > -#else > - > -/* > - * We leave smp_rmb/smp_wmb as full barriers for processors that do not have > - * fence instructions. > - * > - * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor > - * systems, due to an erratum. The Linux kernel says that "Even distro > - * kernels should think twice before enabling this", but for now let's > - * be conservative and leave the full barrier on 32-bit processors. Also, > - * IDT WinChip supports weak store ordering, and the kernel may enable it > - * under our feet; cmm_smp_wmb() ceases to be a nop for these processors. > - */ > -#if (CAA_BITS_PER_LONG == 32) > -#define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") > -#define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") > -#define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") > -#else > -#define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") > -#define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") > -#define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") > -#endif Removing this removes support for older i686 and for URCU_ARCH_K1OM (Xeon Phi). Do we intend to remove that support ? 
Thanks, Mathieu > #endif > > -#define caa_cpu_relax() __asm__ __volatile__ ("rep; nop" : : : "memory") > - > #define HAS_CAA_GET_CYCLES > > #define rdtscll(val) \ > @@ -98,10 +62,10 @@ typedef uint64_t caa_cycles_t; > > static inline caa_cycles_t caa_get_cycles(void) > { > - caa_cycles_t ret = 0; > + caa_cycles_t ret = 0; > > - rdtscll(ret); > - return ret; > + rdtscll(ret); > + return ret; > } This whitespace to tab cleanup should be moved to its own patch. Thanks, Mathieu > > /* -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 13+ messages in thread
* [lttng-dev] [PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev ` (4 preceding siblings ...) 2023-03-21 13:31 ` [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers " Ondřej Surý via lttng-dev @ 2023-03-21 13:31 ` Ondřej Surý via lttng-dev 2023-03-21 13:31 ` [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev 6 siblings, 0 replies; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:31 UTC (permalink / raw) To: lttng-dev Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and CMM_STORE_SHARED(). Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/urcu/system.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/urcu/system.h b/include/urcu/system.h index faae390..4302253 100644 --- a/include/urcu/system.h +++ b/include/urcu/system.h @@ -26,7 +26,7 @@ * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come * before the load. */ -#define _CMM_LOAD_SHARED(p) CMM_ACCESS_ONCE(p) +#define _CMM_LOAD_SHARED(p) __atomic_load_n(&(p), __ATOMIC_RELAXED) /* * Load a data from shared memory, doing a cache flush if required. @@ -42,7 +42,7 @@ * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should * follow the store. */ -#define _CMM_STORE_SHARED(x, v) __extension__ ({ CMM_ACCESS_ONCE(x) = (v); }) +#define _CMM_STORE_SHARED(x, v) __extension__ ({ __atomic_store_n(&(x), v, __ATOMIC_RELAXED); v; }) /* * Store v into x, where x is located in shared memory. Performs the -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c 2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev ` (5 preceding siblings ...) 2023-03-21 13:31 ` [lttng-dev] [PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev @ 2023-03-21 13:31 ` Ondřej Surý via lttng-dev 6 siblings, 0 replies; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-21 13:31 UTC (permalink / raw) To: lttng-dev When adding REMOVED_FLAG to the pointers in the rculfhash implementation, retype the generic pointer to uintptr_t to fix the compiler error. Signed-off-by: Ondřej Surý <ondrej@sury.org> --- src/rculfhash.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/rculfhash.c b/src/rculfhash.c index b456415..863387e 100644 --- a/src/rculfhash.c +++ b/src/rculfhash.c @@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size, * Knowing which wins the race will be known after the garbage * collection phase, stay tuned! */ - uatomic_or(&node->next, REMOVED_FLAG); + uatomic_or((uintptr_t *)&node->next, REMOVED_FLAG); /* We performed the (logical) deletion. */ /* @@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i, dbg_printf("remove entry: order %lu index %lu hash %lu\n", i, j, j); /* Set the REMOVED_FLAG to freeze the ->next for gc */ - uatomic_or(&fini_bucket->next, REMOVED_FLAG); + uatomic_or((uintptr_t *)&fini_bucket->next, REMOVED_FLAG); _cds_lfht_gc_bucket(parent_bucket, fini_bucket); } ht->flavor->read_unlock(); -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins @ 2023-03-17 21:37 Ondřej Surý via lttng-dev 2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with " Ondřej Surý via lttng-dev 0 siblings, 1 reply; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw) To: lttng-dev Hi, (this is my first time using git send-email, so I apologise in advance if anything breaks). Here's my attempt to convert the Userspace RCU to use __atomic builtins whenever possible instead of custom assembly. The __atomic builtins were first introduced in gcc 4.7.0 and clang 3.1.0. Apart from simplifying the code, this should also help ThreadSanitizer to understand the memory synchronization and report less (or no) warnings. The code compiles and the tests passed (on amd64). This is by no means complete, and most probably I missed or misunderstood something, but it's a solid start, so I am submitting the patch set for discussion and review. Thanks, -- Ondřej Surý <ondrej@sury.org> _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 13+ messages in thread
* [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins 2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev @ 2023-03-17 21:37 ` Ondřej Surý via lttng-dev 2023-03-20 18:25 ` Mathieu Desnoyers via lttng-dev 0 siblings, 1 reply; 13+ messages in thread From: Ondřej Surý via lttng-dev @ 2023-03-17 21:37 UTC (permalink / raw) To: lttng-dev Instead of custom code, use the __atomic builtins to implement the rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and rcu_assign_pointer(). Signed-off-by: Ondřej Surý <ondrej@sury.org> --- include/urcu/arch.h | 20 +++++++++ include/urcu/arch/alpha.h | 6 +++ include/urcu/arch/arm.h | 12 ++++++ include/urcu/arch/mips.h | 2 + include/urcu/arch/nios2.h | 2 + include/urcu/arch/ppc.h | 6 +++ include/urcu/arch/s390.h | 2 + include/urcu/arch/sparc64.h | 6 +++ include/urcu/arch/x86.h | 20 +++++++++ include/urcu/static/pointer.h | 77 +++++++---------------------------- 10 files changed, 90 insertions(+), 63 deletions(-) diff --git a/include/urcu/arch.h b/include/urcu/arch.h index d3914da..aec6fa1 100644 --- a/include/urcu/arch.h +++ b/include/urcu/arch.h @@ -21,6 +21,26 @@ #ifndef _URCU_ARCH_H #define _URCU_ARCH_H +#if !defined(__has_feature) +#define __has_feature(x) 0 +#endif /* if !defined(__has_feature) */ + +/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */ +#if __has_feature(address_sanitizer) +#define __SANITIZE_ADDRESS__ 1 +#endif /* if __has_feature(address_sanitizer) */ + +#ifdef __SANITIZE_THREAD__ +/* FIXME: Somebody who understands the barriers should look into this */ +#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL) +#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) +#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) +#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL) +#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) +#define cmm_smp_wmb() 
__atomic_thread_fence(__ATOMIC_RELEASE) +#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL) +#endif + /* * Architecture detection using compiler defines. * diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h index dc33e28..84526ef 100644 --- a/include/urcu/arch/alpha.h +++ b/include/urcu/arch/alpha.h @@ -29,9 +29,15 @@ extern "C" { #endif +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("mb":::"memory") +#endif +#ifndef cmm_wmb #define cmm_wmb() __asm__ __volatile__ ("wmb":::"memory") +#endif +#ifndef cmm_read_barrier_depends #define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory") +#endif /* * On Linux, define the membarrier system call number if not yet available in diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h index 54ca4fa..4950e13 100644 --- a/include/urcu/arch/arm.h +++ b/include/urcu/arch/arm.h @@ -42,16 +42,28 @@ extern "C" { /* * Issues full system DMB operation. */ +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("dmb sy":::"memory") +#endif +#ifndef cmm_rmb #define cmm_rmb() __asm__ __volatile__ ("dmb sy":::"memory") +#endif +#ifndef cmm_wmb #define cmm_wmb() __asm__ __volatile__ ("dmb sy":::"memory") +#endif /* * Issues DMB operation only to the inner shareable domain. 
*/ +#ifndef cmm_smp_mb #define cmm_smp_mb() __asm__ __volatile__ ("dmb ish":::"memory") +#endif +#ifndef cmm_smp_rmb #define cmm_smp_rmb() __asm__ __volatile__ ("dmb ish":::"memory") +#endif +#ifndef cmm_smp_wmb #define cmm_smp_wmb() __asm__ __volatile__ ("dmb ish":::"memory") +#endif #endif /* URCU_ARCH_ARMV7 */ diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h index ea5b7e9..b9ee021 100644 --- a/include/urcu/arch/mips.h +++ b/include/urcu/arch/mips.h @@ -30,11 +30,13 @@ extern "C" { #endif +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ( \ " .set mips2 \n" \ " sync \n" \ " .set mips0 \n" \ :::"memory") +#endif #ifdef __cplusplus } diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h index b4f3e50..5def45c 100644 --- a/include/urcu/arch/nios2.h +++ b/include/urcu/arch/nios2.h @@ -29,7 +29,9 @@ extern "C" { #endif +#ifndef cmm_mb #define cmm_mb() cmm_barrier() +#endif #ifdef __cplusplus } diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h index 791529e..b8ec40d 100644 --- a/include/urcu/arch/ppc.h +++ b/include/urcu/arch/ppc.h @@ -48,7 +48,9 @@ extern "C" { * order cacheable and non-cacheable memory operations separately---i.e. * not the latter against the former. */ +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("sync":::"memory") +#endif /* * lwsync orders loads in cacheable memory with respect to other loads, @@ -56,8 +58,12 @@ extern "C" { * Therefore, use it for barriers ordering accesses to cacheable memory * only. 
*/ +#ifndef cmm_smp_rmb #define cmm_smp_rmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") +#endif +#ifndef cmm_smp_rmb #define cmm_smp_wmb() __asm__ __volatile__ (LWSYNC_OPCODE:::"memory") +#endif #define mftbl() \ __extension__ \ diff --git a/include/urcu/arch/s390.h b/include/urcu/arch/s390.h index 67461b4..2733873 100644 --- a/include/urcu/arch/s390.h +++ b/include/urcu/arch/s390.h @@ -39,7 +39,9 @@ extern "C" { #define CAA_CACHE_LINE_SIZE 128 +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__("bcr 15,0" : : : "memory") +#endif #define HAS_CAA_GET_CYCLES diff --git a/include/urcu/arch/sparc64.h b/include/urcu/arch/sparc64.h index 1ff40f5..32a6b0e 100644 --- a/include/urcu/arch/sparc64.h +++ b/include/urcu/arch/sparc64.h @@ -49,9 +49,15 @@ __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ "1:\n" \ : : : "memory") +#ifndef cmm_mb #define cmm_mb() membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad") +#endif +#ifndef cmm_rmb #define cmm_rmb() membar_safe("#LoadLoad") +#endif +#ifndef cmm_wmb #define cmm_wmb() membar_safe("#StoreStore") +#endif #ifdef __cplusplus } diff --git a/include/urcu/arch/x86.h b/include/urcu/arch/x86.h index 744f9f9..6be9d38 100644 --- a/include/urcu/arch/x86.h +++ b/include/urcu/arch/x86.h @@ -46,15 +46,23 @@ extern "C" { /* For backwards compat */ #define CONFIG_RCU_HAVE_FENCE 1 +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("mfence":::"memory") +#endif /* * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when * using SSE or working with I/O areas. cmm_smp_rmb/cmm_smp_wmb are * only compiler barriers, which is enough for general use. */ +#ifndef cmm_rmb #define cmm_rmb() __asm__ __volatile__ ("lfence":::"memory") +#endif + +#ifndef cmm_wmb #define cmm_wmb() __asm__ __volatile__ ("sfence"::: "memory") +#endif + #define cmm_smp_rmb() cmm_barrier() #define cmm_smp_wmb() cmm_barrier() @@ -72,15 +80,27 @@ extern "C" { * under our feet; cmm_smp_wmb() ceases to be a nop for these processors. 
[Editor's note: the second guard in the ppc.h hunk above reads "#ifndef cmm_smp_rmb" but wraps the cmm_smp_wmb() definition; it should read "#ifndef cmm_smp_wmb".]
*/ #if (CAA_BITS_PER_LONG == 32) +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") +#endif +#ifndef cmm_rmb #define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") +#endif +#ifndef cmm_wmb #define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") +#endif #else +#ifndef cmm_mb #define cmm_mb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") +#endif +#ifndef cmm_rmb #define cmm_rmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") +#endif +#ifndef cmm_wmb #define cmm_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") #endif #endif +#endif #define caa_cpu_relax() __asm__ __volatile__ ("rep; nop" : : : "memory") diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h index 9e46a57..3f116f3 100644 --- a/include/urcu/static/pointer.h +++ b/include/urcu/static/pointer.h @@ -38,6 +38,8 @@ extern "C" { #endif +#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME) + /** * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable * into a RCU read-side critical section. The pointer can later be safely @@ -49,14 +51,6 @@ extern "C" { * Inserts memory barriers on architectures that require them (currently only * Alpha) and documents which pointers are protected by RCU. * - * With C standards prior to C11/C++11, the compiler memory barrier in - * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g. - * VSS: Value Speculation Scheduling) does not perform the data read - * before the pointer read by speculating the value of the pointer. - * Correct ordering is ensured because the pointer is read as a volatile - * access. This acts as a global side-effect operation, which forbids - * reordering of dependent memory operations. - * * With C standards C11/C++11, concerns about dependency-breaking * optimizations are taken care of by the "memory_order_consume" atomic * load. 
@@ -65,10 +59,6 @@ extern "C" { * explicit because the pointer used as input argument is a pointer, * not an _Atomic type as required by C11/C++11. * - * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of - * volatile access to implement rcu_dereference rather than - * memory_order_consume load from the C11/C++11 standards. - * * This may improve performance on weakly-ordered architectures where * the compiler implements memory_order_consume as a * memory_order_acquire, which is stricter than required by the @@ -83,35 +73,7 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ - -#if !defined (URCU_DEREFERENCE_USE_VOLATILE) && \ - ((defined (__cplusplus) && __cplusplus >= 201103L) || \ - (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)) -# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -#endif - -/* - * If p is const (the pointer itself, not what it points to), using - * __typeof__(p) would declare a const variable, leading to - * -Wincompatible-pointer-types errors. Using the statement expression - * makes it an rvalue and gets rid of the const-ness. - */ -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(__extension__ ({ \ - __typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \ - _________p0; \ - })) _________p1; \ - __atomic_load(&(p), &_________p1, __ATOMIC_CONSUME); \ - (_________p1); \ - }) -#else -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(p) _________p1 = CMM_LOAD_SHARED(p); \ - cmm_smp_read_barrier_depends(); \ - (_________p1); \ - }) -#endif +#define _rcu_dereference(p) _rcu_get_pointer(&(p)) /** * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer @@ -126,12 +88,12 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. 
*/ -#define _rcu_cmpxchg_pointer(p, old, _new) \ - __extension__ \ - ({ \ - __typeof__(*p) _________pold = (old); \ - __typeof__(*p) _________pnew = (_new); \ - uatomic_cmpxchg(p, _________pold, _________pnew); \ +#define _rcu_cmpxchg_pointer(p, old, _new) \ + ({ \ + __typeof__(*(p)) __old = old; \ + __atomic_compare_exchange_n(p, &__old, _new, 0, \ + __ATOMIC_ACQ_REL, __ATOMIC_CONSUME); \ + __old; \ }) /** @@ -145,22 +107,11 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ -#define _rcu_xchg_pointer(p, v) \ - __extension__ \ - ({ \ - __typeof__(*p) _________pv = (v); \ - uatomic_xchg(p, _________pv); \ - }) - +#define _rcu_xchg_pointer(p, v) \ + __atomic_exchange_n(p, v, __ATOMIC_ACQ_REL) -#define _rcu_set_pointer(p, v) \ - do { \ - __typeof__(*p) _________pv = (v); \ - if (!__builtin_constant_p(v) || \ - ((v) != NULL)) \ - cmm_wmb(); \ - uatomic_set(p, _________pv); \ - } while (0) +#define _rcu_set_pointer(p, v) \ + __atomic_store_n(p, v, __ATOMIC_RELEASE) /** * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure @@ -178,7 +129,7 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ -#define _rcu_assign_pointer(p, v) _rcu_set_pointer(&(p), v) +#define _rcu_assign_pointer(p, v) rcu_set_pointer(&(p), v) #ifdef __cplusplus } -- 2.39.2 _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins 2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with " Ondřej Surý via lttng-dev @ 2023-03-20 18:25 ` Mathieu Desnoyers via lttng-dev 0 siblings, 0 replies; 13+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-03-20 18:25 UTC (permalink / raw) To: Ondřej Surý, lttng-dev, paulmck On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote: > Instead of custom code, use the __atomic builtins to implement the > rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and > rcu_assign_pointer(). This also changes the cmm_mb() family of functions, but not everywhere. This should be documented. I'm also unsure why architecture code has #ifndef cmm_mb when we would expect the generic arch implementation to be conditional (the other way around). > > Signed-off-by: Ondřej Surý <ondrej@sury.org> > --- > include/urcu/arch.h | 20 +++++++++ > include/urcu/arch/alpha.h | 6 +++ > include/urcu/arch/arm.h | 12 ++++++ > include/urcu/arch/mips.h | 2 + > include/urcu/arch/nios2.h | 2 + > include/urcu/arch/ppc.h | 6 +++ > include/urcu/arch/s390.h | 2 + > include/urcu/arch/sparc64.h | 6 +++ > include/urcu/arch/x86.h | 20 +++++++++ > include/urcu/static/pointer.h | 77 +++++++---------------------------- > 10 files changed, 90 insertions(+), 63 deletions(-) > > diff --git a/include/urcu/arch.h b/include/urcu/arch.h > index d3914da..aec6fa1 100644 > --- a/include/urcu/arch.h > +++ b/include/urcu/arch.h > @@ -21,6 +21,26 @@ > #ifndef _URCU_ARCH_H > #define _URCU_ARCH_H > > +#if !defined(__has_feature) > +#define __has_feature(x) 0 > +#endif /* if !defined(__has_feature) */ > + > +/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */ > +#if __has_feature(address_sanitizer) > +#define __SANITIZE_ADDRESS__ 1 > +#endif /* if __has_feature(address_sanitizer) */ > + > +#ifdef __SANITIZE_THREAD__ > +/* FIXME: Somebody who understands the 
barriers should look into this */ > +#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL) This really needs to be __ATOMIC_SEQ_CST. > +#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) > +#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) I am really unsure that rmb/wmb semantics map to acq/rel. Paul, can you confirm ? > +#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL) SEQ_CST. > +#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE) > +#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) Unsure (see above). > +#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL) This would map to __ATOMIC_CONSUME, but AFAIK the current implementation of this semantic is done with __ATOMIC_ACQUIRE which is stronger than what is really needed here. So we can expect a slowdown on some architectures if we go that way. Should we favor code simplicity and long-term maintainability at the expense of performance in the short-term ? Or should we keep arch-specific implementations until the toolchains end up implementing a proper consume semantic ? > +#endif > + > /* > * Architecture detection using compiler defines. > * > diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h > index dc33e28..84526ef 100644 > --- a/include/urcu/arch/alpha.h > +++ b/include/urcu/arch/alpha.h > @@ -29,9 +29,15 @@ > extern "C" { > #endif > > +#ifndef cmm_mb > #define cmm_mb() __asm__ __volatile__ ("mb":::"memory") > +#endif > +#ifndef cmm_wmb > #define cmm_wmb() __asm__ __volatile__ ("wmb":::"memory") > +#endif > +#ifndef cmm_read_barrier_depends > #define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory") > +#endif > [...] 
> diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h > index 9e46a57..3f116f3 100644 > --- a/include/urcu/static/pointer.h > +++ b/include/urcu/static/pointer.h > @@ -38,6 +38,8 @@ > extern "C" { > #endif > > +#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME) > + > /** > * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable > * into a RCU read-side critical section. The pointer can later be safely > @@ -49,14 +51,6 @@ extern "C" { > * Inserts memory barriers on architectures that require them (currently only > * Alpha) and documents which pointers are protected by RCU. > * > - * With C standards prior to C11/C++11, the compiler memory barrier in > - * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g. > - * VSS: Value Speculation Scheduling) does not perform the data read > - * before the pointer read by speculating the value of the pointer. > - * Correct ordering is ensured because the pointer is read as a volatile > - * access. This acts as a global side-effect operation, which forbids > - * reordering of dependent memory operations. We should document that we end up relying on CONSUME for rcu_dereference in the patch commit message. > - * > * With C standards C11/C++11, concerns about dependency-breaking > * optimizations are taken care of by the "memory_order_consume" atomic > * load. > @@ -65,10 +59,6 @@ extern "C" { > * explicit because the pointer used as input argument is a pointer, > * not an _Atomic type as required by C11/C++11. > * > - * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of > - * volatile access to implement rcu_dereference rather than > - * memory_order_consume load from the C11/C++11 standards. 
> - * > * This may improve performance on weakly-ordered architectures where > * the compiler implements memory_order_consume as a > * memory_order_acquire, which is stricter than required by the > @@ -83,35 +73,7 @@ extern "C" { > * meets the 10-line criterion in LGPL, allowing this function to be > * expanded directly in non-LGPL code. > */ > - > -#if !defined (URCU_DEREFERENCE_USE_VOLATILE) && \ > - ((defined (__cplusplus) && __cplusplus >= 201103L) || \ > - (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)) > -# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME > -#endif > - > -/* > - * If p is const (the pointer itself, not what it points to), using > - * __typeof__(p) would declare a const variable, leading to > - * -Wincompatible-pointer-types errors. Using the statement expression > - * makes it an rvalue and gets rid of the const-ness. > - */ > -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME > -# define _rcu_dereference(p) __extension__ ({ \ > - __typeof__(__extension__ ({ \ > - __typeof__(p) __attribute__((unused)) _________p0 = { 0 }; \ > - _________p0; \ > - })) _________p1; \ > - __atomic_load(&(p), &_________p1, __ATOMIC_CONSUME); \ > - (_________p1); \ > - }) > -#else > -# define _rcu_dereference(p) __extension__ ({ \ > - __typeof__(p) _________p1 = CMM_LOAD_SHARED(p); \ > - cmm_smp_read_barrier_depends(); \ > - (_________p1); \ > - }) > -#endif > +#define _rcu_dereference(p) _rcu_get_pointer(&(p)) > > /** > * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer > @@ -126,12 +88,12 @@ extern "C" { > * meets the 10-line criterion in LGPL, allowing this function to be > * expanded directly in non-LGPL code. 
> */
> -#define _rcu_cmpxchg_pointer(p, old, _new) \
> - __extension__ \
> - ({ \
> - __typeof__(*p) _________pold = (old); \
> - __typeof__(*p) _________pnew = (_new); \
> - uatomic_cmpxchg(p, _________pold, _________pnew); \
> +#define _rcu_cmpxchg_pointer(p, old, _new) \
> + ({ \
> + __typeof__(*(p)) __old = old; \
> + __atomic_compare_exchange_n(p, &__old, _new, 0, \
> + __ATOMIC_ACQ_REL, __ATOMIC_CONSUME); \

__ATOMIC_SEQ_CST on both success and failure.

> + __old; \
> })
>
> /**
> @@ -145,22 +107,11 @@ extern "C" {
> * meets the 10-line criterion in LGPL, allowing this function to be
> * expanded directly in non-LGPL code.
> */
> -#define _rcu_xchg_pointer(p, v) \
> - __extension__ \
> - ({ \
> - __typeof__(*p) _________pv = (v); \
> - uatomic_xchg(p, _________pv); \
> - })
> -
> +#define _rcu_xchg_pointer(p, v) \
> + __atomic_exchange_n(p, v, __ATOMIC_ACQ_REL)

__ATOMIC_SEQ_CST.

>
> -#define _rcu_set_pointer(p, v) \
> - do { \
> - __typeof__(*p) _________pv = (v); \
> - if (!__builtin_constant_p(v) || \
> - ((v) != NULL)) \
> - cmm_wmb(); \
> - uatomic_set(p, _________pv); \
> - } while (0)
> +#define _rcu_set_pointer(p, v) \
> + __atomic_store_n(p, v, __ATOMIC_RELEASE)

OK.

Thanks,

Mathieu

>
> /**
> * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure
> @@ -178,7 +129,7 @@ extern "C" {
> * meets the 10-line criterion in LGPL, allowing this function to be
> * expanded directly in non-LGPL code.
> */
> -#define _rcu_assign_pointer(p, v) _rcu_set_pointer(&(p), v)
> +#define _rcu_assign_pointer(p, v) rcu_set_pointer(&(p), v)
>
> #ifdef __cplusplus
> }

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
end of thread, other threads:[~2023-03-21 20:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2023-03-21 13:30 [lttng-dev] (no subject) Ondřej Surý via lttng-dev
2023-03-21 13:30 ` [lttng-dev] [PATCH 1/7] Require __atomic builtins to build Ondřej Surý via lttng-dev
2023-03-21 19:26   ` Mathieu Desnoyers via lttng-dev
2023-03-21 13:30 ` [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for <urcu/uatomic.h> implementation Ondřej Surý via lttng-dev
2023-03-21 20:03   ` Mathieu Desnoyers via lttng-dev
2023-03-21 13:30 ` [lttng-dev] [PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier() Ondřej Surý via lttng-dev
2023-03-21 13:30 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins Ondřej Surý via lttng-dev
2023-03-21 13:31 ` [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers " Ondřej Surý via lttng-dev
2023-03-21 20:19   ` Mathieu Desnoyers via lttng-dev
2023-03-21 13:31 ` [lttng-dev] [PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED Ondřej Surý via lttng-dev
2023-03-21 13:31 ` [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c Ondřej Surý via lttng-dev

-- strict thread matches above, loose matches on Subject: below --
2023-03-17 21:37 [lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins Ondřej Surý via lttng-dev
2023-03-17 21:37 ` [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with " Ondřej Surý via lttng-dev
2023-03-20 18:25 ` Mathieu Desnoyers via lttng-dev