All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Add generic support for relaxed atomics
@ 2015-07-16 15:32 Will Deacon
  2015-07-16 15:32 ` [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations Will Deacon
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

Hi all,

This is version two of the patches I previously posted here:

  https://lwn.net/Articles/650862/

The series adds support for a family of relaxed atomics to the kernel.
More specifically:

  - acquire/release/relaxed flavours of xchg, cmpxchg and {add,sub}_return
  - atomic_read_acquire
  - atomic_set_release

This came out of a separate patch series porting the (barrier-heavy)
qrwlock code to arm64. Rather than have arch-specific hooks littered
around the place, it makes more sense to define a core set of relaxed
atomics that can be used regardless of architecture.

Changes since v1 include:

  - Added macros to construct acquire/release/fence variants from a
    relaxed variant + barrier macros

  - Ported ARM as a proof of concept (with a negative diffstat!)

Build tested on ARM, arm64, PowerPC and x86.

All feedback welcome,

Will

--->8

Will Deacon (7):
  atomics: add acquire/release/relaxed variants of some atomic
    operations
  asm-generic: rework atomic-long.h to avoid bulk code duplication
  asm-generic: add relaxed/acquire/release variants for atomic_long_t
  lockref: remove homebrew cmpxchg64_relaxed macro definition
  locking/qrwlock: make use of acquire/release/relaxed atomics
  include/llist: use linux/atomic.h instead of asm/cmpxchg.h
  ARM: atomics: define our SMP atomics in terms of _relaxed operations

 arch/arm/include/asm/atomic.h     |  37 ++---
 arch/arm/include/asm/cmpxchg.h    |  47 +-----
 include/asm-generic/atomic-long.h | 263 ++++++++++++--------------------
 include/asm-generic/qrwlock.h     |  13 +-
 include/linux/atomic.h            | 312 ++++++++++++++++++++++++++++++++++++++
 include/linux/llist.h             |   2 +-
 kernel/locking/qrwlock.c          |  12 +-
 lib/lockref.c                     |   8 -
 8 files changed, 442 insertions(+), 252 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-17  0:07   ` Waiman Long
  2015-07-16 15:32 ` [PATCH v2 2/7] asm-generic: rework atomic-long.h to avoid bulk code duplication Will Deacon
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

Whilst porting the generic qrwlock code over to arm64, it became
apparent that any portable locking code needs finer-grained control of
the memory-ordering guarantees provided by our atomic routines.

In particular: xchg, cmpxchg, {add,sub}_return are often used in
situations where full barrier semantics (currently the only option
available) are not required. For example, when a reader increments a
reader count to obtain a lock, checking the old value to see if a writer
was present, only acquire semantics are strictly needed.

This patch introduces three new ordering semantics for these operations:

  - *_relaxed: No ordering guarantees. This is similar to what we have
               already for the non-return atomics (e.g. atomic_add).

  - *_acquire: ACQUIRE semantics, similar to smp_load_acquire.

  - *_release: RELEASE semantics, similar to smp_store_release.

In memory-ordering speak, this means that the acquire/release semantics
are RCpc as opposed to RCsc. Consequently a RELEASE followed by an
ACQUIRE does not imply a full barrier, as already documented in
memory-barriers.txt.

Currently, all the new macros are conditionally mapped to the full-mb
variants, however if the *_relaxed version is provided by the
architecture, then the acquire/release variants are constructed by
supplementing the relaxed routine with an explicit barrier.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/linux/atomic.h | 312 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 312 insertions(+)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 5b08a8540ecf..08c2f6e56f76 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -2,6 +2,318 @@
 #ifndef _LINUX_ATOMIC_H
 #define _LINUX_ATOMIC_H
 #include <asm/atomic.h>
+#include <asm/barrier.h>
+
+/*
+ * Relaxed variants of xchg, cmpxchg and some atomic operations.
+ *
+ * We support four variants:
+ *
+ * - Fully ordered: The default implementation, no suffix required.
+ * - Acquire: Provides ACQUIRE semantics, _acquire suffix.
+ * - Release: Provides RELEASE semantics, _release suffix.
+ * - Relaxed: No ordering guarantees, _relaxed suffix.
+ *
+ * See Documentation/memory-barriers.txt for ACQUIRE/RELEASE definitions.
+ */
+
+#ifndef atomic_read_acquire
+#define  atomic_read_acquire(v)		smp_load_acquire(&(v)->counter)
+#endif
+
+#ifndef atomic_set_release
+#define  atomic_set_release(v, i)	smp_store_release(&(v)->counter, (i))
+#endif
+
+/*
+ * The idea here is to build acquire/release variants by adding explicit
+ * barriers on top of the relaxed variant. In the case where the relaxed
+ * variant is already fully ordered, no additional barriers are needed.
+ */
+#define __atomic_op_acquire(ret_t, op, ...)				\
+({									\
+	ret_t __ret = op##_relaxed(__VA_ARGS__);			\
+	smp_mb__after_atomic();						\
+	__ret;								\
+})
+
+#define __atomic_op_release(ret_t, op, ...)				\
+({									\
+	ret_t __ret;							\
+	smp_mb__before_atomic();					\
+	__ret = op##_relaxed(__VA_ARGS__);				\
+	__ret;								\
+})
+
+#define __atomic_op_fence(ret_t, op, ...)				\
+({									\
+	ret_t __ret;							\
+	smp_mb__before_atomic();					\
+	__ret = op##_relaxed(__VA_ARGS__);				\
+	smp_mb__after_atomic();						\
+	__ret;								\
+})
+
+#ifndef atomic_add_return_relaxed
+#define  atomic_add_return_relaxed	atomic_add_return
+#define  atomic_add_return_acquire	atomic_add_return
+#define  atomic_add_return_release	atomic_add_return
+
+#else /* atomic_add_return_relaxed */
+
+#ifndef atomic_add_return_acquire
+#define  atomic_add_return_acquire(...)					\
+	__atomic_op_acquire(int, atomic_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_add_return_release
+#define  atomic_add_return_release(...)					\
+	__atomic_op_release(int, atomic_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_add_return
+#define  atomic_add_return(...)						\
+	__atomic_op_fence(int, atomic_add_return, __VA_ARGS__)
+#endif
+#endif /* atomic_add_return_relaxed */
+
+#ifndef atomic_sub_return_relaxed
+#define  atomic_sub_return_relaxed	atomic_sub_return
+#define  atomic_sub_return_acquire	atomic_sub_return
+#define  atomic_sub_return_release	atomic_sub_return
+
+#else /* atomic_sub_return_relaxed */
+
+#ifndef atomic_sub_return_acquire
+#define  atomic_sub_return_acquire(...)					\
+	__atomic_op_acquire(int, atomic_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_sub_return_release
+#define  atomic_sub_return_release(...)					\
+	__atomic_op_release(int, atomic_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_sub_return
+#define  atomic_sub_return(...)						\
+	__atomic_op_fence(int, atomic_sub_return, __VA_ARGS__)
+#endif
+#endif /* atomic_sub_return_relaxed */
+
+#ifndef atomic_xchg_relaxed
+#define  atomic_xchg_relaxed		atomic_xchg
+#define  atomic_xchg_acquire		atomic_xchg
+#define  atomic_xchg_release		atomic_xchg
+
+#else /* atomic_xchg_relaxed */
+
+#ifndef atomic_xchg_acquire
+#define  atomic_xchg_acquire(...)					\
+	__atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_xchg_release
+#define  atomic_xchg_release(...)					\
+	__atomic_op_release(int, atomic_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_xchg
+#define  atomic_xchg(...)						\
+	__atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
+#endif
+#endif /* atomic_xchg_relaxed */
+
+#ifndef atomic_cmpxchg_relaxed
+#define  atomic_cmpxchg_relaxed		atomic_cmpxchg
+#define  atomic_cmpxchg_acquire		atomic_cmpxchg
+#define  atomic_cmpxchg_release		atomic_cmpxchg
+
+#else /* atomic_cmpxchg_relaxed */
+
+#ifndef atomic_cmpxchg_acquire
+#define  atomic_cmpxchg_acquire(...)					\
+	__atomic_op_acquire(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_cmpxchg_release
+#define  atomic_cmpxchg_release(...)					\
+	__atomic_op_release(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_cmpxchg
+#define  atomic_cmpxchg(...)						\
+	__atomic_op_fence(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+#endif /* atomic_cmpxchg_relaxed */
+
+#ifndef atomic64_read_acquire
+#define  atomic64_read_acquire(v)	smp_load_acquire(&(v)->counter)
+#endif
+
+#ifndef atomic64_set_release
+#define  atomic64_set_release(v, i)	smp_store_release(&(v)->counter, (i))
+#endif
+
+#ifndef atomic64_add_return_relaxed
+#define  atomic64_add_return_relaxed	atomic64_add_return
+#define  atomic64_add_return_acquire	atomic64_add_return
+#define  atomic64_add_return_release	atomic64_add_return
+
+#else /* atomic64_add_return_relaxed */
+
+#ifndef atomic64_add_return_acquire
+#define  atomic64_add_return_acquire(...)				\
+	__atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_add_return_release
+#define  atomic64_add_return_release(...)				\
+	__atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_add_return
+#define  atomic64_add_return(...)					\
+	__atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+#endif /* atomic64_add_return_relaxed */
+
+#ifndef atomic64_sub_return_relaxed
+#define  atomic64_sub_return_relaxed	atomic64_sub_return
+#define  atomic64_sub_return_acquire	atomic64_sub_return
+#define  atomic64_sub_return_release	atomic64_sub_return
+
+#else /* atomic64_sub_return_relaxed */
+
+#ifndef atomic64_sub_return_acquire
+#define  atomic64_sub_return_acquire(...)				\
+	__atomic_op_acquire(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_sub_return_release
+#define  atomic64_sub_return_release(...)				\
+	__atomic_op_release(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_sub_return
+#define  atomic64_sub_return(...)					\
+	__atomic_op_fence(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+#endif /* atomic64_sub_return_relaxed */
+
+#ifndef atomic64_xchg_relaxed
+#define  atomic64_xchg_relaxed		atomic64_xchg
+#define  atomic64_xchg_acquire		atomic64_xchg
+#define  atomic64_xchg_release		atomic64_xchg
+
+#else /* atomic64_xchg_relaxed */
+
+#ifndef atomic64_xchg_acquire
+#define  atomic64_xchg_acquire(...)					\
+	__atomic_op_acquire(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_xchg_release
+#define  atomic64_xchg_release(...)					\
+	__atomic_op_release(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_xchg
+#define  atomic64_xchg(...)						\
+	__atomic_op_fence(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+#endif /* atomic64_xchg_relaxed */
+
+#ifndef atomic64_cmpxchg_relaxed
+#define  atomic64_cmpxchg_relaxed	atomic64_cmpxchg
+#define  atomic64_cmpxchg_acquire	atomic64_cmpxchg
+#define  atomic64_cmpxchg_release	atomic64_cmpxchg
+
+#else /* atomic64_cmpxchg_relaxed */
+
+#ifndef atomic64_cmpxchg_acquire
+#define  atomic64_cmpxchg_acquire(...)					\
+	__atomic_op_acquire(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_cmpxchg_release
+#define  atomic64_cmpxchg_release(...)					\
+	__atomic_op_release(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_cmpxchg
+#define  atomic64_cmpxchg(...)						\
+	__atomic_op_fence(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+#endif /* atomic64_cmpxchg_relaxed */
+
+#ifndef cmpxchg_relaxed
+#define  cmpxchg_relaxed		cmpxchg
+#define  cmpxchg_acquire		cmpxchg
+#define  cmpxchg_release		cmpxchg
+
+#else /* cmpxchg_relaxed */
+
+#ifndef cmpxchg_acquire
+#define  cmpxchg_acquire(ptr, ...)					\
+	__atomic_op_acquire(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg_release
+#define  cmpxchg_release(ptr, ...)					\
+	__atomic_op_release(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg
+#define  cmpxchg(ptr, ...)						\
+	__atomic_op_fence(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+#endif /* cmpxchg_relaxed */
+
+#ifndef cmpxchg64_relaxed
+#define  cmpxchg64_relaxed		cmpxchg64
+#define  cmpxchg64_acquire		cmpxchg64
+#define  cmpxchg64_release		cmpxchg64
+
+#else /* cmpxchg64_relaxed */
+
+#ifndef cmpxchg64_acquire
+#define  cmpxchg64_acquire(ptr, ...)					\
+	__atomic_op_acquire(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg64_release
+#define  cmpxchg64_release(ptr, ...)					\
+	__atomic_op_release(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg64
+#define  cmpxchg64(ptr, ...)						\
+	__atomic_op_fence(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+#endif /* cmpxchg64_relaxed */
+
+#ifndef xchg_relaxed
+#define  xchg_relaxed			xchg
+#define  xchg_acquire			xchg
+#define  xchg_release			xchg
+
+#else /* xchg_relaxed */
+
+#ifndef xchg_acquire
+#define  xchg_acquire(ptr, ...)						\
+	__atomic_op_acquire(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef xchg_release
+#define  xchg_release(ptr, ...)						\
+	__atomic_op_release(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef xchg
+#define  xchg(ptr, ...)							\
+	__atomic_op_fence(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+#endif /* xchg_relaxed */
 
 /**
  * atomic_add_unless - add unless the number is already a given value
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/7] asm-generic: rework atomic-long.h to avoid bulk code duplication
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
  2015-07-16 15:32 ` [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 15:32 ` [PATCH v2 3/7] asm-generic: add relaxed/acquire/release variants for atomic_long_t Will Deacon
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

We can use some (admittedly ugly) macros to generate the 32-bit and
64-bit based atomic_long implementations from the same code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/atomic-long.h | 189 ++++++++------------------------------
 1 file changed, 40 insertions(+), 149 deletions(-)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index b7babf0206b8..beaea541adfb 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -23,236 +23,127 @@
 typedef atomic64_t atomic_long_t;
 
 #define ATOMIC_LONG_INIT(i)	ATOMIC64_INIT(i)
+#define ATOMIC_LONG_PFX(x)	atomic64 ## x
 
-static inline long atomic_long_read(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_read(v);
-}
-
-static inline void atomic_long_set(atomic_long_t *l, long i)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	atomic64_set(v, i);
-}
-
-static inline void atomic_long_inc(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	atomic64_inc(v);
-}
-
-static inline void atomic_long_dec(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	atomic64_dec(v);
-}
-
-static inline void atomic_long_add(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	atomic64_add(i, v);
-}
-
-static inline void atomic_long_sub(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	atomic64_sub(i, v);
-}
-
-static inline int atomic_long_sub_and_test(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return atomic64_sub_and_test(i, v);
-}
-
-static inline int atomic_long_dec_and_test(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return atomic64_dec_and_test(v);
-}
-
-static inline int atomic_long_inc_and_test(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return atomic64_inc_and_test(v);
-}
-
-static inline int atomic_long_add_negative(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return atomic64_add_negative(i, v);
-}
-
-static inline long atomic_long_add_return(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_add_return(i, v);
-}
-
-static inline long atomic_long_sub_return(long i, atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_sub_return(i, v);
-}
-
-static inline long atomic_long_inc_return(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_inc_return(v);
-}
-
-static inline long atomic_long_dec_return(atomic_long_t *l)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_dec_return(v);
-}
-
-static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
-{
-	atomic64_t *v = (atomic64_t *)l;
-
-	return (long)atomic64_add_unless(v, a, u);
-}
-
-#define atomic_long_inc_not_zero(l) atomic64_inc_not_zero((atomic64_t *)(l))
-
-#define atomic_long_cmpxchg(l, old, new) \
-	(atomic64_cmpxchg((atomic64_t *)(l), (old), (new)))
-#define atomic_long_xchg(v, new) \
-	(atomic64_xchg((atomic64_t *)(v), (new)))
-
-#else  /*  BITS_PER_LONG == 64  */
+#else
 
 typedef atomic_t atomic_long_t;
 
 #define ATOMIC_LONG_INIT(i)	ATOMIC_INIT(i)
+#define ATOMIC_LONG_PFX(x)	atomic ## x
+
+#endif
+
 static inline long atomic_long_read(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_read(v);
+	return (long)ATOMIC_LONG_PFX(_read)(v);
 }
 
 static inline void atomic_long_set(atomic_long_t *l, long i)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	atomic_set(v, i);
+	ATOMIC_LONG_PFX(_set)(v, i);
 }
 
 static inline void atomic_long_inc(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	atomic_inc(v);
+	ATOMIC_LONG_PFX(_inc)(v);
 }
 
 static inline void atomic_long_dec(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	atomic_dec(v);
+	ATOMIC_LONG_PFX(_dec)(v);
 }
 
 static inline void atomic_long_add(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	atomic_add(i, v);
+	ATOMIC_LONG_PFX(_add)(i, v);
 }
 
 static inline void atomic_long_sub(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	atomic_sub(i, v);
+	ATOMIC_LONG_PFX(_sub)(i, v);
 }
 
 static inline int atomic_long_sub_and_test(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return atomic_sub_and_test(i, v);
+	return ATOMIC_LONG_PFX(_sub_and_test)(i, v);
 }
 
 static inline int atomic_long_dec_and_test(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return atomic_dec_and_test(v);
+	return ATOMIC_LONG_PFX(_dec_and_test)(v);
 }
 
 static inline int atomic_long_inc_and_test(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return atomic_inc_and_test(v);
+	return ATOMIC_LONG_PFX(_inc_and_test)(v);
 }
 
 static inline int atomic_long_add_negative(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return atomic_add_negative(i, v);
+	return ATOMIC_LONG_PFX(_add_negative)(i, v);
 }
 
 static inline long atomic_long_add_return(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_add_return(i, v);
+	return (long)ATOMIC_LONG_PFX(_add_return)(i, v);
 }
 
 static inline long atomic_long_sub_return(long i, atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_sub_return(i, v);
+	return (long)ATOMIC_LONG_PFX(_sub_return)(i, v);
 }
 
 static inline long atomic_long_inc_return(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_inc_return(v);
+	return (long)ATOMIC_LONG_PFX(_inc_return)(v);
 }
 
 static inline long atomic_long_dec_return(atomic_long_t *l)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_dec_return(v);
+	return (long)ATOMIC_LONG_PFX(_dec_return)(v);
 }
 
 static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
 {
-	atomic_t *v = (atomic_t *)l;
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
 
-	return (long)atomic_add_unless(v, a, u);
+	return (long)ATOMIC_LONG_PFX(_add_unless)(v, a, u);
 }
 
-#define atomic_long_inc_not_zero(l) atomic_inc_not_zero((atomic_t *)(l))
-
+#define atomic_long_inc_not_zero(l) \
+	ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
 #define atomic_long_cmpxchg(l, old, new) \
-	(atomic_cmpxchg((atomic_t *)(l), (old), (new)))
+	(ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
 #define atomic_long_xchg(v, new) \
-	(atomic_xchg((atomic_t *)(v), (new)))
-
-#endif  /*  BITS_PER_LONG == 64  */
+	(ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
 
 #endif  /*  _ASM_GENERIC_ATOMIC_LONG_H  */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/7] asm-generic: add relaxed/acquire/release variants for atomic_long_t
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
  2015-07-16 15:32 ` [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations Will Deacon
  2015-07-16 15:32 ` [PATCH v2 2/7] asm-generic: rework atomic-long.h to avoid bulk code duplication Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 15:32 ` [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition Will Deacon
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

This patch adds atomic_long_t wrappers for the new relaxed atomic
operations.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/atomic-long.h | 86 +++++++++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 27 deletions(-)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index beaea541adfb..a94cbebbc33d 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -34,19 +34,69 @@ typedef atomic_t atomic_long_t;
 
 #endif
 
-static inline long atomic_long_read(atomic_long_t *l)
-{
-	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
-	return (long)ATOMIC_LONG_PFX(_read)(v);
+#define ATOMIC_LONG_READ_OP(mo)						\
+static inline long atomic_long_read##mo(atomic_long_t *l)		\
+{									\
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;		\
+									\
+	return (long)ATOMIC_LONG_PFX(_read##mo)(v);			\
 }
+ATOMIC_LONG_READ_OP()
+ATOMIC_LONG_READ_OP(_acquire)
 
-static inline void atomic_long_set(atomic_long_t *l, long i)
-{
-	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
+#undef ATOMIC_LONG_READ_OP
 
-	ATOMIC_LONG_PFX(_set)(v, i);
+#define ATOMIC_LONG_SET_OP(mo)						\
+static inline void atomic_long_set##mo(atomic_long_t *l, long i)	\
+{									\
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;		\
+									\
+	ATOMIC_LONG_PFX(_set##mo)(v, i);				\
+}
+ATOMIC_LONG_SET_OP()
+ATOMIC_LONG_SET_OP(_release)
+
+#undef ATOMIC_LONG_SET_OP
+
+#define ATOMIC_LONG_ADD_SUB_OP(op, mo)					\
+static inline long							\
+atomic_long_##op##_return##mo(long i, atomic_long_t *l)			\
+{									\
+	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;		\
+									\
+	return (long)ATOMIC_LONG_PFX(_##op##_return##mo)(i, v);		\
 }
+ATOMIC_LONG_ADD_SUB_OP(add,)
+ATOMIC_LONG_ADD_SUB_OP(add, _relaxed)
+ATOMIC_LONG_ADD_SUB_OP(add, _acquire)
+ATOMIC_LONG_ADD_SUB_OP(add, _release)
+ATOMIC_LONG_ADD_SUB_OP(sub,)
+ATOMIC_LONG_ADD_SUB_OP(sub, _relaxed)
+ATOMIC_LONG_ADD_SUB_OP(sub, _acquire)
+ATOMIC_LONG_ADD_SUB_OP(sub, _release)
+
+#undef ATOMIC_LONG_ADD_SUB_OP
+
+#define atomic_long_cmpxchg_relaxed(l, old, new) \
+	(ATOMIC_LONG_PFX(_cmpxchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(l), \
+					   (old), (new)))
+#define atomic_long_cmpxchg_acquire(l, old, new) \
+	(ATOMIC_LONG_PFX(_cmpxchg_acquire)((ATOMIC_LONG_PFX(_t) *)(l), \
+					   (old), (new)))
+#define atomic_long_cmpxchg_release(l, old, new) \
+	(ATOMIC_LONG_PFX(_cmpxchg_release)((ATOMIC_LONG_PFX(_t) *)(l), \
+					   (old), (new)))
+#define atomic_long_cmpxchg(l, old, new) \
+	(ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
+
+#define atomic_long_xchg_relaxed(v, new) \
+	(ATOMIC_LONG_PFX(_xchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg_acquire(v, new) \
+	(ATOMIC_LONG_PFX(_xchg_acquire)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg_release(v, new) \
+	(ATOMIC_LONG_PFX(_xchg_release)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg(v, new) \
+	(ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
 
 static inline void atomic_long_inc(atomic_long_t *l)
 {
@@ -104,20 +154,6 @@ static inline int atomic_long_add_negative(long i, atomic_long_t *l)
 	return ATOMIC_LONG_PFX(_add_negative)(i, v);
 }
 
-static inline long atomic_long_add_return(long i, atomic_long_t *l)
-{
-	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
-	return (long)ATOMIC_LONG_PFX(_add_return)(i, v);
-}
-
-static inline long atomic_long_sub_return(long i, atomic_long_t *l)
-{
-	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
-	return (long)ATOMIC_LONG_PFX(_sub_return)(i, v);
-}
-
 static inline long atomic_long_inc_return(atomic_long_t *l)
 {
 	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
@@ -141,9 +177,5 @@ static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
 
 #define atomic_long_inc_not_zero(l) \
 	ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
-#define atomic_long_cmpxchg(l, old, new) \
-	(ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
-#define atomic_long_xchg(v, new) \
-	(ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
 
 #endif  /*  _ASM_GENERIC_ATOMIC_LONG_H  */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
                   ` (2 preceding siblings ...)
  2015-07-16 15:32 ` [PATCH v2 3/7] asm-generic: add relaxed/acquire/release variants for atomic_long_t Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 16:48   ` Peter Zijlstra
  2015-07-16 15:32 ` [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics Will Deacon
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

cmpxchg64_relaxed is now defined by asm-generic/atomic.h, so we can
remove our local definition from the lockref code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 lib/lockref.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 494994bf17c8..5a92189ad711 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -4,14 +4,6 @@
 #if USE_CMPXCHG_LOCKREF
 
 /*
- * Allow weakly-ordered memory architectures to provide barrier-less
- * cmpxchg semantics for lockref updates.
- */
-#ifndef cmpxchg64_relaxed
-# define cmpxchg64_relaxed cmpxchg64
-#endif
-
-/*
  * Note that the "cmpxchg()" reloads the "old" value for the
  * failure case.
  */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
                   ` (3 preceding siblings ...)
  2015-07-16 15:32 ` [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 16:59   ` Peter Zijlstra
  2015-07-16 15:32 ` [PATCH v2 6/7] include/llist: use linux/atomic.h instead of asm/cmpxchg.h Will Deacon
  2015-07-16 15:32 ` [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations Will Deacon
  6 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

The qrwlock implementation is slightly heavy in its use of memory
barriers, mainly through the use of cmpxchg and _return atomics, which
imply full barrier semantics.

This patch modifies the qrwlock code to use the more relaxed atomic
routines so that we can reduce the unnecessary barrier overhead on
weakly-ordered architectures.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/qrwlock.h | 13 ++++++-------
 kernel/locking/qrwlock.c      | 12 ++++++------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 930920501e33..6ab846bf6413 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -68,7 +68,7 @@ static inline int queued_read_trylock(struct qrwlock *lock)
 
 	cnts = atomic_read(&lock->cnts);
 	if (likely(!(cnts & _QW_WMASK))) {
-		cnts = (u32)atomic_add_return(_QR_BIAS, &lock->cnts);
+		cnts = (u32)atomic_add_return_acquire(_QR_BIAS, &lock->cnts);
 		if (likely(!(cnts & _QW_WMASK)))
 			return 1;
 		atomic_sub(_QR_BIAS, &lock->cnts);
@@ -89,8 +89,8 @@ static inline int queued_write_trylock(struct qrwlock *lock)
 	if (unlikely(cnts))
 		return 0;
 
-	return likely(atomic_cmpxchg(&lock->cnts,
-				     cnts, cnts | _QW_LOCKED) == cnts);
+	return likely(atomic_cmpxchg_acquire(&lock->cnts,
+					     cnts, cnts | _QW_LOCKED) == cnts);
 }
 /**
  * queued_read_lock - acquire read lock of a queue rwlock
@@ -100,7 +100,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
 {
 	u32 cnts;
 
-	cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
+	cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts);
 	if (likely(!(cnts & _QW_WMASK)))
 		return;
 
@@ -115,7 +115,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
 static inline void queued_write_lock(struct qrwlock *lock)
 {
 	/* Optimize for the unfair lock case where the fair flag is 0. */
-	if (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0)
+	if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
 		return;
 
 	queued_write_lock_slowpath(lock);
@@ -130,8 +130,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
 	/*
 	 * Atomically decrement the reader count
 	 */
-	smp_mb__before_atomic();
-	atomic_sub(_QR_BIAS, &lock->cnts);
+	(void)atomic_sub_return_release(_QR_BIAS, &lock->cnts);
 }
 
 /**
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index a71bb3541880..879c8fab7bea 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -36,7 +36,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
 	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
 		cpu_relax_lowlatency();
-		cnts = smp_load_acquire((u32 *)&lock->cnts);
+		cnts = atomic_read_acquire(&lock->cnts);
 	}
 }
 
@@ -78,7 +78,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 	while (atomic_read(&lock->cnts) & _QW_WMASK)
 		cpu_relax_lowlatency();
 
-	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+	cnts = atomic_add_return_relaxed(_QR_BIAS, &lock->cnts) - _QR_BIAS;
 	rspin_until_writer_unlock(lock, cnts);
 
 	/*
@@ -101,7 +101,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 
 	/* Try to acquire the lock directly if no reader is present */
 	if (!atomic_read(&lock->cnts) &&
-	    (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0))
+	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
 		goto unlock;
 
 	/*
@@ -110,7 +110,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	 */
 	for (;;) {
 		if (!READ_ONCE(lock->wmode) &&
-		   (cmpxchg(&lock->wmode, 0, _QW_WAITING) == 0))
+		   (cmpxchg_relaxed(&lock->wmode, 0, _QW_WAITING) == 0))
 			break;
 
 		cpu_relax_lowlatency();
@@ -120,8 +120,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	for (;;) {
 		cnts = atomic_read(&lock->cnts);
 		if ((cnts == _QW_WAITING) &&
-		    (atomic_cmpxchg(&lock->cnts, _QW_WAITING,
-				    _QW_LOCKED) == _QW_WAITING))
+		    (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
+					    _QW_LOCKED) == _QW_WAITING))
 			break;
 
 		cpu_relax_lowlatency();
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/7] include/llist: use linux/atomic.h instead of asm/cmpxchg.h
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
                   ` (4 preceding siblings ...)
  2015-07-16 15:32 ` [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 15:32 ` [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations Will Deacon
  6 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

Including an asm/ header directly is best avoided, so use linux/atomic.h
instead of asm/cmpxchg.h in linux/llist.h.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/linux/llist.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index fbf10a0bc095..fd4ca0b4fe0f 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -55,8 +55,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include <linux/atomic.h>
 #include <linux/kernel.h>
-#include <asm/cmpxchg.h>
 
 struct llist_head {
 	struct llist_node *first;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
                   ` (5 preceding siblings ...)
  2015-07-16 15:32 ` [PATCH v2 6/7] include/llist: use linux/atomic.h instead of asm/cmpxchg.h Will Deacon
@ 2015-07-16 15:32 ` Will Deacon
  2015-07-16 20:40   ` Waiman Long
  6 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-16 15:32 UTC (permalink / raw)
  To: linux-arch; +Cc: Waiman.Long, peterz, linux-kernel, paulmck, Will Deacon

By defining our SMP atomics in terms of relaxed operations, we gain
a small reduction in code size and have acquire/release/fence variants
generated automatically by the core code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/atomic.h  | 37 ++++++++++++++-------------------
 arch/arm/include/asm/cmpxchg.h | 47 +++++++-----------------------------------
 2 files changed, 24 insertions(+), 60 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index e22c11970b7b..bc9da3a3ff5e 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -57,12 +57,11 @@ static inline void atomic_##op(int i, atomic_t *v)			\
 }									\
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int atomic_##op##_return(int i, atomic_t *v)		\
+static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
-	smp_mb();							\
 	prefetchw(&v->counter);						\
 									\
 	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
@@ -75,17 +74,17 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	: "r" (&v->counter), "Ir" (i)					\
 	: "cc");							\
 									\
-	smp_mb();							\
-									\
 	return result;							\
 }
 
-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
+#define atomic_add_return_relaxed	atomic_add_return_relaxed
+#define atomic_sub_return_relaxed	atomic_sub_return_relaxed
+
+static inline int atomic_cmpxchg_relaxed(atomic_t *ptr, int old, int new)
 {
 	int oldval;
 	unsigned long res;
 
-	smp_mb();
 	prefetchw(&ptr->counter);
 
 	do {
@@ -99,10 +98,9 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 		    : "cc");
 	} while (res);
 
-	smp_mb();
-
 	return oldval;
 }
+#define atomic_cmpxchg_relaxed		atomic_cmpxchg_relaxed
 
 static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 {
@@ -290,12 +288,12 @@ static inline void atomic64_##op(long long i, atomic64_t *v)		\
 }									\
 
 #define ATOMIC64_OP_RETURN(op, op1, op2)				\
-static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
+static inline long long							\
+atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
 {									\
 	long long result;						\
 	unsigned long tmp;						\
 									\
-	smp_mb();							\
 	prefetchw(&v->counter);						\
 									\
 	__asm__ __volatile__("@ atomic64_" #op "_return\n"		\
@@ -309,8 +307,6 @@ static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
 	: "r" (&v->counter), "r" (i)					\
 	: "cc");							\
 									\
-	smp_mb();							\
-									\
 	return result;							\
 }
 
@@ -321,17 +317,19 @@ static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
 ATOMIC64_OPS(add, adds, adc)
 ATOMIC64_OPS(sub, subs, sbc)
 
+#define atomic64_add_return_relaxed	atomic64_add_return_relaxed
+#define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
+
 #undef ATOMIC64_OPS
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-static inline long long atomic64_cmpxchg(atomic64_t *ptr, long long old,
-					long long new)
+static inline long long
+atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new)
 {
 	long long oldval;
 	unsigned long res;
 
-	smp_mb();
 	prefetchw(&ptr->counter);
 
 	do {
@@ -346,17 +344,15 @@ static inline long long atomic64_cmpxchg(atomic64_t *ptr, long long old,
 		: "cc");
 	} while (res);
 
-	smp_mb();
-
 	return oldval;
 }
+#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
 
-static inline long long atomic64_xchg(atomic64_t *ptr, long long new)
+static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new)
 {
 	long long result;
 	unsigned long tmp;
 
-	smp_mb();
 	prefetchw(&ptr->counter);
 
 	__asm__ __volatile__("@ atomic64_xchg\n"
@@ -368,10 +364,9 @@ static inline long long atomic64_xchg(atomic64_t *ptr, long long new)
 	: "r" (&ptr->counter), "r" (new)
 	: "cc");
 
-	smp_mb();
-
 	return result;
 }
+#define atomic64_xchg_relaxed		atomic64_xchg_relaxed
 
 static inline long long atomic64_dec_if_positive(atomic64_t *v)
 {
diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index 1692a05d3207..916a2744d5c6 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -35,7 +35,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 	unsigned int tmp;
 #endif
 
-	smp_mb();
 	prefetchw((const void *)ptr);
 
 	switch (size) {
@@ -98,12 +97,11 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		__bad_xchg(ptr, size), ret = 0;
 		break;
 	}
-	smp_mb();
 
 	return ret;
 }
 
-#define xchg(ptr, x) ({							\
+#define xchg_relaxed(ptr, x) ({						\
 	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
 				   sizeof(*(ptr)));			\
 })
@@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 #error "SMP is not supported on this platform"
 #endif
 
+#define xchg xchg_relaxed
+
 /*
  * cmpxchg_local and cmpxchg64_local are atomic wrt current CPU. Always make
  * them available.
@@ -194,23 +194,11 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	return oldval;
 }
 
-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
-					 unsigned long new, int size)
-{
-	unsigned long ret;
-
-	smp_mb();
-	ret = __cmpxchg(ptr, old, new, size);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg(ptr,o,n) ({						\
-	(__typeof__(*(ptr)))__cmpxchg_mb((ptr),				\
-					 (unsigned long)(o),		\
-					 (unsigned long)(n),		\
-					 sizeof(*(ptr)));		\
+#define cmpxchg_relaxed(ptr,o,n) ({					\
+	(__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				      (unsigned long)(o),		\
+				      (unsigned long)(n),		\
+				      sizeof(*(ptr)));			\
 })
 
 static inline unsigned long __cmpxchg_local(volatile void *ptr,
@@ -273,25 +261,6 @@ static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
 
 #define cmpxchg64_local(ptr, o, n) cmpxchg64_relaxed((ptr), (o), (n))
 
-static inline unsigned long long __cmpxchg64_mb(unsigned long long *ptr,
-						unsigned long long old,
-						unsigned long long new)
-{
-	unsigned long long ret;
-
-	smp_mb();
-	ret = __cmpxchg64(ptr, old, new);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg64(ptr, o, n) ({						\
-	(__typeof__(*(ptr)))__cmpxchg64_mb((ptr),			\
-					   (unsigned long long)(o),	\
-					   (unsigned long long)(n));	\
-})
-
 #endif	/* __LINUX_ARM_ARCH__ >= 6 */
 
 #endif /* __ASM_ARM_CMPXCHG_H */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition
  2015-07-16 15:32 ` [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition Will Deacon
@ 2015-07-16 16:48   ` Peter Zijlstra
  2015-07-16 17:00     ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-16 16:48 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arch, Waiman.Long, linux-kernel, paulmck

On Thu, Jul 16, 2015 at 04:32:35PM +0100, Will Deacon wrote:
> cmpxchg64_relaxed is now defined by asm-generic/atomic.h, so we can

linux/atomic.h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics
  2015-07-16 15:32 ` [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics Will Deacon
@ 2015-07-16 16:59   ` Peter Zijlstra
  2015-07-16 18:13     ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-16 16:59 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arch, Waiman.Long, linux-kernel, paulmck

On Thu, Jul 16, 2015 at 04:32:36PM +0100, Will Deacon wrote:
> @@ -130,8 +130,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
>  	/*
>  	 * Atomically decrement the reader count
>  	 */
> -	smp_mb__before_atomic();
> -	atomic_sub(_QR_BIAS, &lock->cnts);
> +	(void)atomic_sub_return_release(_QR_BIAS, &lock->cnts);
>  }
>  
>  /**

This one will actually cause different code on x86; I think its still
fine though. LOCK XADD should not be (much) slower than LOCK SUB.

> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index a71bb3541880..879c8fab7bea 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -36,7 +36,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
>  {
>  	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>  		cpu_relax_lowlatency();
> -		cnts = smp_load_acquire((u32 *)&lock->cnts);
> +		cnts = atomic_read_acquire(&lock->cnts);
>  	}
>  }

It might make sense to add comments to the users of this function that
actually rely on the _acquire semantics, I had to double check that :-)


But otherwise that all looks good.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition
  2015-07-16 16:48   ` Peter Zijlstra
@ 2015-07-16 17:00     ` Will Deacon
  0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 17:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-arch, Waiman.Long, linux-kernel, paulmck

On Thu, Jul 16, 2015 at 05:48:03PM +0100, Peter Zijlstra wrote:
> On Thu, Jul 16, 2015 at 04:32:35PM +0100, Will Deacon wrote:
> > cmpxchg64_relaxed is now defined by asm-generic/atomic.h, so we can
> 
> linux/atomic.h

Whoops, thanks. Fixed.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics
  2015-07-16 16:59   ` Peter Zijlstra
@ 2015-07-16 18:13     ` Will Deacon
  0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-16 18:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-arch, Waiman.Long, linux-kernel, paulmck

On Thu, Jul 16, 2015 at 05:59:03PM +0100, Peter Zijlstra wrote:
> On Thu, Jul 16, 2015 at 04:32:36PM +0100, Will Deacon wrote:
> > @@ -130,8 +130,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
> >  	/*
> >  	 * Atomically decrement the reader count
> >  	 */
> > -	smp_mb__before_atomic();
> > -	atomic_sub(_QR_BIAS, &lock->cnts);
> > +	(void)atomic_sub_return_release(_QR_BIAS, &lock->cnts);
> >  }
> >  
> >  /**
> 
> This one will actually cause different code on x86; I think its still
> fine though. LOCK XADD should not be (much) slower than LOCK SUB.

Yeah, I wondered whether introduced atomic_sub_release etc was worth the
hassle and decided against it for now.

> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index a71bb3541880..879c8fab7bea 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -36,7 +36,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
> >  {
> >  	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
> >  		cpu_relax_lowlatency();
> > -		cnts = smp_load_acquire((u32 *)&lock->cnts);
> > +		cnts = atomic_read_acquire(&lock->cnts);
> >  	}
> >  }
> 
> It might make sense to add comments to the users of this function that
> actually rely on the _acquire semantics, I had to double check that :-)

Good point, I'll add those.

> But otherwise that all looks good.

Cheers. I'll send a v3 next week with your comments addressed. Pending
any objection, I guess this could be merged via -tip with the exception
of the ARM patch? FWIW, I plan to port arm64 once I've got my pending
asm/atomic.h rework queued.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-16 15:32 ` [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations Will Deacon
@ 2015-07-16 20:40   ` Waiman Long
  2015-07-16 21:08     ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2015-07-16 20:40 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arch, peterz, linux-kernel, paulmck

On 07/16/2015 11:32 AM, Will Deacon wrote:
> By defining our SMP atomics in terms of relaxed operations, we gain
> a small reduction in code size and have acquire/release/fence variants
> generated automatically by the core code.
>
> Signed-off-by: Will Deacon<will.deacon@arm.com>
> ---
>   arch/arm/include/asm/atomic.h  | 37 ++++++++++++++-------------------
>   arch/arm/include/asm/cmpxchg.h | 47 +++++++-----------------------------------
>   2 files changed, 24 insertions(+), 60 deletions(-)
>
>
>
> -#define xchg(ptr, x) ({							\
> +#define xchg_relaxed(ptr, x) ({						\
>   	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
>   				   sizeof(*(ptr)));			\
>   })
> @@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>   #error "SMP is not supported on this platform"
>   #endif
>
> +#define xchg xchg_relaxed

Is that a typo? I think xchg() needs to be a full memory barrier.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-16 20:40   ` Waiman Long
@ 2015-07-16 21:08     ` Peter Zijlstra
  2015-07-17  0:00       ` Waiman Long
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-16 21:08 UTC (permalink / raw)
  To: Waiman Long; +Cc: Will Deacon, linux-arch, linux-kernel, paulmck

On Thu, Jul 16, 2015 at 04:40:03PM -0400, Waiman Long wrote:
> On 07/16/2015 11:32 AM, Will Deacon wrote:
> >@@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> >  #error "SMP is not supported on this platform"

             ^^^^^^^^^^^^^

> >  #endif
> >
> >+#define xchg xchg_relaxed
> 
> Is that a typo? I think xchg() needs to be a full memory barrier.

Pointless on UP.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-16 21:08     ` Peter Zijlstra
@ 2015-07-17  0:00       ` Waiman Long
  2015-07-17  9:35         ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2015-07-17  0:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-arch, linux-kernel, paulmck

On 07/16/2015 05:08 PM, Peter Zijlstra wrote:
> On Thu, Jul 16, 2015 at 04:40:03PM -0400, Waiman Long wrote:
>> On 07/16/2015 11:32 AM, Will Deacon wrote:
>>> @@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>>>   #error "SMP is not supported on this platform"
>               ^^^^^^^^^^^^^

That #error is only for ARMv5 or below.

>>>   #endif
>>>
>>> +#define xchg xchg_relaxed
>> Is that a typo? I think xchg() needs to be a full memory barrier.
> Pointless on UP.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations
  2015-07-16 15:32 ` [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations Will Deacon
@ 2015-07-17  0:07   ` Waiman Long
  2015-07-17  9:40     ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2015-07-17  0:07 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arch, peterz, linux-kernel, paulmck

On 07/16/2015 11:32 AM, Will Deacon wrote:
> Whilst porting the generic qrwlock code over to arm64, it became
> apparent that any portable locking code needs finer-grained control of
> the memory-ordering guarantees provided by our atomic routines.
>
> In particular: xchg, cmpxchg, {add,sub}_return are often used in
> situations where full barrier semantics (currently the only option
> available) are not required. For example, when a reader increments a
> reader count to obtain a lock, checking the old value to see if a writer
> was present, only acquire semantics are strictly needed.
>
> This patch introduces three new ordering semantics for these operations:
>
>    - *_relaxed: No ordering guarantees. This is similar to what we have
>                 already for the non-return atomics (e.g. atomic_add).
>
>    - *_acquire: ACQUIRE semantics, similar to smp_load_acquire.
>
>    - *_release: RELEASE semantics, similar to smp_store_release.
>
> In memory-ordering speak, this means that the acquire/release semantics
> are RCpc as opposed to RCsc. Consequently a RELEASE followed by an
> ACQUIRE does not imply a full barrier, as already documented in
> memory-barriers.txt.
>
> Currently, all the new macros are conditionally mapped to the full-mb
> variants, however if the *_relaxed version is provided by the
> architecture, then the acquire/release variants are constructed by
> supplementing the relaxed routine with an explicit barrier.
>
> Cc: Peter Zijlstra<peterz@infradead.org>
> Signed-off-by: Will Deacon<will.deacon@arm.com>
> ---
>   include/linux/atomic.h | 312 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 312 insertions(+)
>
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 5b08a8540ecf..08c2f6e56f76 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -2,6 +2,318 @@
>   #ifndef _LINUX_ATOMIC_H
>   #define _LINUX_ATOMIC_H
>   #include<asm/atomic.h>
> +#include<asm/barrier.h>
> +
> +/*
> + * Relaxed variants of xchg, cmpxchg and some atomic operations.
> + *
> + * We support four variants:
> + *
> + * - Fully ordered: The default implementation, no suffix required.
> + * - Acquire: Provides ACQUIRE semantics, _acquire suffix.
> + * - Release: Provides RELEASE semantics, _release suffix.
> + * - Relaxed: No ordering guarantees, _relaxed suffix.
> + *
> + * See Documentation/memory-barriers.txt for ACQUIRE/RELEASE definitions.
> + */
> +
> +#ifndef atomic_read_acquire
> +#define  atomic_read_acquire(v)		smp_load_acquire(&(v)->counter)
> +#endif
> +
> +#ifndef atomic_set_release
> +#define  atomic_set_release(v, i)	smp_store_release(&(v)->counter, (i))
> +#endif
> +
> +/*
> + * The idea here is to build acquire/release variants by adding explicit
> + * barriers on top of the relaxed variant. In the case where the relaxed
> + * variant is already fully ordered, no additional barriers are needed.
> + */
> +#define __atomic_op_acquire(ret_t, op, ...)				\
> +({									\
> +	ret_t __ret = op##_relaxed(__VA_ARGS__);			\
> +	smp_mb__after_atomic();						\
> +	__ret;								\
> +})
> +
> +#define __atomic_op_release(ret_t, op, ...)				\
> +({									\
> +	ret_t __ret;							\
> +	smp_mb__before_atomic();					\
> +	__ret = op##_relaxed(__VA_ARGS__);				\
> +	__ret;								\
> +})
> +
> +#define __atomic_op_fence(ret_t, op, ...)				\
> +({									\
> +	ret_t __ret;							\
> +	smp_mb__before_atomic();					\
> +	__ret = op##_relaxed(__VA_ARGS__);				\
> +	smp_mb__after_atomic();						\
> +	__ret;								\
> +})
> +
> +#ifndef atomic_add_return_relaxed
> +#define  atomic_add_return_relaxed	atomic_add_return
> +#define  atomic_add_return_acquire	atomic_add_return
> +#define  atomic_add_return_release	atomic_add_return
> +
> +#else /* atomic_add_return_relaxed */
> +
> +#ifndef atomic_add_return_acquire
> +#define  atomic_add_return_acquire(...)					\
> +	__atomic_op_acquire(int, atomic_add_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_add_return_release
> +#define  atomic_add_return_release(...)					\
> +	__atomic_op_release(int, atomic_add_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_add_return
> +#define  atomic_add_return(...)						\
> +	__atomic_op_fence(int, atomic_add_return, __VA_ARGS__)
> +#endif
> +#endif /* atomic_add_return_relaxed */
> +
> +#ifndef atomic_sub_return_relaxed
> +#define  atomic_sub_return_relaxed	atomic_sub_return
> +#define  atomic_sub_return_acquire	atomic_sub_return
> +#define  atomic_sub_return_release	atomic_sub_return
> +
> +#else /* atomic_sub_return_relaxed */
> +
> +#ifndef atomic_sub_return_acquire
> +#define  atomic_sub_return_acquire(...)					\
> +	__atomic_op_acquire(int, atomic_sub_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_sub_return_release
> +#define  atomic_sub_return_release(...)					\
> +	__atomic_op_release(int, atomic_sub_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_sub_return
> +#define  atomic_sub_return(...)						\
> +	__atomic_op_fence(int, atomic_sub_return, __VA_ARGS__)
> +#endif
> +#endif /* atomic_sub_return_relaxed */
> +
> +#ifndef atomic_xchg_relaxed
> +#define  atomic_xchg_relaxed		atomic_xchg
> +#define  atomic_xchg_acquire		atomic_xchg
> +#define  atomic_xchg_release		atomic_xchg
> +
> +#else /* atomic_xchg_relaxed */
> +
> +#ifndef atomic_xchg_acquire
> +#define  atomic_xchg_acquire(...)					\
> +	__atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_xchg_release
> +#define  atomic_xchg_release(...)					\
> +	__atomic_op_release(int, atomic_xchg, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_xchg
> +#define  atomic_xchg(...)						\
> +	__atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
> +#endif
> +#endif /* atomic_xchg_relaxed */
> +
> +#ifndef atomic_cmpxchg_relaxed
> +#define  atomic_cmpxchg_relaxed		atomic_cmpxchg
> +#define  atomic_cmpxchg_acquire		atomic_cmpxchg
> +#define  atomic_cmpxchg_release		atomic_cmpxchg
> +
> +#else /* atomic_cmpxchg_relaxed */
> +
> +#ifndef atomic_cmpxchg_acquire
> +#define  atomic_cmpxchg_acquire(...)					\
> +	__atomic_op_acquire(int, atomic_cmpxchg, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_cmpxchg_release
> +#define  atomic_cmpxchg_release(...)					\
> +	__atomic_op_release(int, atomic_cmpxchg, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic_cmpxchg
> +#define  atomic_cmpxchg(...)						\
> +	__atomic_op_fence(int, atomic_cmpxchg, __VA_ARGS__)
> +#endif
> +#endif /* atomic_cmpxchg_relaxed */
> +
> +#ifndef atomic64_read_acquire
> +#define  atomic64_read_acquire(v)	smp_load_acquire(&(v)->counter)
> +#endif
> +
> +#ifndef atomic64_set_release
> +#define  atomic64_set_release(v, i)	smp_store_release(&(v)->counter, (i))
> +#endif
> +
> +#ifndef atomic64_add_return_relaxed
> +#define  atomic64_add_return_relaxed	atomic64_add_return
> +#define  atomic64_add_return_acquire	atomic64_add_return
> +#define  atomic64_add_return_release	atomic64_add_return
> +
> +#else /* atomic64_add_return_relaxed */
> +
> +#ifndef atomic64_add_return_acquire
> +#define  atomic64_add_return_acquire(...)				\
> +	__atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic64_add_return_release
> +#define  atomic64_add_return_release(...)				\
> +	__atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
> +#endif
> +
> +#ifndef atomic64_add_return
> +#define  atomic64_add_return(...)					\
> +	__atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
> +#endif
> +#endif /* atomic64_add_return_relaxed */
> +

I have a minor nit. The atomic_add_return block is repeated with 
"s/atomic_add_return/.../". Perhaps some more comments to delineate the 
blocks more visibly will make this patch easier to read.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-17  0:00       ` Waiman Long
@ 2015-07-17  9:35         ` Will Deacon
  2015-07-17 17:17           ` Waiman Long
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-17  9:35 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, linux-arch, linux-kernel, paulmck

On Fri, Jul 17, 2015 at 01:00:34AM +0100, Waiman Long wrote:
> On 07/16/2015 05:08 PM, Peter Zijlstra wrote:
> > On Thu, Jul 16, 2015 at 04:40:03PM -0400, Waiman Long wrote:
> >> On 07/16/2015 11:32 AM, Will Deacon wrote:
> >>> @@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> >>>   #error "SMP is not supported on this platform"
> >               ^^^^^^^^^^^^^
> 
> That #error is only for ARMv5 or below.
> 
> >>>   #endif
> >>>
> >>> +#define xchg xchg_relaxed
> >> Is that a typo? I think xchg() needs to be a full memory barrier.
> > Pointless on UP.

I don't see the problem here. As Peter pointed out, this code only gets
looked at if !SMP and structuring it this way means I can have one
definition of xchg_relaxed, regardless of architecture version.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations
  2015-07-17  0:07   ` Waiman Long
@ 2015-07-17  9:40     ` Will Deacon
  2015-07-17 17:19       ` Waiman Long
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2015-07-17  9:40 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-arch, peterz, linux-kernel, paulmck

Hi Waiman,

On Fri, Jul 17, 2015 at 01:07:28AM +0100, Waiman Long wrote:
> On 07/16/2015 11:32 AM, Will Deacon wrote:
> > +#ifndef atomic64_add_return_relaxed
> > +#define  atomic64_add_return_relaxed	atomic64_add_return
> > +#define  atomic64_add_return_acquire	atomic64_add_return
> > +#define  atomic64_add_return_release	atomic64_add_return
> > +
> > +#else /* atomic64_add_return_relaxed */
> > +
> > +#ifndef atomic64_add_return_acquire
> > +#define  atomic64_add_return_acquire(...)				\
> > +	__atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
> > +#endif
> > +
> > +#ifndef atomic64_add_return_release
> > +#define  atomic64_add_return_release(...)				\
> > +	__atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
> > +#endif
> > +
> > +#ifndef atomic64_add_return
> > +#define  atomic64_add_return(...)					\
> > +	__atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
> > +#endif
> > +#endif /* atomic64_add_return_relaxed */
> > +
> 
> I have a minor nit. The atomic_add_return block is repeated with 
> "s/atomic_add_return/.../". Perhaps some more comments to delineate the 
> blocks more visibly will make this patch easier to read.

Yeah, I agree that it's pretty hard going, but I don't have any great
suggestions to solve that. I could add an extra blank line + comment
before the start of each section, if you like? Example snippet below.

Will

--->8

[...]
#endif /* atomic_sub_return_relaxed */


/* atomic_xchg_relaxed */
#ifndef atomic_xchg_relaxed
#define  atomic_xchg_relaxed		atomic_xchg
#define  atomic_xchg_acquire		atomic_xchg
#define  atomic_xchg_release		atomic_xchg

#else /* atomic_xchg_relaxed */

#ifndef atomic_xchg_acquire
#define  atomic_xchg_acquire(...)					\
	__atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
#endif

#ifndef atomic_xchg_release
#define  atomic_xchg_release(...)					\
	__atomic_op_release(int, atomic_xchg, __VA_ARGS__)
#endif

#ifndef atomic_xchg
#define  atomic_xchg(...)						\
	__atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
#endif
#endif /* atomic_xchg_relaxed */


/* atomic_cmpxchg_relaxed */
[...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations
  2015-07-17  9:35         ` Will Deacon
@ 2015-07-17 17:17           ` Waiman Long
  0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2015-07-17 17:17 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-arch, linux-kernel, paulmck

On 07/17/2015 05:35 AM, Will Deacon wrote:
> On Fri, Jul 17, 2015 at 01:00:34AM +0100, Waiman Long wrote:
>> On 07/16/2015 05:08 PM, Peter Zijlstra wrote:
>>> On Thu, Jul 16, 2015 at 04:40:03PM -0400, Waiman Long wrote:
>>>> On 07/16/2015 11:32 AM, Will Deacon wrote:
>>>>> @@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>>>>>    #error "SMP is not supported on this platform"
>>>                ^^^^^^^^^^^^^
>> That #error is only for ARMv5 or below.
>>
>>>>>    #endif
>>>>>
>>>>> +#define xchg xchg_relaxed
>>>> Is that a typo? I think xchg() needs to be a full memory barrier.
>>> Pointless on UP.
> I don't see the problem here. As Peter pointed out, this code only gets
> looked at if !SMP and structuring it this way means I can have one
> definition of xchg_relaxed, regardless of architecture version.
>
> Will

I am sorry that I got confused as to where it is defined. It should be 
OK in this case.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations
  2015-07-17  9:40     ` Will Deacon
@ 2015-07-17 17:19       ` Waiman Long
  2015-07-17 17:30         ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2015-07-17 17:19 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arch, peterz, linux-kernel, paulmck

On 07/17/2015 05:40 AM, Will Deacon wrote:
> Hi Waiman,
>
> On Fri, Jul 17, 2015 at 01:07:28AM +0100, Waiman Long wrote:
>> On 07/16/2015 11:32 AM, Will Deacon wrote:
>>> +#ifndef atomic64_add_return_relaxed
>>> +#define  atomic64_add_return_relaxed	atomic64_add_return
>>> +#define  atomic64_add_return_acquire	atomic64_add_return
>>> +#define  atomic64_add_return_release	atomic64_add_return
>>> +
>>> +#else /* atomic64_add_return_relaxed */
>>> +
>>> +#ifndef atomic64_add_return_acquire
>>> +#define  atomic64_add_return_acquire(...)				\
>>> +	__atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
>>> +#endif
>>> +
>>> +#ifndef atomic64_add_return_release
>>> +#define  atomic64_add_return_release(...)				\
>>> +	__atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
>>> +#endif
>>> +
>>> +#ifndef atomic64_add_return
>>> +#define  atomic64_add_return(...)					\
>>> +	__atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
>>> +#endif
>>> +#endif /* atomic64_add_return_relaxed */
>>> +
>> I have a minor nit. The atomic_add_return block is repeated with
>> "s/atomic_add_return/.../". Perhaps some more comments to delineate the
>> blocks more visibly will make this patch easier to read.
> Yeah, I agree that it's pretty hard going, but I don't have any great
> suggestions to solve that. I could add an extra blank line + comment
> before the start of each section, if you like? Example snippet below.
>
> Will
>
> --->8
>
> [...]
> #endif /* atomic_sub_return_relaxed */
>
>
> /* atomic_xchg_relaxed */
> #ifndef atomic_xchg_relaxed
> #define  atomic_xchg_relaxed		atomic_xchg
> #define  atomic_xchg_acquire		atomic_xchg
> #define  atomic_xchg_release		atomic_xchg
>
> #else /* atomic_xchg_relaxed */
>
> #ifndef atomic_xchg_acquire
> #define  atomic_xchg_acquire(...)					\
> 	__atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
> #endif
>
> #ifndef atomic_xchg_release
> #define  atomic_xchg_release(...)					\
> 	__atomic_op_release(int, atomic_xchg, __VA_ARGS__)
> #endif
>
> #ifndef atomic_xchg
> #define  atomic_xchg(...)						\
> 	__atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
> #endif
> #endif /* atomic_xchg_relaxed */
>
>
> /* atomic_cmpxchg_relaxed */
> [...]
Something like

/* BEGIN atomc_xchg_relax */
...
/* END atomic_xchg_relax */

may help.

Alternatively, I sometimes add a line separator like

/*===================[ atomic_xchg_relax ]====================*/

Cheers,
Longman

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations
  2015-07-17 17:19       ` Waiman Long
@ 2015-07-17 17:30         ` Will Deacon
  0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2015-07-17 17:30 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-arch, peterz, linux-kernel, paulmck

On Fri, Jul 17, 2015 at 06:19:54PM +0100, Waiman Long wrote:
> On 07/17/2015 05:40 AM, Will Deacon wrote:
> > On Fri, Jul 17, 2015 at 01:07:28AM +0100, Waiman Long wrote:
> >> I have a minor nit. The atomic_add_return block is repeated with
> >> "s/atomic_add_return/.../". Perhaps some more comments to delineate the
> >> blocks more visibly will make this patch easier to read.
> > Yeah, I agree that it's pretty hard going, but I don't have any great
> > suggestions to solve that. I could add an extra blank line + comment
> > before the start of each section, if you like? Example snippet below.
> >
> > Will
> >
> > --->8
> >
> > [...]
> > #endif /* atomic_sub_return_relaxed */
> >
> >
> > /* atomic_xchg_relaxed */
> > #ifndef atomic_xchg_relaxed
> > #define  atomic_xchg_relaxed		atomic_xchg
> > #define  atomic_xchg_acquire		atomic_xchg
> > #define  atomic_xchg_release		atomic_xchg
> >
> > #else /* atomic_xchg_relaxed */
> >
> > #ifndef atomic_xchg_acquire
> > #define  atomic_xchg_acquire(...)					\
> > 	__atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
> > #endif
> >
> > #ifndef atomic_xchg_release
> > #define  atomic_xchg_release(...)					\
> > 	__atomic_op_release(int, atomic_xchg, __VA_ARGS__)
> > #endif
> >
> > #ifndef atomic_xchg
> > #define  atomic_xchg(...)						\
> > 	__atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
> > #endif
> > #endif /* atomic_xchg_relaxed */
> >
> >
> > /* atomic_cmpxchg_relaxed */
> > [...]
> Something like
> 
> /* BEGIN atomc_xchg_relax */
> ...
> /* END atomic_xchg_relax */
> 
> may help.
> 
> Alternatively, I sometimes add a line separator like
> 
> /*===================[ atomic_xchg_relax ]====================*/

Hmm, I think we're straying into cosmetic preferences here and I can
imagine some people taking objection to the above, especially as they
don't appear to be in common use in mainline.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-07-17 17:30 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-16 15:32 [PATCH v2 0/7] Add generic support for relaxed atomics Will Deacon
2015-07-16 15:32 ` [PATCH v2 1/7] atomics: add acquire/release/relaxed variants of some atomic operations Will Deacon
2015-07-17  0:07   ` Waiman Long
2015-07-17  9:40     ` Will Deacon
2015-07-17 17:19       ` Waiman Long
2015-07-17 17:30         ` Will Deacon
2015-07-16 15:32 ` [PATCH v2 2/7] asm-generic: rework atomic-long.h to avoid bulk code duplication Will Deacon
2015-07-16 15:32 ` [PATCH v2 3/7] asm-generic: add relaxed/acquire/release variants for atomic_long_t Will Deacon
2015-07-16 15:32 ` [PATCH v2 4/7] lockref: remove homebrew cmpxchg64_relaxed macro definition Will Deacon
2015-07-16 16:48   ` Peter Zijlstra
2015-07-16 17:00     ` Will Deacon
2015-07-16 15:32 ` [PATCH v2 5/7] locking/qrwlock: make use of acquire/release/relaxed atomics Will Deacon
2015-07-16 16:59   ` Peter Zijlstra
2015-07-16 18:13     ` Will Deacon
2015-07-16 15:32 ` [PATCH v2 6/7] include/llist: use linux/atomic.h instead of asm/cmpxchg.h Will Deacon
2015-07-16 15:32 ` [PATCH v2 7/7] ARM: atomics: define our SMP atomics in terms of _relaxed operations Will Deacon
2015-07-16 20:40   ` Waiman Long
2015-07-16 21:08     ` Peter Zijlstra
2015-07-17  0:00       ` Waiman Long
2015-07-17  9:35         ` Will Deacon
2015-07-17 17:17           ` Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.