* [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release()
@ 2013-12-13 14:56 Peter Zijlstra
  2013-12-13 14:56 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 14:56 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra

These patches introduce two new barrier primitives:

  smp_load_acquire(p)
  smp_store_release(p, v)

See the first patch, which changes Documentation/memory-barriers.txt, for the
exact definitions of ACQUIRE/RELEASE barriers -- previously known as
LOCK/UNLOCK barriers.

The second patch moves the smp_mb__{before,after}_atomic_{dec,inc}() barriers
to asm/atomic.h for arc and hexagon -- they were already there for all other archs.

This cleans up asm/barrier.h, and the third patch makes more aggressive use of
asm-generic/barrier.h to implement the simple cases.

Then the fourth patch adds the new primitives.

Previous versions were widely build tested -- this version is not, but it is
also not significantly different.

These patches apply to:

  tip/master + lkml.kernel.org/r/20131211215850.GA810@linux.vnet.ibm.com


---
Changes since the last version -- lkml.kernel.org/r/20131107220314.740353088@infradead.org

 - rebased
 - broke out the smp_mb__*atomic* movement
 - changed the ia64 implementation to use volatile instead
   of actual ASM goo -- Tony, do let me know if you would
   prefer to have the ASM goo back; but given you said that
   gcc-ia64 generates the right instructions for volatile...


* [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-13 14:56 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
@ 2013-12-13 14:56 ` Peter Zijlstra
  2013-12-16 20:11   ` Paul E. McKenney
  2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 14:56 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra

[-- Attachment #1: peterz-arch-Document-v2.patch --]
[-- Type: text/plain, Size: 17565 bytes --]

The LOCK and UNLOCK barriers as described in our barrier document are
generally known as ACQUIRE and RELEASE barriers in other literature.

Since we plan to introduce the acquire and release nomenclature in
generic kernel primitives, we should amend the document to avoid
confusion as to what an acquire/release means.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/memory-barriers.txt |  217 +++++++++++++++++++-------------------
 1 file changed, 111 insertions(+), 106 deletions(-)

--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -381,39 +381,44 @@ VARIETIES OF MEMORY BARRIER
 
 And a couple of implicit varieties:
 
- (5) LOCK operations.
+ (5) ACQUIRE operations.
 
      This acts as a one-way permeable barrier.  It guarantees that all memory
-     operations after the LOCK operation will appear to happen after the LOCK
-     operation with respect to the other components of the system.
+     operations after the ACQUIRE operation will appear to happen after the
+     ACQUIRE operation with respect to the other components of the system.
+     ACQUIRE operations include LOCK operations and smp_load_acquire()
+     operations.
 
-     Memory operations that occur before a LOCK operation may appear to happen
-     after it completes.
+     Memory operations that occur before an ACQUIRE operation may appear to
+     happen after it completes.
 
-     A LOCK operation should almost always be paired with an UNLOCK operation.
+     An ACQUIRE operation should almost always be paired with a RELEASE
+     operation.
 
 
- (6) UNLOCK operations.
+ (6) RELEASE operations.
 
      This also acts as a one-way permeable barrier.  It guarantees that all
-     memory operations before the UNLOCK operation will appear to happen before
-     the UNLOCK operation with respect to the other components of the system.
+     memory operations before the RELEASE operation will appear to happen
+     before the RELEASE operation with respect to the other components of the
+     system. RELEASE operations include UNLOCK operations and
+     smp_store_release() operations.
 
-     Memory operations that occur after an UNLOCK operation may appear to
+     Memory operations that occur after a RELEASE operation may appear to
      happen before it completes.
 
-     The use of LOCK and UNLOCK operations generally precludes the need for
-     other sorts of memory barrier (but note the exceptions mentioned in the
-     subsection "MMIO write barrier").  In addition, an UNLOCK+LOCK pair
-     is -not- guaranteed to act as a full memory barrier.  However,
-     after a LOCK on a given lock variable, all memory accesses preceding any
-     prior UNLOCK on that same variable are guaranteed to be visible.
-     In other words, within a given lock variable's critical section,
-     all accesses of all previous critical sections for that lock variable
-     are guaranteed to have completed.
+     The use of ACQUIRE and RELEASE operations generally precludes the need
+     for other sorts of memory barrier (but note the exceptions mentioned in
+     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
+     pair is -not- guaranteed to act as a full memory barrier.  However, after
+     an ACQUIRE on a given lock variable, all memory accesses preceding any
+     prior RELEASE on that same variable are guaranteed to be visible.  In
+     other words, within a given lock variable's critical section, all
+     accesses of all previous critical sections for that lock variable are
+     guaranteed to have completed.
 
-     This means that LOCK acts as a minimal "acquire" operation and
-     UNLOCK acts as a minimal "release" operation.
+     This means that ACQUIRE acts as a minimal "acquire" operation and
+     RELEASE acts as a minimal "release" operation.
 
 
 Memory barriers are only required where there's a possibility of interaction
@@ -1585,7 +1590,7 @@ CPU from reordering them.
 	clear_bit( ... );
 
      This prevents memory operations before the clear leaking to after it.  See
-     the subsection on "Locking Functions" with reference to UNLOCK operation
+     the subsection on "Locking Functions" with reference to RELEASE operation
      implications.
 
      See Documentation/atomic_ops.txt for more information.  See the "Atomic
@@ -1619,7 +1624,7 @@ provide more substantial guarantees, but
 of arch specific code.
 
 
-LOCKING FUNCTIONS
+ACQUIRING FUNCTIONS
 -----------------
 
 The Linux kernel has a number of locking constructs:
@@ -1631,106 +1636,106 @@ LOCKING FUNCTIONS
  (*) R/W semaphores
  (*) RCU
 
-In all cases there are variants on "LOCK" operations and "UNLOCK" operations
+In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
 for each construct.  These operations all imply certain barriers:
 
- (1) LOCK operation implication:
+ (1) ACQUIRE operation implication:
 
-     Memory operations issued after the LOCK will be completed after the LOCK
-     operation has completed.
+     Memory operations issued after the ACQUIRE will be completed after the
+     ACQUIRE operation has completed.
 
-     Memory operations issued before the LOCK may be completed after the
-     LOCK operation has completed.  An smp_mb__before_spinlock(), combined
-     with a following LOCK, orders prior loads against subsequent stores
-     and stores and prior stores against subsequent stores.  Note that
-     this is weaker than smp_mb()!  The smp_mb__before_spinlock()
-     primitive is free on many architectures.
+     Memory operations issued before the ACQUIRE may be completed after the
+     ACQUIRE operation has completed.  An smp_mb__before_spinlock(), combined
+     with a following ACQUIRE, orders prior loads against subsequent stores and
+     stores and prior stores against subsequent stores.  Note that this is
+     weaker than smp_mb()!  The smp_mb__before_spinlock() primitive is free on
+     many architectures.
 
- (2) UNLOCK operation implication:
+ (2) RELEASE operation implication:
 
-     Memory operations issued before the UNLOCK will be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued before the RELEASE will be completed before the
+     RELEASE operation has completed.
 
-     Memory operations issued after the UNLOCK may be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued after the RELEASE may be completed before the
+     RELEASE operation has completed.
 
- (3) LOCK vs LOCK implication:
+ (3) ACQUIRE vs ACQUIRE implication:
 
-     All LOCK operations issued before another LOCK operation will be completed
-     before that LOCK operation.
+     All ACQUIRE operations issued before another ACQUIRE operation will be
+     completed before that ACQUIRE operation.
 
- (4) LOCK vs UNLOCK implication:
+ (4) ACQUIRE vs RELEASE implication:
 
-     All LOCK operations issued before an UNLOCK operation will be completed
-     before the UNLOCK operation.
+     All ACQUIRE operations issued before a RELEASE operation will be
+     completed before the RELEASE operation.
 
- (5) Failed conditional LOCK implication:
+ (5) Failed conditional ACQUIRE implication:
 
-     Certain variants of the LOCK operation may fail, either due to being
+     Certain variants of the ACQUIRE operation may fail, either due to being
      unable to get the lock immediately, or due to receiving an unblocked
      signal whilst asleep waiting for the lock to become available.  Failed
      locks do not imply any sort of barrier.
 
-[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
+[!] Note: one of the consequences of ACQUIREs and RELEASEs being only one-way
     barriers is that the effects of instructions outside of a critical section
     may seep into the inside of the critical section.
 
-A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
-because it is possible for an access preceding the LOCK to happen after the
-LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
-two accesses can themselves then cross:
+An ACQUIRE followed by a RELEASE may not be assumed to be a full memory barrier
+because it is possible for an access preceding the ACQUIRE to happen after the
+ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
+the two accesses can themselves then cross:
 
 	*A = a;
-	LOCK M
-	UNLOCK M
+	ACQUIRE M
+	RELEASE M
 	*B = b;
 
 may occur as:
 
-	LOCK M, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE M, STORE *B, STORE *A, RELEASE M
 
-This same reordering can of course occur if the LOCK and UNLOCK are
+This same reordering can of course occur if the ACQUIRE and RELEASE are
 to the same lock variable, but only from the perspective of another
 CPU not holding that lock.
 
-In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
-memory barrier because it is possible for a preceding UNLOCK to pass a
-later LOCK from the viewpoint of the CPU, but not from the viewpoint
+In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
+memory barrier because it is possible for a preceding RELEASE to pass a
+later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
 of the compiler.  Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the UNLOCK would
+interchange because if such a deadlock threatened, the RELEASE would
 simply complete.
 
-If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
-the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
-This will produce a full barrier if either (a) the UNLOCK and the LOCK
-are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
-on the same lock variable.  The smp_mb__after_unlock_lock() primitive
-is free on many architectures.  Without smp_mb__after_unlock_lock(),
-the critical sections corresponding to the UNLOCK and the LOCK can cross:
+If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
+ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
+will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
+same lock variable.  The smp_mb__after_unlock_lock() primitive is free on many
+architectures.  Without smp_mb__after_unlock_lock(), the critical sections
+corresponding to the RELEASE and the ACQUIRE can cross:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	*B = b;
 
 could occur as:
 
-	LOCK N, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE N, STORE *B, STORE *A, RELEASE M
 
 With smp_mb__after_unlock_lock(), they cannot, so that:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	smp_mb__after_unlock_lock();
 	*B = b;
 
 will always occur as either of the following:
 
-	STORE *A, UNLOCK, LOCK, STORE *B
-	STORE *A, LOCK, UNLOCK, STORE *B
+	STORE *A, RELEASE, ACQUIRE, STORE *B
+	STORE *A, ACQUIRE, RELEASE, STORE *B
 
-If the UNLOCK and LOCK were instead both operating on the same lock
+If the RELEASE and ACQUIRE were instead both operating on the same lock
 variable, only the first of these two alternatives can occur.
 
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
@@ -1745,33 +1750,33 @@ See also the section on "Inter-CPU locki
 
 	*A = a;
 	*B = b;
-	LOCK
+	ACQUIRE
 	*C = c;
 	*D = d;
-	UNLOCK
+	RELEASE
 	*E = e;
 	*F = f;
 
 The following sequence of events is acceptable:
 
-	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
+	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
 
 	[+] Note that {*F,*A} indicates a combined access.
 
 But none of the following are:
 
-	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
-	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
-	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
-	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
+	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
+	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
+	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
+	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
 
 
 
 INTERRUPT DISABLING FUNCTIONS
 -----------------------------
 
-Functions that disable interrupts (LOCK equivalent) and enable interrupts
-(UNLOCK equivalent) will act as compiler barriers only.  So if memory or I/O
+Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
+(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
 barriers are required in such a situation, they must be provided from some
 other means.
 
@@ -1911,7 +1916,7 @@ MISCELLANEOUS FUNCTIONS
 
 
 =================================
-INTER-CPU LOCKING BARRIER EFFECTS
+INTER-CPU ACQUIRING BARRIER EFFECTS
 =================================
 
 On SMP systems locking primitives give a more substantial form of barrier: one
@@ -1919,7 +1924,7 @@ that does affect memory access ordering
 conflict on any particular lock.
 
 
-LOCKS VS MEMORY ACCESSES
+ACQUIRES VS MEMORY ACCESSES
 ------------------------
 
 Consider the following: the system has a pair of spinlocks (M) and (Q), and
@@ -1928,24 +1933,24 @@ Consider the following: the system has a
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
-	LOCK M				LOCK Q
+	ACQUIRE M			ACQUIRE Q
 	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
 	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
-	UNLOCK M			UNLOCK Q
+	RELEASE M			RELEASE Q
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs. It might, for example, see:
 
-	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
+	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
 
 But it won't see any of:
 
-	*B, *C or *D preceding LOCK M
-	*A, *B or *C following UNLOCK M
-	*F, *G or *H preceding LOCK Q
-	*E, *F or *G following UNLOCK Q
+	*B, *C or *D preceding ACQUIRE M
+	*A, *B or *C following RELEASE M
+	*F, *G or *H preceding ACQUIRE Q
+	*E, *F or *G following RELEASE Q
 
 
 However, if the following occurs:
@@ -1953,29 +1958,29 @@ through *H occur in, other than the cons
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;
-	LOCK M		     [1]
+	ACQUIRE M		     [1]
 	ACCESS_ONCE(*B) = b;
 	ACCESS_ONCE(*C) = c;
-	UNLOCK M	     [1]
+	RELEASE M	     [1]
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
-					LOCK M		     [2]
+					ACQUIRE M		     [2]
 					smp_mb__after_unlock_lock();
 					ACCESS_ONCE(*F) = f;
 					ACCESS_ONCE(*G) = g;
-					UNLOCK M	     [2]
+					RELEASE M	     [2]
 					ACCESS_ONCE(*H) = h;
 
 CPU 3 might see:
 
-	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
-		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
+	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
+		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
 
 But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
-	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
-	*A, *B or *C following UNLOCK M [1]
-	*F, *G or *H preceding LOCK M [2]
-	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
+	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
+	*A, *B or *C following RELEASE M [1]
+	*F, *G or *H preceding ACQUIRE M [2]
+	*A, *B, *C, *E, *F or *G following RELEASE M [2]
 
 Note that the smp_mb__after_unlock_lock() is critically important
 here: Without it CPU 3 might see some of the above orderings.
@@ -1983,7 +1988,7 @@ Without smp_mb__after_unlock_lock(), the
 to be seen in order unless CPU 3 holds lock M.
 
 
-LOCKS VS I/O ACCESSES
+ACQUIRES VS I/O ACCESSES
 ---------------------
 
 Under certain circumstances (especially involving NUMA), I/O accesses within
@@ -2202,13 +2207,13 @@ about the state (old or new) implies an
 	/* when succeeds (returns 1) */
 	atomic_add_unless();		atomic_long_add_unless();
 
-These are used for such things as implementing LOCK-class and UNLOCK-class
+These are used for such things as implementing ACQUIRE-class and RELEASE-class
 operations and adjusting reference counters towards object destruction, and as
 such the implicit memory barrier effects are necessary.
 
 
 The following operations are potential problems as they do _not_ imply memory
-barriers, but might be used for implementing such things as UNLOCK-class
+barriers, but might be used for implementing such things as RELEASE-class
 operations:
 
 	atomic_set();
@@ -2250,7 +2255,7 @@ barriers are needed or not.
 	clear_bit_unlock();
 	__clear_bit_unlock();
 
-These implement LOCK-class and UNLOCK-class operations. These should be used in
+These implement ACQUIRE-class and RELEASE-class operations. These should be used in
 preference to other operations when implementing locking primitives, because
 their implementations can be optimised on many architectures.
 


* [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  2013-12-13 14:56 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
  2013-12-13 14:56 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
@ 2013-12-13 14:56 ` Peter Zijlstra
  2013-12-16 20:13   ` Paul E. McKenney
                     ` (2 more replies)
  2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
  2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
  3 siblings, 3 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 14:56 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra

[-- Attachment #1: peterz-arch-move-atomic-barriers.patch --]
[-- Type: text/plain, Size: 2364 bytes --]

Move the barrier functions that depend on the atomic implementation
into the atomic implementation.
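
For context, a hedged sketch of how these barriers pair with a
non-value-returning atomic op; the function below is hypothetical and not
part of this patch:

	/* Illustration only: atomic_dec() itself implies no memory barrier,
	 * so the caller adds one when ordering against prior stores matters.
	 * Which instruction the barrier expands to depends on how the
	 * architecture implements its atomics -- hence the move into
	 * asm/atomic.h. */
	#include <linux/atomic.h>

	void drop_reference(atomic_t *ref)
	{
		smp_mb__before_atomic_dec();	/* order prior memory ops before the decrement */
		atomic_dec(ref);
	}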

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/arc/include/asm/atomic.h      |    5 +++++
 arch/arc/include/asm/barrier.h     |    5 -----
 arch/hexagon/include/asm/atomic.h  |    6 +++++-
 arch/hexagon/include/asm/barrier.h |    4 ----
 4 files changed, 10 insertions(+), 10 deletions(-)

--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -190,6 +190,11 @@ static inline void atomic_clear_mask(uns
 
 #endif /* !CONFIG_ARC_HAS_LLSC */
 
+#define smp_mb__before_atomic_dec()	barrier()
+#define smp_mb__after_atomic_dec()	barrier()
+#define smp_mb__before_atomic_inc()	barrier()
+#define smp_mb__after_atomic_inc()	barrier()
+
 /**
  * __atomic_add_unless - add unless the number is a given value
  * @v: pointer of type atomic_t
--- a/arch/arc/include/asm/barrier.h
+++ b/arch/arc/include/asm/barrier.h
@@ -30,11 +30,6 @@
 #define smp_wmb()       barrier()
 #endif
 
-#define smp_mb__before_atomic_dec()	barrier()
-#define smp_mb__after_atomic_dec()	barrier()
-#define smp_mb__before_atomic_inc()	barrier()
-#define smp_mb__after_atomic_inc()	barrier()
-
 #define smp_read_barrier_depends()      do { } while (0)
 
 #endif
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -160,8 +160,12 @@ static inline int __atomic_add_unless(at
 #define atomic_sub_and_test(i, v) (atomic_sub_return(i, (v)) == 0)
 #define atomic_add_negative(i, v) (atomic_add_return(i, (v)) < 0)
 
-
 #define atomic_inc_return(v) (atomic_add_return(1, v))
 #define atomic_dec_return(v) (atomic_sub_return(1, v))
 
+#define smp_mb__before_atomic_dec()	barrier()
+#define smp_mb__after_atomic_dec()	barrier()
+#define smp_mb__before_atomic_inc()	barrier()
+#define smp_mb__after_atomic_inc()	barrier()
+
 #endif
--- a/arch/hexagon/include/asm/barrier.h
+++ b/arch/hexagon/include/asm/barrier.h
@@ -29,10 +29,6 @@
 #define smp_read_barrier_depends()	barrier()
 #define smp_wmb()			barrier()
 #define smp_mb()			barrier()
-#define smp_mb__before_atomic_dec()	barrier()
-#define smp_mb__after_atomic_dec()	barrier()
-#define smp_mb__before_atomic_inc()	barrier()
-#define smp_mb__after_atomic_inc()	barrier()
 
 /*  Set a value and use a memory barrier.  Used by the scheduler somewhere.  */
 #define set_mb(var, value) \


* [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 14:56 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
  2013-12-13 14:56 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
  2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
@ 2013-12-13 14:57 ` Peter Zijlstra
  2013-12-13 19:16   ` Geert Uytterhoeven
                     ` (2 more replies)
  2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
  3 siblings, 3 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 14:57 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra

[-- Attachment #1: peter_zijlstra-arch-generic-barrier.patch --]
[-- Type: text/plain, Size: 23972 bytes --]

We're going to be adding a few new barrier primitives, and in order to
avoid endless duplication make more aggressive use of
asm-generic/barrier.h.

Change asm-generic/barrier.h such that it allows partial barrier
definitions and fills out the rest with defaults.
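
As a hedged illustration of the resulting pattern (architecture "foo" is
hypothetical):

	/* Hypothetical arch/foo/include/asm/barrier.h -- illustration only.
	 * The port defines just the barriers it actually has; everything
	 * else falls back to the compiler-barrier defaults supplied by
	 * asm-generic/barrier.h. */
	#ifndef _ASM_FOO_BARRIER_H
	#define _ASM_FOO_BARRIER_H

	#define wmb()	asm volatile("sync" : : : "memory")	/* arch-specific write barrier */

	#include <asm-generic/barrier.h>	/* provides mb(), rmb(), smp_*(), set_mb(), ... */

	#endif /* _ASM_FOO_BARRIER_H */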

There are a few architectures (m32r, m68k) that could probably
do away with their barrier.h file entirely but are kept for now due to
their unconventional nop() implementation.

Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/alpha/include/asm/barrier.h      |   25 ++--------
 arch/arc/include/asm/Kbuild           |    1 
 arch/avr32/include/asm/barrier.h      |   17 ++-----
 arch/blackfin/include/asm/barrier.h   |   18 -------
 arch/cris/include/asm/Kbuild          |    1 
 arch/cris/include/asm/barrier.h       |   25 ----------
 arch/frv/include/asm/barrier.h        |    8 ---
 arch/hexagon/include/asm/Kbuild       |    1 
 arch/m32r/include/asm/barrier.h       |   80 ----------------------------------
 arch/m68k/include/asm/barrier.h       |   14 -----
 arch/microblaze/include/asm/Kbuild    |    1 
 arch/microblaze/include/asm/barrier.h |   27 -----------
 arch/mn10300/include/asm/Kbuild       |    1 
 arch/mn10300/include/asm/barrier.h    |   37 ---------------
 arch/parisc/include/asm/Kbuild        |    1 
 arch/parisc/include/asm/barrier.h     |   35 --------------
 arch/score/include/asm/Kbuild         |    1 
 arch/score/include/asm/barrier.h      |   16 ------
 arch/sh/include/asm/barrier.h         |   21 +-------
 arch/sparc/include/asm/barrier_32.h   |   12 -----
 arch/tile/include/asm/barrier.h       |   68 ----------------------------
 arch/unicore32/include/asm/barrier.h  |   11 ----
 arch/xtensa/include/asm/barrier.h     |    9 ---
 include/asm-generic/barrier.h         |   42 ++++++++++++-----
 24 files changed, 58 insertions(+), 414 deletions(-)

--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -3,33 +3,18 @@
 
 #include <asm/compiler.h>
 
-#define mb() \
-__asm__ __volatile__("mb": : :"memory")
+#define mb()	__asm__ __volatile__("mb": : :"memory")
+#define rmb()	__asm__ __volatile__("mb": : :"memory")
+#define wmb()	__asm__ __volatile__("wmb": : :"memory")
 
-#define rmb() \
-__asm__ __volatile__("mb": : :"memory")
-
-#define wmb() \
-__asm__ __volatile__("wmb": : :"memory")
-
-#define read_barrier_depends() \
-__asm__ __volatile__("mb": : :"memory")
+#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
 #define __ASM_SMP_MB	"\tmb\n"
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define __ASM_SMP_MB
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #endif
 
-#define set_mb(var, value) \
-do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif		/* __BARRIER_H */
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -47,3 +47,4 @@ generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/avr32/include/asm/barrier.h
+++ b/arch/avr32/include/asm/barrier.h
@@ -8,22 +8,15 @@
 #ifndef __ASM_AVR32_BARRIER_H
 #define __ASM_AVR32_BARRIER_H
 
-#define nop()			asm volatile("nop")
-
-#define mb()			asm volatile("" : : : "memory")
-#define rmb()			mb()
-#define wmb()			asm volatile("sync 0" : : : "memory")
-#define read_barrier_depends()  do { } while(0)
-#define set_mb(var, value)      do { var = value; mb(); } while(0)
+/*
+ * Weirdest thing ever.. no full barrier, but it has a write barrier!
+ */
+#define wmb()	asm volatile("sync 0" : : : "memory")
 
 #ifdef CONFIG_SMP
 # error "The AVR32 port does not support SMP"
-#else
-# define smp_mb()		barrier()
-# define smp_rmb()		barrier()
-# define smp_wmb()		barrier()
-# define smp_read_barrier_depends() do { } while(0)
 #endif
 
+#include <asm-generic/barrier.h>
 
 #endif /* __ASM_AVR32_BARRIER_H */
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -23,26 +23,10 @@
 # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
 # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
 # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
-#else
-# define mb()	barrier()
-# define rmb()	barrier()
-# define wmb()	barrier()
-# define read_barrier_depends()	do { } while (0)
 #endif
 
-#else /* !CONFIG_SMP */
-
-#define mb()	barrier()
-#define rmb()	barrier()
-#define wmb()	barrier()
-#define read_barrier_depends()	do { } while (0)
-
 #endif /* !CONFIG_SMP */
 
-#define smp_mb()  mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
-#define set_mb(var, value) do { var = value; mb(); } while (0)
-#define smp_read_barrier_depends()	read_barrier_depends()
+#include <asm-generic/barrier.h>
 
 #endif /* _BLACKFIN_BARRIER_H */
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -12,3 +12,4 @@ generic-y += trace_clock.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/cris/include/asm/barrier.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef __ASM_CRIS_BARRIER_H
-#define __ASM_CRIS_BARRIER_H
-
-#define nop() __asm__ __volatile__ ("nop");
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-#define mb() barrier()
-#define rmb() mb()
-#define wmb() mb()
-#define read_barrier_depends() do { } while(0)
-#define set_mb(var, value)  do { var = value; mb(); } while (0)
-
-#ifdef CONFIG_SMP
-#define smp_mb()        mb()
-#define smp_rmb()       rmb()
-#define smp_wmb()       wmb()
-#define smp_read_barrier_depends()     read_barrier_depends()
-#else
-#define smp_mb()        barrier()
-#define smp_rmb()       barrier()
-#define smp_wmb()       barrier()
-#define smp_read_barrier_depends()     do { } while(0)
-#endif
-
-#endif /* __ASM_CRIS_BARRIER_H */
--- a/arch/frv/include/asm/barrier.h
+++ b/arch/frv/include/asm/barrier.h
@@ -17,13 +17,7 @@
 #define mb()			asm volatile ("membar" : : :"memory")
 #define rmb()			asm volatile ("membar" : : :"memory")
 #define wmb()			asm volatile ("membar" : : :"memory")
-#define read_barrier_depends()	do { } while (0)
 
-#define smp_mb()			barrier()
-#define smp_rmb()			barrier()
-#define smp_wmb()			barrier()
-#define smp_read_barrier_depends()	do {} while(0)
-#define set_mb(var, value) \
-	do { var = (value); barrier(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* _ASM_BARRIER_H */
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -54,3 +54,4 @@ generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/m32r/include/asm/barrier.h
+++ b/arch/m32r/include/asm/barrier.h
@@ -11,84 +11,6 @@
 
 #define nop()  __asm__ __volatile__ ("nop" : : )
 
-/*
- * Memory barrier.
- *
- * mb() prevents loads and stores being reordered across this point.
- * rmb() prevents loads being reordered across this point.
- * wmb() prevents stores being reordered across this point.
- */
-#define mb()   barrier()
-#define rmb()  mb()
-#define wmb()  mb()
-
-/**
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *      CPU 0                           CPU 1
- *
- *      b = 2;
- *      memory_barrier();
- *      p = &b;                         q = p;
- *                                      read_barrier_depends();
- *                                      d = *q;
- * </programlisting>
- *
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *      CPU 0                           CPU 1
- *
- *      a = 2;
- *      memory_barrier();
- *      b = 3;                          y = b;
- *                                      read_barrier_depends();
- *                                      x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- **/
-
-#define read_barrier_depends()	do { } while (0)
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
-#define set_mb(var, value) do { var = value; barrier(); } while (0)
-#endif
+#include <asm-generic/barrier.h>
 
 #endif /* _ASM_M32R_BARRIER_H */
--- a/arch/m68k/include/asm/barrier.h
+++ b/arch/m68k/include/asm/barrier.h
@@ -1,20 +1,8 @@
 #ifndef _M68K_BARRIER_H
 #define _M68K_BARRIER_H
 
-/*
- * Force strict CPU ordering.
- * Not really required on m68k...
- */
 #define nop()		do { asm volatile ("nop"); barrier(); } while (0)
-#define mb()		barrier()
-#define rmb()		barrier()
-#define wmb()		barrier()
-#define read_barrier_depends()	((void)0)
-#define set_mb(var, value)	({ (var) = (value); wmb(); })
 
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	((void)0)
+#include <asm-generic/barrier.h>
 
 #endif /* _M68K_BARRIER_H */
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += syscalls.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/microblaze/include/asm/barrier.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * Copyright (C) 2006 Atmark Techno, Inc.
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- */
-
-#ifndef _ASM_MICROBLAZE_BARRIER_H
-#define _ASM_MICROBLAZE_BARRIER_H
-
-#define nop()                  asm volatile ("nop")
-
-#define smp_read_barrier_depends()	do {} while (0)
-#define read_barrier_depends()		do {} while (0)
-
-#define mb()			barrier()
-#define rmb()			mb()
-#define wmb()			mb()
-#define set_mb(var, value)	do { var = value; mb(); } while (0)
-#define set_wmb(var, value)	do { var = value; wmb(); } while (0)
-
-#define smp_mb()		mb()
-#define smp_rmb()		rmb()
-#define smp_wmb()		wmb()
-
-#endif /* _ASM_MICROBLAZE_BARRIER_H */
--- a/arch/mn10300/include/asm/Kbuild
+++ b/arch/mn10300/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/mn10300/include/asm/barrier.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* MN10300 memory barrier definitions
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-#ifndef _ASM_BARRIER_H
-#define _ASM_BARRIER_H
-
-#define nop()	asm volatile ("nop")
-
-#define mb()	asm volatile ("": : :"memory")
-#define rmb()	mb()
-#define wmb()	asm volatile ("": : :"memory")
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define set_mb(var, value)  do { xchg(&var, value); } while (0)
-#else  /* CONFIG_SMP */
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define set_mb(var, value)  do { var = value;  mb(); } while (0)
-#endif /* CONFIG_SMP */
-
-#define set_wmb(var, value) do { var = value; wmb(); } while (0)
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
-
-#endif /* _ASM_BARRIER_H */
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += word-at-a-time.h auxvec.h u
 	  poll.h xor.h clkdev.h exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/parisc/include/asm/barrier.h
+++ /dev/null
@@ -1,35 +0,0 @@
-#ifndef __PARISC_BARRIER_H
-#define __PARISC_BARRIER_H
-
-/*
-** This is simply the barrier() macro from linux/kernel.h but when serial.c
-** uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h
-** hasn't yet been included yet so it fails, thus repeating the macro here.
-**
-** PA-RISC architecture allows for weakly ordered memory accesses although
-** none of the processors use it. There is a strong ordered bit that is
-** set in the O-bit of the page directory entry. Operating systems that
-** can not tolerate out of order accesses should set this bit when mapping
-** pages. The O-bit of the PSW should also be set to 1 (I don't believe any
-** of the processor implemented the PSW O-bit). The PCX-W ERS states that
-** the TLB O-bit is not implemented so the page directory does not need to
-** have the O-bit set when mapping pages (section 3.1). This section also
-** states that the PSW Y, Z, G, and O bits are not implemented.
-** So it looks like nothing needs to be done for parisc-linux (yet).
-** (thanks to chada for the above comment -ggg)
-**
-** The __asm__ op below simple prevents gcc/ld from reordering
-** instructions across the mb() "call".
-*/
-#define mb()		__asm__ __volatile__("":::"memory")	/* barrier() */
-#define rmb()		mb()
-#define wmb()		mb()
-#define smp_mb()	mb()
-#define smp_rmb()	mb()
-#define smp_wmb()	mb()
-#define smp_read_barrier_depends()	do { } while(0)
-#define read_barrier_depends()		do { } while(0)
-
-#define set_mb(var, value)		do { var = value; mb(); } while (0)
-
-#endif /* __PARISC_BARRIER_H */
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += clkdev.h
 generic-y += trace_clock.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
--- a/arch/score/include/asm/barrier.h
+++ /dev/null
@@ -1,16 +0,0 @@
-#ifndef _ASM_SCORE_BARRIER_H
-#define _ASM_SCORE_BARRIER_H
-
-#define mb()		barrier()
-#define rmb()		barrier()
-#define wmb()		barrier()
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
-
-#define set_mb(var, value) 		do {var = value; wmb(); } while (0)
-
-#endif /* _ASM_SCORE_BARRIER_H */
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -26,29 +26,14 @@
 #if defined(CONFIG_CPU_SH4A) || defined(CONFIG_CPU_SH5)
 #define mb()		__asm__ __volatile__ ("synco": : :"memory")
 #define rmb()		mb()
-#define wmb()		__asm__ __volatile__ ("synco": : :"memory")
+#define wmb()		mb()
 #define ctrl_barrier()	__icbi(PAGE_OFFSET)
-#define read_barrier_depends()	do { } while(0)
 #else
-#define mb()		__asm__ __volatile__ ("": : :"memory")
-#define rmb()		mb()
-#define wmb()		__asm__ __volatile__ ("": : :"memory")
 #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
-#define read_barrier_depends()	do { } while(0)
-#endif
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while(0)
 #endif
 
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
+#include <asm-generic/barrier.h>
+
 #endif /* __ASM_SH_BARRIER_H */
--- a/arch/sparc/include/asm/barrier_32.h
+++ b/arch/sparc/include/asm/barrier_32.h
@@ -1,15 +1,7 @@
 #ifndef __SPARC_BARRIER_H
 #define __SPARC_BARRIER_H
 
-/* XXX Change this if we ever use a PSO mode kernel. */
-#define mb()	__asm__ __volatile__ ("" : : : "memory")
-#define rmb()	mb()
-#define wmb()	mb()
-#define read_barrier_depends()	do { } while(0)
-#define set_mb(__var, __value)  do { __var = __value; mb(); } while(0)
-#define smp_mb()	__asm__ __volatile__("":::"memory")
-#define smp_rmb()	__asm__ __volatile__("":::"memory")
-#define smp_wmb()	__asm__ __volatile__("":::"memory")
-#define smp_read_barrier_depends()	do { } while(0)
+#include <asm/processor.h> /* for nop() */
+#include <asm-generic/barrier.h>
 
 #endif /* !(__SPARC_BARRIER_H) */
--- a/arch/tile/include/asm/barrier.h
+++ b/arch/tile/include/asm/barrier.h
@@ -22,59 +22,6 @@
 #include <arch/spr_def.h>
 #include <asm/timex.h>
 
-/*
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- */
-#define read_barrier_depends()	do { } while (0)
-
 #define __sync()	__insn_mf()
 
 #include <hv/syscall_public.h>
@@ -125,20 +72,7 @@ mb_incoherent(void)
 #define mb()		fast_mb()
 #define iob()		fast_iob()
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
-#endif
-
-#define set_mb(var, value) \
-	do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_TILE_BARRIER_H */
--- a/arch/unicore32/include/asm/barrier.h
+++ b/arch/unicore32/include/asm/barrier.h
@@ -14,15 +14,6 @@
 #define dsb() __asm__ __volatile__ ("" : : : "memory")
 #define dmb() __asm__ __volatile__ ("" : : : "memory")
 
-#define mb()				barrier()
-#define rmb()				barrier()
-#define wmb()				barrier()
-#define smp_mb()			barrier()
-#define smp_rmb()			barrier()
-#define smp_wmb()			barrier()
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
-#define set_mb(var, value)		do { var = value; smp_mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* __UNICORE_BARRIER_H__ */
--- a/arch/xtensa/include/asm/barrier.h
+++ b/arch/xtensa/include/asm/barrier.h
@@ -9,21 +9,14 @@
 #ifndef _XTENSA_SYSTEM_H
 #define _XTENSA_SYSTEM_H
 
-#define smp_read_barrier_depends() do { } while(0)
-#define read_barrier_depends() do { } while(0)
-
 #define mb()  ({ __asm__ __volatile__("memw" : : : "memory"); })
 #define rmb() barrier()
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #error smp_* not defined
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
 #endif
 
-#define set_mb(var, value)	do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* _XTENSA_SYSTEM_H */
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -1,4 +1,5 @@
-/* Generic barrier definitions, based on MN10300 definitions.
+/*
+ * Generic barrier definitions, originally based on MN10300 definitions.
  *
  * It should be possible to use these on really simple architectures,
  * but it serves more as a starting point for new ports.
@@ -16,35 +17,50 @@
 
 #ifndef __ASSEMBLY__
 
-#define nop() asm volatile ("nop")
+#include <linux/compiler.h>
+
+#ifndef nop
+#define nop()	asm volatile ("nop")
+#endif
 
 /*
- * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
- * to devices.
+ * Force strict CPU ordering. And yes, this is required on UP too when we're
+ * talking to devices.
  *
- * This implementation only contains a compiler barrier.
+ * Fall back to compiler barriers if nothing better is provided.
  */
 
-#define mb()	asm volatile ("": : :"memory")
+#ifndef mb
+#define mb()	barrier()
+#endif
+
+#ifndef rmb
 #define rmb()	mb()
-#define wmb()	asm volatile ("": : :"memory")
+#endif
+
+#ifndef wmb
+#define wmb()	mb()
+#endif
+
+#ifndef read_barrier_depends
+#define read_barrier_depends()		do { } while (0)
+#endif
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
 #define smp_rmb()	rmb()
 #define smp_wmb()	wmb()
+#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+#define smp_read_barrier_depends()	do { } while (0)
 #endif
 
-#define set_mb(var, value)  do { var = value;  mb(); } while (0)
-#define set_wmb(var, value) do { var = value; wmb(); } while (0)
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
+#ifndef set_mb
+#define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
+#endif
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */


* [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-13 14:56 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
                   ` (2 preceding siblings ...)
  2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
@ 2013-12-13 14:57 ` Peter Zijlstra
  2013-12-16 16:40   ` Will Deacon
                     ` (2 more replies)
  3 siblings, 3 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 14:57 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra, Will Deacon

[-- Attachment #1: peter_zijlstra-arch-sr-la.patch --]
[-- Type: text/plain, Size: 12357 bytes --]

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specified store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively, resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.
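
As a hedged illustration of the circular-FIFO case, a single-producer/
single-consumer ring might look like the sketch below; the ring, its size
and the index names are hypothetical and not part of this patch:

	/* Illustrative SPSC ring, sketch only.  The producer's release
	 * publishes the filled slot before advancing head; the consumer's
	 * acquire of head orders the slot read after observing the new
	 * head, and its release of tail frees the slot only after the read
	 * has completed.  No smp_mb() required. */
	#include <linux/kernel.h>
	#include <linux/errno.h>
	#include <asm/barrier.h>

	#define RING_SIZE	256		/* power of two, hypothetical */

	static void *ring[RING_SIZE];
	static unsigned int head;		/* written by producer */
	static unsigned int tail;		/* written by consumer */

	int ring_put(void *item)		/* producer side */
	{
		unsigned int h = head;

		if (h - smp_load_acquire(&tail) >= RING_SIZE)
			return -ENOSPC;			/* full */
		ring[h % RING_SIZE] = item;
		smp_store_release(&head, h + 1);	/* publish the slot */
		return 0;
	}

	void *ring_get(void)			/* consumer side */
	{
		unsigned int t = tail;
		void *item;

		if (t == smp_load_acquire(&head))
			return NULL;			/* empty */
		item = ring[t % RING_SIZE];
		smp_store_release(&tail, t + 1);	/* free the slot */
		return item;
	}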

[Changelog by PaulMck]

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/arm/include/asm/barrier.h      |   15 ++++++++++
 arch/arm64/include/asm/barrier.h    |   50 ++++++++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     |   23 ++++++++++++++++
 arch/metag/include/asm/barrier.h    |   15 ++++++++++
 arch/mips/include/asm/barrier.h     |   15 ++++++++++
 arch/powerpc/include/asm/barrier.h  |   21 ++++++++++++++-
 arch/s390/include/asm/barrier.h     |   15 ++++++++++
 arch/sparc/include/asm/barrier_64.h |   15 ++++++++++
 arch/x86/include/asm/barrier.h      |   43 ++++++++++++++++++++++++++++++
 include/asm-generic/barrier.h       |   15 ++++++++++
 include/linux/compiler.h            |    9 ++++++
 11 files changed, 234 insertions(+), 2 deletions(-)

--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
 #define smp_wmb()	dmb(ishst)
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #else
+
 #define smp_mb()	asm volatile("dmb ish" : : : "memory")
 #define smp_rmb()	asm volatile("dmb ishld" : : : "memory")
 #define smp_wmb()	asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("stlr %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 8:								\
+		asm volatile ("stlr %1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	}								\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1;						\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("ldar %w0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	case 8:								\
+		asm volatile ("ldar %0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	}								\
+	___p1;								\
+})
+
 #endif
 
 #define read_barrier_depends()		do { } while(0)
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
 # define smp_rmb()	rmb()
 # define smp_wmb()	wmb()
 # define smp_read_barrier_depends()	read_barrier_depends()
+
 #else
+
 # define smp_mb()	barrier()
 # define smp_rmb()	barrier()
 # define smp_wmb()	barrier()
 # define smp_read_barrier_depends()	do { } while(0)
+
 #endif
 
 /*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq; no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
+/*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
  * Linus just yet.  Grrr...
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -82,4 +82,19 @@ static inline void fence(void)
 #define smp_read_barrier_depends()     do { } while (0)
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* _ASM_METAG_BARRIER_H */
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
 #define nudge_writes() mb()
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
 #    define SMPWMB      eieio
 #endif
 
+#define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
 #define smp_mb()	mb()
-#define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
+#define __lwsync()	barrier()
+
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
@@ -65,4 +69,19 @@
 #define data_barrier(x)	\
 	asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	___p1;								\
+})
+
 #endif /* _ASM_POWERPC_BARRIER_H */
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@
 
 #define set_mb(var, value)		do { var = value; mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do {	__asm__ __volatile__("ba,pt	%%xcc,
 
 #define smp_read_barrier_depends()	do { } while(0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* !(__SPARC64_BARRIER_H) */
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
 #endif
 #define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif
 
 /*
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -62,5 +62,20 @@
 #define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 #endif
 
+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
 # define __compiletime_object_size(obj) -1
@@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_
 #define compiletime_assert(condition, msg) \
 	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 
+#define compiletime_assert_atomic_type(t)				\
+	compiletime_assert(__native_word(t),				\
+		"Need native word sized stores/loads for atomicity.")
+
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
@ 2013-12-13 19:16   ` Geert Uytterhoeven
  2013-12-13 19:17     ` Fwd: " Geert Uytterhoeven
  2013-12-13 19:53     ` Peter Zijlstra
  2013-12-16 20:14   ` Paul E. McKenney
  2014-01-12 18:43   ` [tip:core/locking] arch: Clean up asm/ barrier.h " tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2013-12-13 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux-Arch, linux-kernel, Paul McKenney, Linus Torvalds,
	Victor Kaplansky, Oleg Nesterov, Anton Blanchard,
	Benjamin Herrenschmidt, Frédéric Weisbecker,
	Mathieu Desnoyers, Michael Ellerman, Michael Neuling,
	Russell King, Martin Schwidefsky, Heiko Carstens, Tony Luck

Hi Peter,

On Fri, Dec 13, 2013 at 3:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> There are a few architectures (m32r, m68k) that could probably
> do away with their barrier.h file entirely but are kept for now due to
> their unconventional nop() implementation.

m32r uses `__asm__ __volatile__ ("nop" : : )' instead of
`asm volatile ("nop")'. Isn't the latter just a shorthand of the former?

m68k has an additional `barrier()', which I think is unneeded.
I looked at the asm output for drivers/block/ataflop.c and
drivers/video/macfb.c, with and without the barrier.
For ataflop, there were no differences.
For macfb, removing the barrier allowed the compiler to keep the base
addresses of the MMIO registers in registers, but there were no changes
to the register accesses themselves.

So IMHO both m32r and m68k can switch to the asm-generic version.
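
To illustrate (a made-up sketch, not the actual macfb code): with m68k's
extra barrier() the compiler has to discard any memory values it was
caching in registers across the nop, including an MMIO base pointer:

	/* hypothetical driver snippet, names invented for illustration */
	static void __iomem *regs;	/* MMIO base, lives in memory */

	static void poke_two(u32 a, u32 b)
	{
		writel(a, regs + 0x10);
		nop();			/* m68k: asm volatile("nop"); barrier(); */
		writel(b, regs + 0x14);	/* 'regs' must be reloaded here */
	}

Without the extra barrier() the compiler is free to keep 'regs' in a
register across both writes, which matches what I saw in the macfb output.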

I'm wondering whether we can just drop nop() from the in-kernel API?
There are few users outside arch/:

drivers/block/ataflop.c
drivers/net/ethernet/amd/am79c961a.c
drivers/tty/serial/crisv10.c
drivers/video/macfb.c

From a quick glance, all of them (ab)use nop() instead of one of the other
primitives. For the macfb case, it seems to be due to missing ordering
inside the nubus_*() I/O accessors.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Fwd: [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 19:16   ` Geert Uytterhoeven
@ 2013-12-13 19:17     ` Geert Uytterhoeven
  2013-12-13 19:53     ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2013-12-13 19:17 UTC (permalink / raw)
  To: linux-kernel

Woops, misspelling of kernel.org in the original email...

---------- Forwarded message ----------
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Fri, Dec 13, 2013 at 8:16 PM
Subject: Re: [PATCH 3/4] arch: Clean up asm/barrier.h implementations
using asm-generic/barrier.h
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linux-Arch <linux-arch@vger.kernel.org>,
linux-kernel@vger.kerne.org, Paul McKenney
<paulmck@linux.vnet.ibm.com>, Linus Torvalds
<torvalds@linux-foundation.org>, Victor Kaplansky
<VICTORK@il.ibm.com>, Oleg Nesterov <oleg@redhat.com>, Anton Blanchard
<anton@samba.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Frédéric Weisbecker <fweisbec@gmail.com>, Mathieu Desnoyers
<mathieu.desnoyers@polymtl.ca>, Michael Ellerman
<michael@ellerman.id.au>, Michael Neuling <mikey@neuling.org>, Russell
King <linux@arm.linux.org.uk>, Martin Schwidefsky
<schwidefsky@de.ibm.com>, Heiko Carstens <heiko.carstens@de.ibm.com>,
Tony Luck <tony.luck@intel.com>


Hi Peter,

On Fri, Dec 13, 2013 at 3:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> There are a few architectures (m32r, m68k) that could probably
> do away with their barrier.h file entirely but are kept for now due to
> their unconventional nop() implementation.

m32r uses `__asm__ __volatile__ ("nop" : : )' instead of
`asm volatile ("nop")'. Isn't the latter just a shorthand of the former?

m68k has an additional `barrier()', which I think is unneeded.
I looked at the asm output for drivers/block/ataflop.c and
drivers/video/macfb.c, with and without the barrier.
For ataflop, there were no differences.
For macfb, removing the barrier allowed the compiler to keep the base
addresses of the MMIO registers in registers, but there were no changes
to the register accesses themselves.

So IMHO both m32r and m68k can switch to the asm-generic version.

I'm wondering whether we can just drop nop() from the in-kernel API?
There are few users outside arch/:

drivers/block/ataflop.c
drivers/net/ethernet/amd/am79c961a.c
drivers/tty/serial/crisv10.c
drivers/video/macfb.c

From a quick glance, all of them (ab)use nop() instead of one of the other
primitives. For the macfb case, it seems to be due to missing ordering
inside the nubus_*() I/O accessors.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


-- 
Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 19:16   ` Geert Uytterhoeven
  2013-12-13 19:17     ` Fwd: " Geert Uytterhoeven
@ 2013-12-13 19:53     ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-13 19:53 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux-Arch, linux-kernel, Paul McKenney, Linus Torvalds,
	Victor Kaplansky, Oleg Nesterov, Anton Blanchard,
	Benjamin Herrenschmidt, Frédéric Weisbecker,
	Mathieu Desnoyers, Michael Ellerman, Michael Neuling,
	Russell King, Martin Schwidefsky, Heiko Carstens, Tony Luck

On Fri, Dec 13, 2013 at 08:16:28PM +0100, Geert Uytterhoeven wrote:
> Hi Peter,
> 
> On Fri, Dec 13, 2013 at 3:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > There are a few architectures (m32r, m68k) that could probably
> > do away with their barrier.h file entirely but are kept for now due to
> > their unconventional nop() implementation.
> 
> m32r uses `__asm__ __volatile__ ("nop" : : )' instead of
> `asm volatile ("nop")'. Isn't the latter just a shorthand of the former?

Yes.. however it might be that there's a gcc bug on that platform or whatnot
that they care about.

> m68k has an additional `barrier()', which I think is unneeded.

Strictly speaking yes, but I didn't want to grep through the m68k arch
code to figure out if someone somehow started to rely on that.

In both cases, I was too lazy to really figure it out and hoped the
relevant maintainers would see and clean it up for me ;-)

> I looked at the asm output for drivers/block/ataflop.c and
> drivers/video/macfb.c, with and without the barrier.
> For ataflop, there were no differences.
> For macfb, removing the barrier allowed the compiler to keep the base
> addresses of the MMIO registers in registers, but there were no changes
> to the register accesses themselves.
> 
> So IMHO both m32r and m68k can switch to the asm-generic version.

I'm all for it.
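
For reference, the shape this series leaves things in (a sketch of the
pattern, not the verbatim headers): an arch barrier.h only keeps what is
genuinely arch-specific and pulls everything else from the generic file,

	/* arch/xxx/include/asm/barrier.h */
	#define nop()	__asm__ __volatile__ ("nop" : : )

	#include <asm-generic/barrier.h>	/* fills in mb(), smp_*(), set_mb(), ... */

so once the nop() question is settled such a file can be dropped entirely
and replaced by a generic-y += barrier.h line in the Kbuild file.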

> I'm wondering whether we can just drop nop() from the in-kernel API?
> There are few users outside arch/:
> 
> drivers/block/ataflop.c
> drivers/net/ethernet/amd/am79c961a.c
> drivers/tty/serial/crisv10.c
> drivers/video/macfb.c
> 
> From a quick glance, all of them (ab)use nop() instead of one of the other
> primitives. For the macfb case, it seems to be due to missing ordering
> inside the nubus_*() I/O accessors.

Yeah, it's not something that really makes sense outside of arch/; in a
previous thread someone also asked why that thing is in barrier.h to
begin with.

That said, even if you remove the generic nop() usage, arch/ is likely
to still need it, so we couldn't just remove it; we'd have to stuff it
someplace else -- which might be a good idea anyhow.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
@ 2013-12-16 16:40   ` Will Deacon
  2013-12-17  9:07     ` Peter Zijlstra
  2013-12-16 21:37   ` Paul E. McKenney
  2014-01-12 18:43   ` [tip:core/locking] " tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2013-12-16 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, paulmck, torvalds, VICTORK,
	oleg, anton, benh, fweisbec, mathieu.desnoyers, michael, mikey,
	linux, schwidefsky, heiko.carstens, tony.luck

Hi Peter,

On Fri, Dec 13, 2013 at 02:57:01PM +0000, Peter Zijlstra wrote:
> A number of situations currently require the heavyweight smp_mb(),
> even though there is no need to order prior stores against later
> loads.  Many architectures have much cheaper ways to handle these
> situations, but the Linux kernel currently has no portable way
> to make use of them.
> 
> This commit therefore supplies smp_load_acquire() and
> smp_store_release() to remedy this situation.  The new
> smp_load_acquire() primitive orders the specified load against
> any subsequent reads or writes, while the new smp_store_release()
> primitive orders the specified store against any prior reads or
> writes.  These primitives allow array-based circular FIFOs to be
> implemented without an smp_mb(), and also allow a theoretical
> hole in rcu_assign_pointer() to be closed at no additional
> expense on most architectures.
> 
> In addition, the RCU experience transitioning from explicit
> smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
> and rcu_assign_pointer(), respectively resulted in substantial
> improvements in readability.  It therefore seems likely that
> replacing other explicit barriers with smp_load_acquire() and
> smp_store_release() will provide similar benefits.  It appears
> that roughly half of the explicit barriers in core kernel code
> might be so replaced.

[...]
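
(As an aside, the array-based circular FIFO case mentioned above reads quite
nicely with these; a hypothetical single-producer/single-consumer sketch, not
code from this series:

	/* producer -- assumes it already checked there is room */
	buf[head & (size - 1)] = item;
	smp_store_release(&fifo->head, head + 1);	/* publish the filled slot */

	/* consumer */
	head = smp_load_acquire(&fifo->head);		/* pairs with the release above */
	if (tail != head) {
		item = buf[tail & (size - 1)];		/* read ordered after the acquire */
		smp_store_release(&fifo->tail, tail + 1);	/* hand the slot back */
	}

with no explicit smp_mb() anywhere.)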

For arm/arm64:

  Acked-by: Will Deacon <will.deacon@arm.com>

A couple of other comments:

> --- a/arch/ia64/include/asm/barrier.h
> +++ b/arch/ia64/include/asm/barrier.h
> @@ -45,14 +45,37 @@
>  # define smp_rmb()     rmb()
>  # define smp_wmb()     wmb()
>  # define smp_read_barrier_depends()    read_barrier_depends()
> +
>  #else
> +
>  # define smp_mb()      barrier()
>  # define smp_rmb()     barrier()
>  # define smp_wmb()     barrier()
>  # define smp_read_barrier_depends()    do { } while(0)
> +
>  #endif
> 
>  /*
> + * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq, no
> + * need for asm trickery!
> + */
> +
> +#define smp_store_release(p, v)                                                \
> +do {                                                                   \
> +       compiletime_assert_atomic_type(*p);                             \
> +       barrier();                                                      \
> +       ACCESS_ONCE(*p) = (v);                                          \
> +} while (0)
> +
> +#define smp_load_acquire(p)                                            \
> +({                                                                     \
> +       typeof(*p) ___p1 = ACCESS_ONCE(*p);                             \
> +       compiletime_assert_atomic_type(*p);                             \
> +       barrier();                                                      \
> +       ___p1;                                                          \
> +})
> +
> +/*
>   * XXX check on this ---I suspect what Linus really wants here is
>   * acquire vs release semantics but we can't discuss this stuff with
>   * Linus just yet.  Grrr...

I guess you can remove this comment now?

> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_
>  # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>  #endif
> 
> +/* Is this type a native word size -- useful for atomic operations */
> +#ifndef __native_word
> +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
> +#endif
> +
>  /* Compile time object size, -1 for unknown */
>  #ifndef __compiletime_object_size
>  # define __compiletime_object_size(obj) -1
> @@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_
>  #define compiletime_assert(condition, msg) \
>         _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
> 
> +#define compiletime_assert_atomic_type(t)                              \
> +       compiletime_assert(__native_word(t),                            \
> +               "Need native word sized stores/loads for atomicity.")
> +

At least for ARM, it's not so much the type that causes issues with
atomicity, but whether or not it's naturally aligned, which we don't check
here...
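
Something like the below (purely illustrative, I haven't tried it) would at
least catch the cases the compiler can see, e.g. a packed struct member,
though a pointer that is only misaligned at run time would still slip
through:

	#define compiletime_assert_aligned_type(t)				\
		compiletime_assert(__native_word(t) &&				\
				   __alignof__(t) == sizeof(t),			\
			"Need naturally aligned, native word sized stores/loads.")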

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-13 14:56 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
@ 2013-12-16 20:11   ` Paul E. McKenney
  2013-12-17  9:24     ` Peter Zijlstra
  2013-12-17 10:13     ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-16 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Fri, Dec 13, 2013 at 03:56:58PM +0100, Peter Zijlstra wrote:
> The LOCK and UNLOCK barriers as described in our barrier document are
> generally known as ACQUIRE and RELEASE barriers in other literature.
> 
> Since we plan to introduce the acquire and release nomenclature in
> generic kernel primitives we should amend the document to avoid
> confusion as to what an acquire/release means.
> 
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Michael Ellerman <michael@ellerman.id.au>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Victor Kaplansky <VICTORK@il.ibm.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Still OK with my Reviewed-by, but some nits below.

							Thanx, Paul

> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  Documentation/memory-barriers.txt |  217 +++++++++++++++++++-------------------
>  1 file changed, 111 insertions(+), 106 deletions(-)
> 
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -381,39 +381,44 @@ VARIETIES OF MEMORY BARRIER
> 
>  And a couple of implicit varieties:
> 
> - (5) LOCK operations.
> + (5) ACQUIRE operations.
> 
>       This acts as a one-way permeable barrier.  It guarantees that all memory
> -     operations after the LOCK operation will appear to happen after the LOCK
> -     operation with respect to the other components of the system.
> +     operations after the ACQUIRE operation will appear to happen after the
> +     ACQUIRE operation with respect to the other components of the system.
> +     ACQUIRE oeprations include LOCK operations and smp_load_acquire()

s/oeprations/operations/

> +     operations.
> 
> -     Memory operations that occur before a LOCK operation may appear to happen
> -     after it completes.
> +     Memory operations that occur before an ACQUIRE operation may appear to
> +     happen after it completes.
> 
> -     A LOCK operation should almost always be paired with an UNLOCK operation.
> +     An ACQUIRE operation should almost always be paired with a RELEASE
> +     operation.
> 
> 
> - (6) UNLOCK operations.
> + (6) RELEASE operations.
> 
>       This also acts as a one-way permeable barrier.  It guarantees that all
> -     memory operations before the UNLOCK operation will appear to happen before
> -     the UNLOCK operation with respect to the other components of the system.
> +     memory operations before the RELEASE operation will appear to happen
> +     before the RELEASE operation with respect to the other components of the
> +     system. RELEASE operations include UNLOCK operations and
> +     smp_store_release() operations.
> 
> -     Memory operations that occur after an UNLOCK operation may appear to
> +     Memory operations that occur after a RELEASE operation may appear to
>       happen before it completes.
> 
> -     The use of LOCK and UNLOCK operations generally precludes the need for
> -     other sorts of memory barrier (but note the exceptions mentioned in the
> -     subsection "MMIO write barrier").  In addition, an UNLOCK+LOCK pair
> -     is -not- guaranteed to act as a full memory barrier.  However,
> -     after a LOCK on a given lock variable, all memory accesses preceding any
> -     prior UNLOCK on that same variable are guaranteed to be visible.
> -     In other words, within a given lock variable's critical section,
> -     all accesses of all previous critical sections for that lock variable
> -     are guaranteed to have completed.
> +     The use of ACQUIRE and RELEASE operations generally precludes the need
> +     for other sorts of memory barrier (but note the exceptions mentioned in
> +     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
> +     pair is -not- guaranteed to act as a full memory barrier.  However, after
> +     an ACQUIRE on a given lock variable, all memory accesses preceding any

s/lock// -- this also applies to non-lock variables operated on by
smp_store_release() and smp_load_acquire().

> +     prior RELEASE on that same variable are guaranteed to be visible.  In
> +     other words, within a given lock variable's critical section, all
> +     accesses of all previous critical sections for that lock variable are
> +     guaranteed to have completed.
> 
> -     This means that LOCK acts as a minimal "acquire" operation and
> -     UNLOCK acts as a minimal "release" operation.
> +     This means that ACQUIRE acts as a minimal "acquire" operation and
> +     RELEASE acts as a minimal "release" operation.
> 
> 
>  Memory barriers are only required where there's a possibility of interaction
> @@ -1585,7 +1590,7 @@ CPU from reordering them.
>  	clear_bit( ... );
> 
>       This prevents memory operations before the clear leaking to after it.  See
> -     the subsection on "Locking Functions" with reference to UNLOCK operation
> +     the subsection on "Locking Functions" with reference to RELEASE operation
>       implications.
> 
>       See Documentation/atomic_ops.txt for more information.  See the "Atomic
> @@ -1619,7 +1624,7 @@ provide more substantial guarantees, but
>  of arch specific code.
> 
> 
> -LOCKING FUNCTIONS
> +ACQUIREING FUNCTIONS
>  -----------------

s/ACQUIREING/ACQUIRING/

And s/$/--/ if you want it to stay pretty.

> 
>  The Linux kernel has a number of locking constructs:
> @@ -1631,106 +1636,106 @@ LOCKING FUNCTIONS
>   (*) R/W semaphores
>   (*) RCU
> 
> -In all cases there are variants on "LOCK" operations and "UNLOCK" operations
> +In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
>  for each construct.  These operations all imply certain barriers:
> 
> - (1) LOCK operation implication:
> + (1) ACQUIRE operation implication:
> 
> -     Memory operations issued after the LOCK will be completed after the LOCK
> -     operation has completed.
> +     Memory operations issued after the ACQUIRE will be completed after the
> +     ACQUIRE operation has completed.
> 
> -     Memory operations issued before the LOCK may be completed after the
> -     LOCK operation has completed.  An smp_mb__before_spinlock(), combined
> -     with a following LOCK, orders prior loads against subsequent stores
> -     and stores and prior stores against subsequent stores.  Note that
> -     this is weaker than smp_mb()!  The smp_mb__before_spinlock()
> -     primitive is free on many architectures.
> +     Memory operations issued before the ACQUIRE may be completed after the
> +     ACQUIRE operation has completed.  An smp_mb__before_spinlock(), combined
> +     with a following ACQUIRE, orders prior loads against subsequent stores and
> +     stores and prior stores against subsequent stores.  Note that this is
> +     weaker than smp_mb()!  The smp_mb__before_spinlock() primitive is free on

Should we rename smp_mb__before_spinlock() to be smp_mb__before_acquire()?

> +     many architectures.
> 
> - (2) UNLOCK operation implication:
> + (2) RELEASE operation implication:
> 
> -     Memory operations issued before the UNLOCK will be completed before the
> -     UNLOCK operation has completed.
> +     Memory operations issued before the RELEASE will be completed before the
> +     RELEASE operation has completed.
> 
> -     Memory operations issued after the UNLOCK may be completed before the
> -     UNLOCK operation has completed.
> +     Memory operations issued after the RELEASE may be completed before the
> +     RELEASE operation has completed.
> 
> - (3) LOCK vs LOCK implication:
> + (3) ACQUIRE vs ACQUIRE implication:
> 
> -     All LOCK operations issued before another LOCK operation will be completed
> -     before that LOCK operation.
> +     All ACQUIRE operations issued before another ACQUIRE operation will be
> +     completed before that ACQUIRE operation.
> 
> - (4) LOCK vs UNLOCK implication:
> + (4) ACQUIRE vs RELEASE implication:
> 
> -     All LOCK operations issued before an UNLOCK operation will be completed
> -     before the UNLOCK operation.
> +     All ACQUIRE operations issued before a RELEASE operation will be
> +     completed before the RELEASE operation.
> 
> - (5) Failed conditional LOCK implication:
> + (5) Failed conditional ACQUIRE implication:
> 
> -     Certain variants of the LOCK operation may fail, either due to being
> +     Certain variants of the ACQUIRE operation may fail, either due to being

Hmmm...  Maybe something like the following to help bridge from ACQUIRE
to the later discussion of locks?

	Certain locking variants of the ACQUIRE operation may fail,
	either due to being

>       unable to get the lock immediately, or due to receiving an unblocked
>       signal whilst asleep waiting for the lock to become available.  Failed
>       locks do not imply any sort of barrier.
> 
> -[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
> +[!] Note: one of the consequences of ACQUIREs and RELEASEs being only one-way

Similarly, would the following bridge better from ACQUIREs and RELEASEs
to locks?

	[!] Note: one of the consequences of lock ACQUIREs and RELEASEs
	being only one-way

We could instead define smp_load_acquire() to have an unbounded critical
section following the smp_load_acquire(), and smp_store_release() to have
a similarly unbounded critical section preceding the smp_store_release(),
but that sounds a bit obtuse.

>      barriers is that the effects of instructions outside of a critical section
>      may seep into the inside of the critical section.
> 
> -A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
> -because it is possible for an access preceding the LOCK to happen after the
> -LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
> -two accesses can themselves then cross:
> +An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier
> +because it is possible for an access preceding the ACQUIRE to happen after the
> +ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
> +the two accesses can themselves then cross:
> 
>  	*A = a;
> -	LOCK M
> -	UNLOCK M
> +	ACQUIRE M
> +	RELEASE M
>  	*B = b;
> 
>  may occur as:
> 
> -	LOCK M, STORE *B, STORE *A, UNLOCK M
> +	ACQUIRE M, STORE *B, STORE *A, RELEASE M
> 
> -This same reordering can of course occur if the LOCK and UNLOCK are
> +This same reordering can of course occur if the ACQUIRE and RELEASE are

How about:

	This same reordering can of course occur if a lock's ACQUIRE
	and RELEASE are

Again, to help bridge from ACQUIRE and RELEASE to locks.

>  to the same lock variable, but only from the perspective of another
>  CPU not holding that lock.
> 
> -In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
> -memory barrier because it is possible for a preceding UNLOCK to pass a
> -later LOCK from the viewpoint of the CPU, but not from the viewpoint
> +In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
> +memory barrier because it is possible for a preceding RELEASE to pass a
> +later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
>  of the compiler.  Note that deadlocks cannot be introduced by this
> -interchange because if such a deadlock threatened, the UNLOCK would
> +interchange because if such a deadlock threatened, the RELEASE would
>  simply complete.
> 
> -If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
> -the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
> -This will produce a full barrier if either (a) the UNLOCK and the LOCK
> -are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
> -on the same lock variable.  The smp_mb__after_unlock_lock() primitive
> -is free on many architectures.  Without smp_mb__after_unlock_lock(),
> -the critical sections corresponding to the UNLOCK and the LOCK can cross:
> +If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
> +ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This

And should we rename this to be smp_mb__after_release_acquire()?

(I kind of hope not, as it is a bit more typing.)

> +will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
> +executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
> +same lock variable.  The smp_mb__after_unlock_lock() primitive is free on many

s/lock//

> +architectures.  Without smp_mb__after_unlock_lock(), the critical sections
> +corresponding to the RELEASE and the ACQUIRE can cross:
> 
>  	*A = a;
> -	UNLOCK M
> -	LOCK N
> +	RELEASE M
> +	ACQUIRE N
>  	*B = b;
> 
>  could occur as:
> 
> -	LOCK N, STORE *B, STORE *A, UNLOCK M
> +	ACQUIRE N, STORE *B, STORE *A, RELEASE M
> 
>  With smp_mb__after_unlock_lock(), they cannot, so that:
> 
>  	*A = a;
> -	UNLOCK M
> -	LOCK N
> +	RELEASE M
> +	ACQUIRE N
>  	smp_mb__after_unlock_lock();
>  	*B = b;
> 
>  will always occur as either of the following:
> 
> -	STORE *A, UNLOCK, LOCK, STORE *B
> -	STORE *A, LOCK, UNLOCK, STORE *B
> +	STORE *A, RELEASE, ACQUIRE, STORE *B
> +	STORE *A, ACQUIRE, RELEASE, STORE *B
> 
> -If the UNLOCK and LOCK were instead both operating on the same lock
> +If the RELEASE and ACQUIRE were instead both operating on the same lock
>  variable, only the first of these two alternatives can occur.
> 
>  Locks and semaphores may not provide any guarantee of ordering on UP compiled
> @@ -1745,33 +1750,33 @@ See also the section on "Inter-CPU locki
> 
>  	*A = a;
>  	*B = b;
> -	LOCK
> +	ACQUIRE
>  	*C = c;
>  	*D = d;
> -	UNLOCK
> +	RELEASE
>  	*E = e;
>  	*F = f;
> 
>  The following sequence of events is acceptable:
> 
> -	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
> +	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
> 
>  	[+] Note that {*F,*A} indicates a combined access.
> 
>  But none of the following are:
> 
> -	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
> -	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
> -	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
> -	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
> +	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
> +	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
> +	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
> +	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
> 
> 
> 
>  INTERRUPT DISABLING FUNCTIONS
>  -----------------------------
> 
> -Functions that disable interrupts (LOCK equivalent) and enable interrupts
> -(UNLOCK equivalent) will act as compiler barriers only.  So if memory or I/O
> +Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
> +(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
>  barriers are required in such a situation, they must be provided from some
>  other means.
> 
> @@ -1911,7 +1916,7 @@ MISCELLANEOUS FUNCTIONS
> 
> 
>  =================================
> -INTER-CPU LOCKING BARRIER EFFECTS
> +INTER-CPU ACQUIREING BARRIER EFFECTS
>  =================================

s/ACQUIREING/ACQUIRING/

And two more "=" signs on each of the lines, if you want it to stay pretty.  ;-)

>  On SMP systems locking primitives give a more substantial form of barrier: one
> @@ -1919,7 +1924,7 @@ that does affect memory access ordering
>  conflict on any particular lock.
> 
> 
> -LOCKS VS MEMORY ACCESSES
> +ACQUIRES VS MEMORY ACCESSES
>  ------------------------

s/$/---/ if you want it to stay pretty.

> 
>  Consider the following: the system has a pair of spinlocks (M) and (Q), and
> @@ -1928,24 +1933,24 @@ Consider the following: the system has a
>  	CPU 1				CPU 2
>  	===============================	===============================
>  	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
> -	LOCK M				LOCK Q
> +	ACQUIRE M			ACQUIRE Q
>  	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
>  	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
> -	UNLOCK M			UNLOCK Q
> +	RELEASE M			RELEASE Q
>  	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
> 
>  Then there is no guarantee as to what order CPU 3 will see the accesses to *A
>  through *H occur in, other than the constraints imposed by the separate locks
>  on the separate CPUs. It might, for example, see:
> 
> -	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
> +	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
> 
>  But it won't see any of:
> 
> -	*B, *C or *D preceding LOCK M
> -	*A, *B or *C following UNLOCK M
> -	*F, *G or *H preceding LOCK Q
> -	*E, *F or *G following UNLOCK Q
> +	*B, *C or *D preceding ACQUIRE M
> +	*A, *B or *C following RELEASE M
> +	*F, *G or *H preceding ACQUIRE Q
> +	*E, *F or *G following RELEASE Q
> 
> 
>  However, if the following occurs:
> @@ -1953,29 +1958,29 @@ through *H occur in, other than the cons
>  	CPU 1				CPU 2
>  	===============================	===============================
>  	ACCESS_ONCE(*A) = a;
> -	LOCK M		     [1]
> +	ACQUIRE M		     [1]
>  	ACCESS_ONCE(*B) = b;
>  	ACCESS_ONCE(*C) = c;
> -	UNLOCK M	     [1]
> +	RELEASE M	     [1]
>  	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
> -					LOCK M		     [2]
> +					ACQUIRE M		     [2]
>  					smp_mb__after_unlock_lock();
>  					ACCESS_ONCE(*F) = f;
>  					ACCESS_ONCE(*G) = g;
> -					UNLOCK M	     [2]
> +					RELEASE M	     [2]
>  					ACCESS_ONCE(*H) = h;
> 
>  CPU 3 might see:
> 
> -	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
> -		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
> +	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
> +		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
> 
>  But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
> 
> -	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
> -	*A, *B or *C following UNLOCK M [1]
> -	*F, *G or *H preceding LOCK M [2]
> -	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
> +	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
> +	*A, *B or *C following RELEASE M [1]
> +	*F, *G or *H preceding ACQUIRE M [2]
> +	*A, *B, *C, *E, *F or *G following RELEASE M [2]
> 
>  Note that the smp_mb__after_unlock_lock() is critically important
>  here: Without it CPU 3 might see some of the above orderings.
> @@ -1983,7 +1988,7 @@ Without smp_mb__after_unlock_lock(), the
>  to be seen in order unless CPU 3 holds lock M.
> 
> 
> -LOCKS VS I/O ACCESSES
> +ACQUIRES VS I/O ACCESSES
>  ---------------------

s/$/--/ if you want it to stay pretty.

> 
>  Under certain circumstances (especially involving NUMA), I/O accesses within
> @@ -2202,13 +2207,13 @@ about the state (old or new) implies an
>  	/* when succeeds (returns 1) */
>  	atomic_add_unless();		atomic_long_add_unless();
> 
> -These are used for such things as implementing LOCK-class and UNLOCK-class
> +These are used for such things as implementing ACQUIRE-class and RELEASE-class
>  operations and adjusting reference counters towards object destruction, and as
>  such the implicit memory barrier effects are necessary.
> 
> 
>  The following operations are potential problems as they do _not_ imply memory
> -barriers, but might be used for implementing such things as UNLOCK-class
> +barriers, but might be used for implementing such things as RELEASE-class
>  operations:
> 
>  	atomic_set();
> @@ -2250,7 +2255,7 @@ barriers are needed or not.
>  	clear_bit_unlock();
>  	__clear_bit_unlock();
> 
> -These implement LOCK-class and UNLOCK-class operations. These should be used in
> +These implement ACQUIRE-class and RELEASE-class operations. These should be used in
>  preference to other operations when implementing locking primitives, because
>  their implementations can be optimised on many architectures.
> 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
@ 2013-12-16 20:13   ` Paul E. McKenney
  2013-12-18 13:40   ` Vineet Gupta
  2014-01-12 18:43   ` [tip:core/locking] arch: Move smp_mb__{before,after}_atomic_{inc, dec}.h " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-16 20:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Fri, Dec 13, 2013 at 03:56:59PM +0100, Peter Zijlstra wrote:
> Move the barriers functions that depend on the atomic implementation
> into the atomic implementation.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  arch/arc/include/asm/atomic.h      |    5 +++++
>  arch/arc/include/asm/barrier.h     |    5 -----
>  arch/hexagon/include/asm/atomic.h  |    6 +++++-
>  arch/hexagon/include/asm/barrier.h |    4 ----
>  4 files changed, 10 insertions(+), 10 deletions(-)
> 
> --- a/arch/arc/include/asm/atomic.h
> +++ b/arch/arc/include/asm/atomic.h
> @@ -190,6 +190,11 @@ static inline void atomic_clear_mask(uns
> 
>  #endif /* !CONFIG_ARC_HAS_LLSC */
> 
> +#define smp_mb__before_atomic_dec()	barrier()
> +#define smp_mb__after_atomic_dec()	barrier()
> +#define smp_mb__before_atomic_inc()	barrier()
> +#define smp_mb__after_atomic_inc()	barrier()
> +
>  /**
>   * __atomic_add_unless - add unless the number is a given value
>   * @v: pointer of type atomic_t
> --- a/arch/arc/include/asm/barrier.h
> +++ b/arch/arc/include/asm/barrier.h
> @@ -30,11 +30,6 @@
>  #define smp_wmb()       barrier()
>  #endif
> 
> -#define smp_mb__before_atomic_dec()	barrier()
> -#define smp_mb__after_atomic_dec()	barrier()
> -#define smp_mb__before_atomic_inc()	barrier()
> -#define smp_mb__after_atomic_inc()	barrier()
> -
>  #define smp_read_barrier_depends()      do { } while (0)
> 
>  #endif
> --- a/arch/hexagon/include/asm/atomic.h
> +++ b/arch/hexagon/include/asm/atomic.h
> @@ -160,8 +160,12 @@ static inline int __atomic_add_unless(at
>  #define atomic_sub_and_test(i, v) (atomic_sub_return(i, (v)) == 0)
>  #define atomic_add_negative(i, v) (atomic_add_return(i, (v)) < 0)
> 
> -
>  #define atomic_inc_return(v) (atomic_add_return(1, v))
>  #define atomic_dec_return(v) (atomic_sub_return(1, v))
> 
> +#define smp_mb__before_atomic_dec()	barrier()
> +#define smp_mb__after_atomic_dec()	barrier()
> +#define smp_mb__before_atomic_inc()	barrier()
> +#define smp_mb__after_atomic_inc()	barrier()
> +
>  #endif
> --- a/arch/hexagon/include/asm/barrier.h
> +++ b/arch/hexagon/include/asm/barrier.h
> @@ -29,10 +29,6 @@
>  #define smp_read_barrier_depends()	barrier()
>  #define smp_wmb()			barrier()
>  #define smp_mb()			barrier()
> -#define smp_mb__before_atomic_dec()	barrier()
> -#define smp_mb__after_atomic_dec()	barrier()
> -#define smp_mb__before_atomic_inc()	barrier()
> -#define smp_mb__after_atomic_inc()	barrier()
> 
>  /*  Set a value and use a memory barrier.  Used by the scheduler somewhere.  */
>  #define set_mb(var, value) \
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
  2013-12-13 19:16   ` Geert Uytterhoeven
@ 2013-12-16 20:14   ` Paul E. McKenney
  2014-01-12 18:43   ` [tip:core/locking] arch: Clean up asm/ barrier.h " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-16 20:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Fri, Dec 13, 2013 at 03:57:00PM +0100, Peter Zijlstra wrote:
> We're going to be adding a few new barrier primitives, and in order to
> avoid endless duplication, make more aggressive use of
> asm-generic/barrier.h.
> 
> Change the asm-generic/barrier.h such that it allows partial barrier
> definitions and fills out the rest with defaults.
> 
> There are a few architectures (m32r, m68k) that could probably
> do away with their barrier.h file entirely but are kept for now due to
> their unconventional nop() implementation.
> 
> Cc: Michael Ellerman <michael@ellerman.id.au>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Victor Kaplansky <VICTORK@il.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  arch/alpha/include/asm/barrier.h      |   25 ++--------
>  arch/arc/include/asm/Kbuild           |    1 
>  arch/avr32/include/asm/barrier.h      |   17 ++-----
>  arch/blackfin/include/asm/barrier.h   |   18 -------
>  arch/cris/include/asm/Kbuild          |    1 
>  arch/cris/include/asm/barrier.h       |   25 ----------
>  arch/frv/include/asm/barrier.h        |    8 ---
>  arch/hexagon/include/asm/Kbuild       |    1 
>  arch/m32r/include/asm/barrier.h       |   80 ----------------------------------
>  arch/m68k/include/asm/barrier.h       |   14 -----
>  arch/microblaze/include/asm/Kbuild    |    1 
>  arch/microblaze/include/asm/barrier.h |   27 -----------
>  arch/mn10300/include/asm/Kbuild       |    1 
>  arch/mn10300/include/asm/barrier.h    |   37 ---------------
>  arch/parisc/include/asm/Kbuild        |    1 
>  arch/parisc/include/asm/barrier.h     |   35 --------------
>  arch/score/include/asm/Kbuild         |    1 
>  arch/score/include/asm/barrier.h      |   16 ------
>  arch/sh/include/asm/barrier.h         |   21 +-------
>  arch/sparc/include/asm/barrier_32.h   |   12 -----
>  arch/tile/include/asm/barrier.h       |   68 ----------------------------
>  arch/unicore32/include/asm/barrier.h  |   11 ----
>  arch/xtensa/include/asm/barrier.h     |    9 ---
>  include/asm-generic/barrier.h         |   42 ++++++++++++-----
>  24 files changed, 58 insertions(+), 414 deletions(-)
> 
> --- a/arch/alpha/include/asm/barrier.h
> +++ b/arch/alpha/include/asm/barrier.h
> @@ -3,33 +3,18 @@
> 
>  #include <asm/compiler.h>
> 
> -#define mb() \
> -__asm__ __volatile__("mb": : :"memory")
> +#define mb()	__asm__ __volatile__("mb": : :"memory")
> +#define rmb()	__asm__ __volatile__("mb": : :"memory")
> +#define wmb()	__asm__ __volatile__("wmb": : :"memory")
> 
> -#define rmb() \
> -__asm__ __volatile__("mb": : :"memory")
> -
> -#define wmb() \
> -__asm__ __volatile__("wmb": : :"memory")
> -
> -#define read_barrier_depends() \
> -__asm__ __volatile__("mb": : :"memory")
> +#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
> 
>  #ifdef CONFIG_SMP
>  #define __ASM_SMP_MB	"\tmb\n"
> -#define smp_mb()	mb()
> -#define smp_rmb()	rmb()
> -#define smp_wmb()	wmb()
> -#define smp_read_barrier_depends()	read_barrier_depends()
>  #else
>  #define __ASM_SMP_MB
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define smp_read_barrier_depends()	do { } while (0)
>  #endif
> 
> -#define set_mb(var, value) \
> -do { var = value; mb(); } while (0)
> +#include <asm-generic/barrier.h>
> 
>  #endif		/* __BARRIER_H */
> --- a/arch/arc/include/asm/Kbuild
> +++ b/arch/arc/include/asm/Kbuild
> @@ -47,3 +47,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/avr32/include/asm/barrier.h
> +++ b/arch/avr32/include/asm/barrier.h
> @@ -8,22 +8,15 @@
>  #ifndef __ASM_AVR32_BARRIER_H
>  #define __ASM_AVR32_BARRIER_H
> 
> -#define nop()			asm volatile("nop")
> -
> -#define mb()			asm volatile("" : : : "memory")
> -#define rmb()			mb()
> -#define wmb()			asm volatile("sync 0" : : : "memory")
> -#define read_barrier_depends()  do { } while(0)
> -#define set_mb(var, value)      do { var = value; mb(); } while(0)
> +/*
> + * Weirdest thing ever.. no full barrier, but it has a write barrier!
> + */
> +#define wmb()	asm volatile("sync 0" : : : "memory")
> 
>  #ifdef CONFIG_SMP
>  # error "The AVR32 port does not support SMP"
> -#else
> -# define smp_mb()		barrier()
> -# define smp_rmb()		barrier()
> -# define smp_wmb()		barrier()
> -# define smp_read_barrier_depends() do { } while(0)
>  #endif
> 
> +#include <asm-generic/barrier.h>
> 
>  #endif /* __ASM_AVR32_BARRIER_H */
> --- a/arch/blackfin/include/asm/barrier.h
> +++ b/arch/blackfin/include/asm/barrier.h
> @@ -23,26 +23,10 @@
>  # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
>  # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
>  # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
> -#else
> -# define mb()	barrier()
> -# define rmb()	barrier()
> -# define wmb()	barrier()
> -# define read_barrier_depends()	do { } while (0)
>  #endif
> 
> -#else /* !CONFIG_SMP */
> -
> -#define mb()	barrier()
> -#define rmb()	barrier()
> -#define wmb()	barrier()
> -#define read_barrier_depends()	do { } while (0)
> -
>  #endif /* !CONFIG_SMP */
> 
> -#define smp_mb()  mb()
> -#define smp_rmb() rmb()
> -#define smp_wmb() wmb()
> -#define set_mb(var, value) do { var = value; mb(); } while (0)
> -#define smp_read_barrier_depends()	read_barrier_depends()
> +#include <asm-generic/barrier.h>
> 
>  #endif /* _BLACKFIN_BARRIER_H */
> --- a/arch/cris/include/asm/Kbuild
> +++ b/arch/cris/include/asm/Kbuild
> @@ -12,3 +12,4 @@ generic-y += trace_clock.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/cris/include/asm/barrier.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -#ifndef __ASM_CRIS_BARRIER_H
> -#define __ASM_CRIS_BARRIER_H
> -
> -#define nop() __asm__ __volatile__ ("nop");
> -
> -#define barrier() __asm__ __volatile__("": : :"memory")
> -#define mb() barrier()
> -#define rmb() mb()
> -#define wmb() mb()
> -#define read_barrier_depends() do { } while(0)
> -#define set_mb(var, value)  do { var = value; mb(); } while (0)
> -
> -#ifdef CONFIG_SMP
> -#define smp_mb()        mb()
> -#define smp_rmb()       rmb()
> -#define smp_wmb()       wmb()
> -#define smp_read_barrier_depends()     read_barrier_depends()
> -#else
> -#define smp_mb()        barrier()
> -#define smp_rmb()       barrier()
> -#define smp_wmb()       barrier()
> -#define smp_read_barrier_depends()     do { } while(0)
> -#endif
> -
> -#endif /* __ASM_CRIS_BARRIER_H */
> --- a/arch/frv/include/asm/barrier.h
> +++ b/arch/frv/include/asm/barrier.h
> @@ -17,13 +17,7 @@
>  #define mb()			asm volatile ("membar" : : :"memory")
>  #define rmb()			asm volatile ("membar" : : :"memory")
>  #define wmb()			asm volatile ("membar" : : :"memory")
> -#define read_barrier_depends()	do { } while (0)
> 
> -#define smp_mb()			barrier()
> -#define smp_rmb()			barrier()
> -#define smp_wmb()			barrier()
> -#define smp_read_barrier_depends()	do {} while(0)
> -#define set_mb(var, value) \
> -	do { var = (value); barrier(); } while (0)
> +#include <asm-generic/barrier.h>
> 
>  #endif /* _ASM_BARRIER_H */
> --- a/arch/hexagon/include/asm/Kbuild
> +++ b/arch/hexagon/include/asm/Kbuild
> @@ -54,3 +54,4 @@ generic-y += ucontext.h
>  generic-y += unaligned.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/m32r/include/asm/barrier.h
> +++ b/arch/m32r/include/asm/barrier.h
> @@ -11,84 +11,6 @@
> 
>  #define nop()  __asm__ __volatile__ ("nop" : : )
> 
> -/*
> - * Memory barrier.
> - *
> - * mb() prevents loads and stores being reordered across this point.
> - * rmb() prevents loads being reordered across this point.
> - * wmb() prevents stores being reordered across this point.
> - */
> -#define mb()   barrier()
> -#define rmb()  mb()
> -#define wmb()  mb()
> -
> -/**
> - * read_barrier_depends - Flush all pending reads that subsequents reads
> - * depend on.
> - *
> - * No data-dependent reads from memory-like regions are ever reordered
> - * over this barrier.  All reads preceding this primitive are guaranteed
> - * to access memory (but not necessarily other CPUs' caches) before any
> - * reads following this primitive that depend on the data return by
> - * any of the preceding reads.  This primitive is much lighter weight than
> - * rmb() on most CPUs, and is never heavier weight than is
> - * rmb().
> - *
> - * These ordering constraints are respected by both the local CPU
> - * and the compiler.
> - *
> - * Ordering is not guaranteed by anything other than these primitives,
> - * not even by data dependencies.  See the documentation for
> - * memory_barrier() for examples and URLs to more information.
> - *
> - * For example, the following code would force ordering (the initial
> - * value of "a" is zero, "b" is one, and "p" is "&a"):
> - *
> - * <programlisting>
> - *      CPU 0                           CPU 1
> - *
> - *      b = 2;
> - *      memory_barrier();
> - *      p = &b;                         q = p;
> - *                                      read_barrier_depends();
> - *                                      d = *q;
> - * </programlisting>
> - *
> - *
> - * because the read of "*q" depends on the read of "p" and these
> - * two reads are separated by a read_barrier_depends().  However,
> - * the following code, with the same initial values for "a" and "b":
> - *
> - * <programlisting>
> - *      CPU 0                           CPU 1
> - *
> - *      a = 2;
> - *      memory_barrier();
> - *      b = 3;                          y = b;
> - *                                      read_barrier_depends();
> - *                                      x = a;
> - * </programlisting>
> - *
> - * does not enforce ordering, since there is no data dependency between
> - * the read of "a" and the read of "b".  Therefore, on some CPUs, such
> - * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
> - * in cases like this where there are no data dependencies.
> - **/
> -
> -#define read_barrier_depends()	do { } while (0)
> -
> -#ifdef CONFIG_SMP
> -#define smp_mb()	mb()
> -#define smp_rmb()	rmb()
> -#define smp_wmb()	wmb()
> -#define smp_read_barrier_depends()	read_barrier_depends()
> -#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
> -#else
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define smp_read_barrier_depends()	do { } while (0)
> -#define set_mb(var, value) do { var = value; barrier(); } while (0)
> -#endif
> +#include <asm-generic/barrier.h>
> 
>  #endif /* _ASM_M32R_BARRIER_H */
> --- a/arch/m68k/include/asm/barrier.h
> +++ b/arch/m68k/include/asm/barrier.h
> @@ -1,20 +1,8 @@
>  #ifndef _M68K_BARRIER_H
>  #define _M68K_BARRIER_H
> 
> -/*
> - * Force strict CPU ordering.
> - * Not really required on m68k...
> - */
>  #define nop()		do { asm volatile ("nop"); barrier(); } while (0)
> -#define mb()		barrier()
> -#define rmb()		barrier()
> -#define wmb()		barrier()
> -#define read_barrier_depends()	((void)0)
> -#define set_mb(var, value)	({ (var) = (value); wmb(); })
> 
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define smp_read_barrier_depends()	((void)0)
> +#include <asm-generic/barrier.h>
> 
>  #endif /* _M68K_BARRIER_H */
> --- a/arch/microblaze/include/asm/Kbuild
> +++ b/arch/microblaze/include/asm/Kbuild
> @@ -4,3 +4,4 @@ generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += syscalls.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/microblaze/include/asm/barrier.h
> +++ /dev/null
> @@ -1,27 +0,0 @@
> -/*
> - * Copyright (C) 2006 Atmark Techno, Inc.
> - *
> - * This file is subject to the terms and conditions of the GNU General Public
> - * License. See the file "COPYING" in the main directory of this archive
> - * for more details.
> - */
> -
> -#ifndef _ASM_MICROBLAZE_BARRIER_H
> -#define _ASM_MICROBLAZE_BARRIER_H
> -
> -#define nop()                  asm volatile ("nop")
> -
> -#define smp_read_barrier_depends()	do {} while (0)
> -#define read_barrier_depends()		do {} while (0)
> -
> -#define mb()			barrier()
> -#define rmb()			mb()
> -#define wmb()			mb()
> -#define set_mb(var, value)	do { var = value; mb(); } while (0)
> -#define set_wmb(var, value)	do { var = value; wmb(); } while (0)
> -
> -#define smp_mb()		mb()
> -#define smp_rmb()		rmb()
> -#define smp_wmb()		wmb()
> -
> -#endif /* _ASM_MICROBLAZE_BARRIER_H */
> --- a/arch/mn10300/include/asm/Kbuild
> +++ b/arch/mn10300/include/asm/Kbuild
> @@ -3,3 +3,4 @@ generic-y += clkdev.h
>  generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/mn10300/include/asm/barrier.h
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -/* MN10300 memory barrier definitions
> - *
> - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
> - * Written by David Howells (dhowells@redhat.com)
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public Licence
> - * as published by the Free Software Foundation; either version
> - * 2 of the Licence, or (at your option) any later version.
> - */
> -#ifndef _ASM_BARRIER_H
> -#define _ASM_BARRIER_H
> -
> -#define nop()	asm volatile ("nop")
> -
> -#define mb()	asm volatile ("": : :"memory")
> -#define rmb()	mb()
> -#define wmb()	asm volatile ("": : :"memory")
> -
> -#ifdef CONFIG_SMP
> -#define smp_mb()	mb()
> -#define smp_rmb()	rmb()
> -#define smp_wmb()	wmb()
> -#define set_mb(var, value)  do { xchg(&var, value); } while (0)
> -#else  /* CONFIG_SMP */
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define set_mb(var, value)  do { var = value;  mb(); } while (0)
> -#endif /* CONFIG_SMP */
> -
> -#define set_wmb(var, value) do { var = value; wmb(); } while (0)
> -
> -#define read_barrier_depends()		do {} while (0)
> -#define smp_read_barrier_depends()	do {} while (0)
> -
> -#endif /* _ASM_BARRIER_H */
> --- a/arch/parisc/include/asm/Kbuild
> +++ b/arch/parisc/include/asm/Kbuild
> @@ -5,3 +5,4 @@ generic-y += word-at-a-time.h auxvec.h u
>  	  poll.h xor.h clkdev.h exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/parisc/include/asm/barrier.h
> +++ /dev/null
> @@ -1,35 +0,0 @@
> -#ifndef __PARISC_BARRIER_H
> -#define __PARISC_BARRIER_H
> -
> -/*
> -** This is simply the barrier() macro from linux/kernel.h but when serial.c
> -** uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h
> -** hasn't yet been included yet so it fails, thus repeating the macro here.
> -**
> -** PA-RISC architecture allows for weakly ordered memory accesses although
> -** none of the processors use it. There is a strong ordered bit that is
> -** set in the O-bit of the page directory entry. Operating systems that
> -** can not tolerate out of order accesses should set this bit when mapping
> -** pages. The O-bit of the PSW should also be set to 1 (I don't believe any
> -** of the processor implemented the PSW O-bit). The PCX-W ERS states that
> -** the TLB O-bit is not implemented so the page directory does not need to
> -** have the O-bit set when mapping pages (section 3.1). This section also
> -** states that the PSW Y, Z, G, and O bits are not implemented.
> -** So it looks like nothing needs to be done for parisc-linux (yet).
> -** (thanks to chada for the above comment -ggg)
> -**
> -** The __asm__ op below simple prevents gcc/ld from reordering
> -** instructions across the mb() "call".
> -*/
> -#define mb()		__asm__ __volatile__("":::"memory")	/* barrier() */
> -#define rmb()		mb()
> -#define wmb()		mb()
> -#define smp_mb()	mb()
> -#define smp_rmb()	mb()
> -#define smp_wmb()	mb()
> -#define smp_read_barrier_depends()	do { } while(0)
> -#define read_barrier_depends()		do { } while(0)
> -
> -#define set_mb(var, value)		do { var = value; mb(); } while (0)
> -
> -#endif /* __PARISC_BARRIER_H */
> --- a/arch/score/include/asm/Kbuild
> +++ b/arch/score/include/asm/Kbuild
> @@ -5,3 +5,4 @@ generic-y += clkdev.h
>  generic-y += trace_clock.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += barrier.h
> --- a/arch/score/include/asm/barrier.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -#ifndef _ASM_SCORE_BARRIER_H
> -#define _ASM_SCORE_BARRIER_H
> -
> -#define mb()		barrier()
> -#define rmb()		barrier()
> -#define wmb()		barrier()
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -
> -#define read_barrier_depends()		do {} while (0)
> -#define smp_read_barrier_depends()	do {} while (0)
> -
> -#define set_mb(var, value) 		do {var = value; wmb(); } while (0)
> -
> -#endif /* _ASM_SCORE_BARRIER_H */
> --- a/arch/sh/include/asm/barrier.h
> +++ b/arch/sh/include/asm/barrier.h
> @@ -26,29 +26,14 @@
>  #if defined(CONFIG_CPU_SH4A) || defined(CONFIG_CPU_SH5)
>  #define mb()		__asm__ __volatile__ ("synco": : :"memory")
>  #define rmb()		mb()
> -#define wmb()		__asm__ __volatile__ ("synco": : :"memory")
> +#define wmb()		mb()
>  #define ctrl_barrier()	__icbi(PAGE_OFFSET)
> -#define read_barrier_depends()	do { } while(0)
>  #else
> -#define mb()		__asm__ __volatile__ ("": : :"memory")
> -#define rmb()		mb()
> -#define wmb()		__asm__ __volatile__ ("": : :"memory")
>  #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
> -#define read_barrier_depends()	do { } while(0)
> -#endif
> -
> -#ifdef CONFIG_SMP
> -#define smp_mb()	mb()
> -#define smp_rmb()	rmb()
> -#define smp_wmb()	wmb()
> -#define smp_read_barrier_depends()	read_barrier_depends()
> -#else
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define smp_read_barrier_depends()	do { } while(0)
>  #endif
> 
>  #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
> 
> +#include <asm-generic/barrier.h>
> +
>  #endif /* __ASM_SH_BARRIER_H */
> --- a/arch/sparc/include/asm/barrier_32.h
> +++ b/arch/sparc/include/asm/barrier_32.h
> @@ -1,15 +1,7 @@
>  #ifndef __SPARC_BARRIER_H
>  #define __SPARC_BARRIER_H
> 
> -/* XXX Change this if we ever use a PSO mode kernel. */
> -#define mb()	__asm__ __volatile__ ("" : : : "memory")
> -#define rmb()	mb()
> -#define wmb()	mb()
> -#define read_barrier_depends()	do { } while(0)
> -#define set_mb(__var, __value)  do { __var = __value; mb(); } while(0)
> -#define smp_mb()	__asm__ __volatile__("":::"memory")
> -#define smp_rmb()	__asm__ __volatile__("":::"memory")
> -#define smp_wmb()	__asm__ __volatile__("":::"memory")
> -#define smp_read_barrier_depends()	do { } while(0)
> +#include <asm/processor.h> /* for nop() */
> +#include <asm-generic/barrier.h>
> 
>  #endif /* !(__SPARC_BARRIER_H) */
> --- a/arch/tile/include/asm/barrier.h
> +++ b/arch/tile/include/asm/barrier.h
> @@ -22,59 +22,6 @@
>  #include <arch/spr_def.h>
>  #include <asm/timex.h>
> 
> -/*
> - * read_barrier_depends - Flush all pending reads that subsequents reads
> - * depend on.
> - *
> - * No data-dependent reads from memory-like regions are ever reordered
> - * over this barrier.  All reads preceding this primitive are guaranteed
> - * to access memory (but not necessarily other CPUs' caches) before any
> - * reads following this primitive that depend on the data return by
> - * any of the preceding reads.  This primitive is much lighter weight than
> - * rmb() on most CPUs, and is never heavier weight than is
> - * rmb().
> - *
> - * These ordering constraints are respected by both the local CPU
> - * and the compiler.
> - *
> - * Ordering is not guaranteed by anything other than these primitives,
> - * not even by data dependencies.  See the documentation for
> - * memory_barrier() for examples and URLs to more information.
> - *
> - * For example, the following code would force ordering (the initial
> - * value of "a" is zero, "b" is one, and "p" is "&a"):
> - *
> - * <programlisting>
> - *	CPU 0				CPU 1
> - *
> - *	b = 2;
> - *	memory_barrier();
> - *	p = &b;				q = p;
> - *					read_barrier_depends();
> - *					d = *q;
> - * </programlisting>
> - *
> - * because the read of "*q" depends on the read of "p" and these
> - * two reads are separated by a read_barrier_depends().  However,
> - * the following code, with the same initial values for "a" and "b":
> - *
> - * <programlisting>
> - *	CPU 0				CPU 1
> - *
> - *	a = 2;
> - *	memory_barrier();
> - *	b = 3;				y = b;
> - *					read_barrier_depends();
> - *					x = a;
> - * </programlisting>
> - *
> - * does not enforce ordering, since there is no data dependency between
> - * the read of "a" and the read of "b".  Therefore, on some CPUs, such
> - * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
> - * in cases like this where there are no data dependencies.
> - */
> -#define read_barrier_depends()	do { } while (0)
> -
>  #define __sync()	__insn_mf()
> 
>  #include <hv/syscall_public.h>
> @@ -125,20 +72,7 @@ mb_incoherent(void)
>  #define mb()		fast_mb()
>  #define iob()		fast_iob()
> 
> -#ifdef CONFIG_SMP
> -#define smp_mb()	mb()
> -#define smp_rmb()	rmb()
> -#define smp_wmb()	wmb()
> -#define smp_read_barrier_depends()	read_barrier_depends()
> -#else
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
> -#define smp_read_barrier_depends()	do { } while (0)
> -#endif
> -
> -#define set_mb(var, value) \
> -	do { var = value; mb(); } while (0)
> +#include <asm-generic/barrier.h>
> 
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _ASM_TILE_BARRIER_H */
> --- a/arch/unicore32/include/asm/barrier.h
> +++ b/arch/unicore32/include/asm/barrier.h
> @@ -14,15 +14,6 @@
>  #define dsb() __asm__ __volatile__ ("" : : : "memory")
>  #define dmb() __asm__ __volatile__ ("" : : : "memory")
> 
> -#define mb()				barrier()
> -#define rmb()				barrier()
> -#define wmb()				barrier()
> -#define smp_mb()			barrier()
> -#define smp_rmb()			barrier()
> -#define smp_wmb()			barrier()
> -#define read_barrier_depends()		do { } while (0)
> -#define smp_read_barrier_depends()	do { } while (0)
> -
> -#define set_mb(var, value)		do { var = value; smp_mb(); } while (0)
> +#include <asm-generic/barrier.h>
> 
>  #endif /* __UNICORE_BARRIER_H__ */
> --- a/arch/xtensa/include/asm/barrier.h
> +++ b/arch/xtensa/include/asm/barrier.h
> @@ -9,21 +9,14 @@
>  #ifndef _XTENSA_SYSTEM_H
>  #define _XTENSA_SYSTEM_H
> 
> -#define smp_read_barrier_depends() do { } while(0)
> -#define read_barrier_depends() do { } while(0)
> -
>  #define mb()  ({ __asm__ __volatile__("memw" : : : "memory"); })
>  #define rmb() barrier()
>  #define wmb() mb()
> 
>  #ifdef CONFIG_SMP
>  #error smp_* not defined
> -#else
> -#define smp_mb()	barrier()
> -#define smp_rmb()	barrier()
> -#define smp_wmb()	barrier()
>  #endif
> 
> -#define set_mb(var, value)	do { var = value; mb(); } while (0)
> +#include <asm-generic/barrier.h>
> 
>  #endif /* _XTENSA_SYSTEM_H */
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -1,4 +1,5 @@
> -/* Generic barrier definitions, based on MN10300 definitions.
> +/*
> + * Generic barrier definitions, originally based on MN10300 definitions.
>   *
>   * It should be possible to use these on really simple architectures,
>   * but it serves more as a starting point for new ports.
> @@ -16,35 +17,50 @@
> 
>  #ifndef __ASSEMBLY__
> 
> -#define nop() asm volatile ("nop")
> +#include <linux/compiler.h>
> +
> +#ifndef nop
> +#define nop()	asm volatile ("nop")
> +#endif
> 
>  /*
> - * Force strict CPU ordering.
> - * And yes, this is required on UP too when we're talking
> - * to devices.
> + * Force strict CPU ordering. And yes, this is required on UP too when we're
> + * talking to devices.
>   *
> - * This implementation only contains a compiler barrier.
> + * Fall back to compiler barriers if nothing better is provided.
>   */
> 
> -#define mb()	asm volatile ("": : :"memory")
> +#ifndef mb
> +#define mb()	barrier()
> +#endif
> +
> +#ifndef rmb
>  #define rmb()	mb()
> -#define wmb()	asm volatile ("": : :"memory")
> +#endif
> +
> +#ifndef wmb
> +#define wmb()	mb()
> +#endif
> +
> +#ifndef read_barrier_depends
> +#define read_barrier_depends()		do { } while (0)
> +#endif
> 
>  #ifdef CONFIG_SMP
>  #define smp_mb()	mb()
>  #define smp_rmb()	rmb()
>  #define smp_wmb()	wmb()
> +#define smp_read_barrier_depends()	read_barrier_depends()
>  #else
>  #define smp_mb()	barrier()
>  #define smp_rmb()	barrier()
>  #define smp_wmb()	barrier()
> +#define smp_read_barrier_depends()	do { } while (0)
>  #endif
> 
> -#define set_mb(var, value)  do { var = value;  mb(); } while (0)
> -#define set_wmb(var, value) do { var = value; wmb(); } while (0)
> -
> -#define read_barrier_depends()		do {} while (0)
> -#define smp_read_barrier_depends()	do {} while (0)
> +#ifndef set_mb
> +#define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
> +#endif
> 
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
  2013-12-16 16:40   ` Will Deacon
@ 2013-12-16 21:37   ` Paul E. McKenney
  2013-12-17  9:14     ` Peter Zijlstra
  2014-01-12 18:43   ` [tip:core/locking] " tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-16 21:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck, Will Deacon

On Fri, Dec 13, 2013 at 03:57:01PM +0100, Peter Zijlstra wrote:
> A number of situations currently require the heavyweight smp_mb(),
> even though there is no need to order prior stores against later
> loads.  Many architectures have much cheaper ways to handle these
> situations, but the Linux kernel currently has no portable way
> to make use of them.
> 
> This commit therefore supplies smp_load_acquire() and
> smp_store_release() to remedy this situation.  The new
> smp_load_acquire() primitive orders the specified load against
> any subsequent reads or writes, while the new smp_store_release()
> primitive orders the specified store against any prior reads or
> writes.  These primitives allow array-based circular FIFOs to be
> implemented without an smp_mb(), and also allow a theoretical
> hole in rcu_assign_pointer() to be closed at no additional
> expense on most architectures.
> 
> In addition, the RCU experience transitioning from explicit
> smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
> and rcu_assign_pointer(), respectively resulted in substantial
> improvements in readability.  It therefore seems likely that
> replacing other explicit barriers with smp_load_acquire() and
> smp_store_release() will provide similar benefits.  It appears
> that roughly half of the explicit barriers in core kernel code
> might be so replaced.
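
(To make the circular-FIFO case above concrete, here is a minimal
single-producer/single-consumer sketch using the two new primitives; the
ring structure and names are invented for illustration and are not taken
from any patch in this series:)

	#define RING_SIZE	16	/* must be a power of two */

	struct ring {
		unsigned int	head;	/* written by the producer only */
		unsigned int	tail;	/* written by the consumer only */
		int		buf[RING_SIZE];
	};

	/* producer side */
	static bool ring_put(struct ring *r, int item)
	{
		unsigned int head = r->head;
		unsigned int tail = smp_load_acquire(&r->tail);

		if (head - tail >= RING_SIZE)
			return false;			/* full */

		r->buf[head & (RING_SIZE - 1)] = item;
		/* publish the slot; pairs with the acquire in ring_get() */
		smp_store_release(&r->head, head + 1);
		return true;
	}

	/* consumer side */
	static bool ring_get(struct ring *r, int *item)
	{
		unsigned int head = smp_load_acquire(&r->head);
		unsigned int tail = r->tail;

		if (head == tail)
			return false;			/* empty */

		*item = r->buf[tail & (RING_SIZE - 1)];
		/* free the slot; pairs with the acquire in ring_put() */
		smp_store_release(&r->tail, tail + 1);
		return true;
	}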

And here is an RFC patch switching rcu_assign_pointer() from
smp_wmb() to smp_store_release().  Thoughts?

						Thanx, Paul

------------------------------------------------------------------------

rcu: Define rcu_assign_pointer() in terms of smp_store_release()

The new smp_store_release() function provides better guarantees than did
rcu_assign_pointer(), and potentially less overhead on some architectures.
The guarantee that smp_store_release() provides, but that rcu_assign_pointer()
previously did not, is obscure, yet its lack could cause considerable confusion.
This guarantee is illustrated by the following code fragment:
    
    	struct foo {
    		int a;
    		int b;
    		int c;
    		struct foo *next;
    	};
    	struct foo foo1;
    	struct foo foo2;
    	struct foo __rcu *foop;
    
    	...
    
    	foo2.a = 1;
    	foo2.b = 2;
    	BUG_ON(foo2.c);
    	rcu_assign_pointer(foop, &foo2);
    
    	...
    
    	fp = rcu_dereference(foop);
    	fp->c = 3;

The current rcu_assign_pointer() semantics permit the BUG_ON() to
trigger because rcu_assign_pointer()'s smp_wmb() is not guaranteed to
order prior reads against later writes.  This commit therefore upgrades
rcu_assign_pointer() from smp_wmb() to smp_store_release() to avoid this
counter-intuitive outcome.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ad5258cc051b..2fe509171e21 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -582,11 +582,7 @@ static inline void rcu_preempt_sleep_check(void)
  * please be careful when making changes to rcu_assign_pointer() and the
  * other macros that it invokes.
  */
-#define rcu_assign_pointer(p, v) \
-	do { \
-		smp_wmb(); \
-		ACCESS_ONCE(p) = RCU_INITIALIZER(v); \
-	} while (0)
+#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v));
 
 
 /**

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-16 16:40   ` Will Deacon
@ 2013-12-17  9:07     ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-17  9:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arch, linux-kernel, geert, paulmck, torvalds, VICTORK,
	oleg, anton, benh, fweisbec, mathieu.desnoyers, michael, mikey,
	linux, schwidefsky, heiko.carstens, tony.luck

On Mon, Dec 16, 2013 at 04:40:04PM +0000, Will Deacon wrote:
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -45,14 +45,37 @@
> >  # define smp_rmb()     rmb()
> >  # define smp_wmb()     wmb()
> >  # define smp_read_barrier_depends()    read_barrier_depends()
> > +
> >  #else
> > +
> >  # define smp_mb()      barrier()
> >  # define smp_rmb()     barrier()
> >  # define smp_wmb()     barrier()
> >  # define smp_read_barrier_depends()    do { } while(0)
> > +
> >  #endif
> > 
> >  /*
> > + * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
> > + * need for asm trickery!
> > + */
> > +
> > +#define smp_store_release(p, v)                                                \
> > +do {                                                                   \
> > +       compiletime_assert_atomic_type(*p);                             \
> > +       barrier();                                                      \
> > +       ACCESS_ONCE(*p) = (v);                                          \
> > +} while (0)
> > +
> > +#define smp_load_acquire(p)                                            \
> > +({                                                                     \
> > +       typeof(*p) ___p1 = ACCESS_ONCE(*p);                             \
> > +       compiletime_assert_atomic_type(*p);                             \
> > +       barrier();                                                      \
> > +       ___p1;                                                          \
> > +})
> > +
> > +/*
> >   * XXX check on this ---I suspect what Linus really wants here is
> >   * acquire vs release semantics but we can't discuss this stuff with
> >   * Linus just yet.  Grrr...
> 
> I guess you can remove this comment now?

I suppose so, but I'll leave that up to the ia64 people.. The comment
relates to set_mb() and that hasn't actually changed.

> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_
> >  # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> >  #endif
> > 
> > +/* Is this type a native word size -- useful for atomic operations */
> > +#ifndef __native_word
> > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
> > +#endif
> > +
> >  /* Compile time object size, -1 for unknown */
> >  #ifndef __compiletime_object_size
> >  # define __compiletime_object_size(obj) -1
> > @@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_
> >  #define compiletime_assert(condition, msg) \
> >         _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
> > 
> > +#define compiletime_assert_atomic_type(t)                              \
> > +       compiletime_assert(__native_word(t),                            \
> > +               "Need native word sized stores/loads for atomicity.")
> > +
> 
> At least for ARM, it's not so much the type that causes issues with
> atomicity, but whether or not it's naturally aligned, which we don't check
> here...

Yeah I know, but you actually need to do weird things to get unaligned
pointers. Also, I'm fairly sure ARM will fail a 20 byte store-release so
the size check isn't entirely pointless.

I suppose if anybody really cares you can have some DEBUG symbol
dependent alignment check in there, but it will generate extra code.
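
As a rough sketch of what such a check could look like -- the config symbol
and helper name below are made up, and this is untested:

	#ifdef CONFIG_DEBUG_ACQUIRE_RELEASE		/* hypothetical option */
	/* runtime check that the pointer is naturally aligned for its type */
	#define assert_release_acquire_ptr(p)				\
		WARN_ONCE((unsigned long)(p) & (sizeof(*(p)) - 1),	\
			  "misaligned load-acquire/store-release pointer\n")
	#else
	#define assert_release_acquire_ptr(p)	do { } while (0)
	#endif

It would sit next to the compiletime_assert_atomic_type() invocation in
smp_load_acquire()/smp_store_release(); since that assert already limits the
size to int or long, the mask above amounts to a natural-alignment check.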

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-16 21:37   ` Paul E. McKenney
@ 2013-12-17  9:14     ` Peter Zijlstra
  2013-12-17 13:58       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-17  9:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck, Will Deacon

On Mon, Dec 16, 2013 at 01:37:20PM -0800, Paul E. McKenney wrote:
> rcu: Define rcu_assign_pointer() in terms of smp_store_release()
> 
> The new smp_store_release() function provides better guarantees than did
> rcu_assign_pointer(), and potentially less overhead on some architectures.
> > The guarantee that smp_store_release() provides, but that rcu_assign_pointer()
> > previously did not, is obscure, yet its lack could cause considerable confusion.
> This guarantee is illustrated by the following code fragment:
>     
>     	struct foo {
>     		int a;
>     		int b;
>     		int c;
>     		struct foo *next;
>     	};
>     	struct foo foo1;
>     	struct foo foo2;
>     	struct foo __rcu *foop;
>     
>     	...
>     
>     	foo2.a = 1;
>     	foo2.b = 2;
>     	BUG_ON(foo2.c);
>     	rcu_assign_pointer(foop, &foo);
>     
>     	...
>     
>     	fp = rcu_dereference(foop);
>     	fp.c = 3;
> 
> The current rcu_assign_pointer() semantics permit the BUG_ON() to
> trigger because rcu_assign_pointer()'s smp_wmb() is not guaranteed to
> order prior reads against later writes.  This commit therefore upgrades
> rcu_assign_pointer() from smp_wmb() to smp_store_release() to avoid this
> counter-intuitive outcome.

This of course raises the obvious question; why doesn't
rcu_dereference() need to be a smp_load_acquire?

And I think I know the answer; which would be that the data dependency
is still sufficient. But the situation does create an unpleasant
asymmetry in things.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-16 20:11   ` Paul E. McKenney
@ 2013-12-17  9:24     ` Peter Zijlstra
  2013-12-17 15:31       ` Paul E. McKenney
  2014-01-12 18:43       ` [tip:core/locking] locking/doc: Rename LOCK/UNLOCK to ACQUIRE/ RELEASE tip-bot for Peter Zijlstra
  2013-12-17 10:13     ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
  1 sibling, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-17  9:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Mon, Dec 16, 2013 at 12:11:57PM -0800, Paul E. McKenney wrote:
> Still OK with my Reviewed-by, but some nits below.

Ok, those were a few silly things indeed. I hand-typed and rushed the
entire document rebase on top of your recent patches to the text...
clearly!

Still lacking the two renames (which would also affect the actual code),
an updated version below.

---
Subject: doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 6 Nov 2013 14:57:36 +0100

The LOCK and UNLOCK barriers as described in our barrier document are
generally known as ACQUIRE and RELEASE barriers in other literature.

Since we plan to introduce the acquire and release nomenclature in
generic kernel primitives we should amend the document to avoid
confusion as to what an acquire/release means.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-2f9kn2mrcdjofzke9szbgpoj@git.kernel.org
---
 Documentation/memory-barriers.txt |  241 +++++++++++++++++++-------------------
 1 file changed, 123 insertions(+), 118 deletions(-)

--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -381,39 +381,44 @@ VARIETIES OF MEMORY BARRIER
 
 And a couple of implicit varieties:
 
- (5) LOCK operations.
+ (5) ACQUIRE operations.
 
      This acts as a one-way permeable barrier.  It guarantees that all memory
-     operations after the LOCK operation will appear to happen after the LOCK
-     operation with respect to the other components of the system.
+     operations after the ACQUIRE operation will appear to happen after the
+     ACQUIRE operation with respect to the other components of the system.
+     ACQUIRE operations include LOCK operations and smp_load_acquire()
+     operations.
 
-     Memory operations that occur before a LOCK operation may appear to happen
-     after it completes.
+     Memory operations that occur before an ACQUIRE operation may appear to
+     happen after it completes.
 
-     A LOCK operation should almost always be paired with an UNLOCK operation.
+     An ACQUIRE operation should almost always be paired with a RELEASE
+     operation.
 
 
- (6) UNLOCK operations.
+ (6) RELEASE operations.
 
      This also acts as a one-way permeable barrier.  It guarantees that all
-     memory operations before the UNLOCK operation will appear to happen before
-     the UNLOCK operation with respect to the other components of the system.
+     memory operations before the RELEASE operation will appear to happen
+     before the RELEASE operation with respect to the other components of the
+     system. RELEASE operations include UNLOCK operations and
+     smp_store_release() operations.
 
-     Memory operations that occur after an UNLOCK operation may appear to
+     Memory operations that occur after a RELEASE operation may appear to
      happen before it completes.
 
-     The use of LOCK and UNLOCK operations generally precludes the need for
-     other sorts of memory barrier (but note the exceptions mentioned in the
-     subsection "MMIO write barrier").  In addition, an UNLOCK+LOCK pair
-     is -not- guaranteed to act as a full memory barrier.  However,
-     after a LOCK on a given lock variable, all memory accesses preceding any
-     prior UNLOCK on that same variable are guaranteed to be visible.
-     In other words, within a given lock variable's critical section,
-     all accesses of all previous critical sections for that lock variable
-     are guaranteed to have completed.
+     The use of ACQUIRE and RELEASE operations generally precludes the need
+     for other sorts of memory barrier (but note the exceptions mentioned in
+     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
+     pair is -not- guaranteed to act as a full memory barrier.  However, after
+     an ACQUIRE on a given variable, all memory accesses preceding any prior
+     RELEASE on that same variable are guaranteed to be visible.  In other
+     words, within a given variable's critical section, all accesses of all
+     previous critical sections for that variable are guaranteed to have
+     completed.
 
-     This means that LOCK acts as a minimal "acquire" operation and
-     UNLOCK acts as a minimal "release" operation.
+     This means that ACQUIRE acts as a minimal "acquire" operation and
+     RELEASE acts as a minimal "release" operation.
 
 
 Memory barriers are only required where there's a possibility of interaction
@@ -1585,7 +1590,7 @@ CPU from reordering them.
 	clear_bit( ... );
 
      This prevents memory operations before the clear leaking to after it.  See
-     the subsection on "Locking Functions" with reference to UNLOCK operation
+     the subsection on "Locking Functions" with reference to RELEASE operation
      implications.
 
      See Documentation/atomic_ops.txt for more information.  See the "Atomic
@@ -1619,8 +1624,8 @@ provide more substantial guarantees, but
 of arch specific code.
 
 
-LOCKING FUNCTIONS
------------------
+ACQUIRING FUNCTIONS
+-------------------
 
 The Linux kernel has a number of locking constructs:
 
@@ -1631,106 +1636,106 @@ LOCKING FUNCTIONS
  (*) R/W semaphores
  (*) RCU
 
-In all cases there are variants on "LOCK" operations and "UNLOCK" operations
+In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
 for each construct.  These operations all imply certain barriers:
 
- (1) LOCK operation implication:
+ (1) ACQUIRE operation implication:
 
-     Memory operations issued after the LOCK will be completed after the LOCK
-     operation has completed.
+     Memory operations issued after the ACQUIRE will be completed after the
+     ACQUIRE operation has completed.
 
-     Memory operations issued before the LOCK may be completed after the
-     LOCK operation has completed.  An smp_mb__before_spinlock(), combined
-     with a following LOCK, orders prior loads against subsequent stores
-     and stores and prior stores against subsequent stores.  Note that
-     this is weaker than smp_mb()!  The smp_mb__before_spinlock()
-     primitive is free on many architectures.
+     Memory operations issued before the ACQUIRE may be completed after the
+     ACQUIRE operation has completed.  An smp_mb__before_spinlock(), combined
+     with a following ACQUIRE, orders prior loads against subsequent stores and
+     stores and prior stores against subsequent stores.  Note that this is
+     weaker than smp_mb()!  The smp_mb__before_spinlock() primitive is free on
+     many architectures.
 
- (2) UNLOCK operation implication:
+ (2) RELEASE operation implication:
 
-     Memory operations issued before the UNLOCK will be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued before the RELEASE will be completed before the
+     RELEASE operation has completed.
 
-     Memory operations issued after the UNLOCK may be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued after the RELEASE may be completed before the
+     RELEASE operation has completed.
 
- (3) LOCK vs LOCK implication:
+ (3) ACQUIRE vs ACQUIRE implication:
 
-     All LOCK operations issued before another LOCK operation will be completed
-     before that LOCK operation.
+     All ACQUIRE operations issued before another ACQUIRE operation will be
+     completed before that ACQUIRE operation.
 
- (4) LOCK vs UNLOCK implication:
+ (4) ACQUIRE vs RELEASE implication:
 
-     All LOCK operations issued before an UNLOCK operation will be completed
-     before the UNLOCK operation.
+     All ACQUIRE operations issued before a RELEASE operation will be
+     completed before the RELEASE operation.
 
- (5) Failed conditional LOCK implication:
+ (5) Failed conditional ACQUIRE implication:
 
-     Certain variants of the LOCK operation may fail, either due to being
-     unable to get the lock immediately, or due to receiving an unblocked
+     Certain locking variants of the ACQUIRE operation may fail, either due to
+     being unable to get the lock immediately, or due to receiving an unblocked
      signal whilst asleep waiting for the lock to become available.  Failed
      locks do not imply any sort of barrier.
 
-[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
-    barriers is that the effects of instructions outside of a critical section
-    may seep into the inside of the critical section.
-
-A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
-because it is possible for an access preceding the LOCK to happen after the
-LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
-two accesses can themselves then cross:
+[!] Note: one of the consequences of lock ACQUIREs and RELEASEs being only
+one-way barriers is that the effects of instructions outside of a critical
+section may seep into the inside of the critical section.
+
+An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier
+because it is possible for an access preceding the ACQUIRE to happen after the
+ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
+the two accesses can themselves then cross:
 
 	*A = a;
-	LOCK M
-	UNLOCK M
+	ACQUIRE M
+	RELEASE M
 	*B = b;
 
 may occur as:
 
-	LOCK M, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE M, STORE *B, STORE *A, RELEASE M
 
-This same reordering can of course occur if the LOCK and UNLOCK are
-to the same lock variable, but only from the perspective of another
-CPU not holding that lock.
-
-In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
-memory barrier because it is possible for a preceding UNLOCK to pass a
-later LOCK from the viewpoint of the CPU, but not from the viewpoint
+This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
+to the same lock variable, but only from the perspective of another CPU not
+holding that lock.
+
+In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
+memory barrier because it is possible for a preceding RELEASE to pass a
+later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
 of the compiler.  Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the UNLOCK would
+interchange because if such a deadlock threatened, the RELEASE would
 simply complete.
 
-If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
-the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
-This will produce a full barrier if either (a) the UNLOCK and the LOCK
-are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
-on the same lock variable.  The smp_mb__after_unlock_lock() primitive
-is free on many architectures.  Without smp_mb__after_unlock_lock(),
-the critical sections corresponding to the UNLOCK and the LOCK can cross:
+If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
+ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
+will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
+same variable.  The smp_mb__after_unlock_lock() primitive is free on many
+architectures.  Without smp_mb__after_unlock_lock(), the critical sections
+corresponding to the RELEASE and the ACQUIRE can cross:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	*B = b;
 
 could occur as:
 
-	LOCK N, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE N, STORE *B, STORE *A, RELEASE M
 
 With smp_mb__after_unlock_lock(), they cannot, so that:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	smp_mb__after_unlock_lock();
 	*B = b;
 
 will always occur as either of the following:
 
-	STORE *A, UNLOCK, LOCK, STORE *B
-	STORE *A, LOCK, UNLOCK, STORE *B
+	STORE *A, RELEASE, ACQUIRE, STORE *B
+	STORE *A, ACQUIRE, RELEASE, STORE *B
 
-If the UNLOCK and LOCK were instead both operating on the same lock
+If the RELEASE and ACQUIRE were instead both operating on the same lock
 variable, only the first of these two alternatives can occur.
 
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
@@ -1745,33 +1750,33 @@ See also the section on "Inter-CPU locki
 
 	*A = a;
 	*B = b;
-	LOCK
+	ACQUIRE
 	*C = c;
 	*D = d;
-	UNLOCK
+	RELEASE
 	*E = e;
 	*F = f;
 
 The following sequence of events is acceptable:
 
-	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
+	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
 
 	[+] Note that {*F,*A} indicates a combined access.
 
 But none of the following are:
 
-	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
-	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
-	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
-	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
+	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
+	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
+	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
+	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
 
 
 
 INTERRUPT DISABLING FUNCTIONS
 -----------------------------
 
-Functions that disable interrupts (LOCK equivalent) and enable interrupts
-(UNLOCK equivalent) will act as compiler barriers only.  So if memory or I/O
+Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
+(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
 barriers are required in such a situation, they must be provided from some
 other means.
 
@@ -1910,17 +1915,17 @@ MISCELLANEOUS FUNCTIONS
  (*) schedule() and similar imply full memory barriers.
 
 
-=================================
-INTER-CPU LOCKING BARRIER EFFECTS
-=================================
+===================================
+INTER-CPU ACQUIRING BARRIER EFFECTS
+===================================
 
 On SMP systems locking primitives give a more substantial form of barrier: one
 that does affect memory access ordering on other CPUs, within the context of
 conflict on any particular lock.
 
 
-LOCKS VS MEMORY ACCESSES
-------------------------
+ACQUIRES VS MEMORY ACCESSES
+---------------------------
 
 Consider the following: the system has a pair of spinlocks (M) and (Q), and
 three CPUs; then should the following sequence of events occur:
@@ -1928,24 +1933,24 @@ Consider the following: the system has a
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
-	LOCK M				LOCK Q
+	ACQUIRE M			ACQUIRE Q
 	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
 	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
-	UNLOCK M			UNLOCK Q
+	RELEASE M			RELEASE Q
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs. It might, for example, see:
 
-	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
+	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
 
 But it won't see any of:
 
-	*B, *C or *D preceding LOCK M
-	*A, *B or *C following UNLOCK M
-	*F, *G or *H preceding LOCK Q
-	*E, *F or *G following UNLOCK Q
+	*B, *C or *D preceding ACQUIRE M
+	*A, *B or *C following RELEASE M
+	*F, *G or *H preceding ACQUIRE Q
+	*E, *F or *G following RELEASE Q
 
 
 However, if the following occurs:
@@ -1953,29 +1958,29 @@ through *H occur in, other than the cons
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;
-	LOCK M		     [1]
+	ACQUIRE M		     [1]
 	ACCESS_ONCE(*B) = b;
 	ACCESS_ONCE(*C) = c;
-	UNLOCK M	     [1]
+	RELEASE M	     [1]
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
-					LOCK M		     [2]
+					ACQUIRE M		     [2]
 					smp_mb__after_unlock_lock();
 					ACCESS_ONCE(*F) = f;
 					ACCESS_ONCE(*G) = g;
-					UNLOCK M	     [2]
+					RELEASE M	     [2]
 					ACCESS_ONCE(*H) = h;
 
 CPU 3 might see:
 
-	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
-		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
+	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
+		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
 
 But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
-	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
-	*A, *B or *C following UNLOCK M [1]
-	*F, *G or *H preceding LOCK M [2]
-	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
+	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
+	*A, *B or *C following RELEASE M [1]
+	*F, *G or *H preceding ACQUIRE M [2]
+	*A, *B, *C, *E, *F or *G following RELEASE M [2]
 
 Note that the smp_mb__after_unlock_lock() is critically important
 here: Without it CPU 3 might see some of the above orderings.
@@ -1983,8 +1988,8 @@ Without smp_mb__after_unlock_lock(), the
 to be seen in order unless CPU 3 holds lock M.
 
 
-LOCKS VS I/O ACCESSES
----------------------
+ACQUIRES VS I/O ACCESSES
+------------------------
 
 Under certain circumstances (especially involving NUMA), I/O accesses within
 two spinlocked sections on two different CPUs may be seen as interleaved by the
@@ -2202,13 +2207,13 @@ about the state (old or new) implies an
 	/* when succeeds (returns 1) */
 	atomic_add_unless();		atomic_long_add_unless();
 
-These are used for such things as implementing LOCK-class and UNLOCK-class
+These are used for such things as implementing ACQUIRE-class and RELEASE-class
 operations and adjusting reference counters towards object destruction, and as
 such the implicit memory barrier effects are necessary.
 
 
 The following operations are potential problems as they do _not_ imply memory
-barriers, but might be used for implementing such things as UNLOCK-class
+barriers, but might be used for implementing such things as RELEASE-class
 operations:
 
 	atomic_set();
@@ -2250,7 +2255,7 @@ barriers are needed or not.
 	clear_bit_unlock();
 	__clear_bit_unlock();
 
-These implement LOCK-class and UNLOCK-class operations. These should be used in
+These implement ACQUIRE-class and RELEASE-class operations. These should be used in
 preference to other operations when implementing locking primitives, because
 their implementations can be optimised on many architectures.
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-16 20:11   ` Paul E. McKenney
  2013-12-17  9:24     ` Peter Zijlstra
@ 2013-12-17 10:13     ` Peter Zijlstra
  2013-12-17 13:59       ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-17 10:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Mon, Dec 16, 2013 at 12:11:57PM -0800, Paul E. McKenney wrote:
> Should we rename smp_mb__before_spinlock() to be smp_mb__before_acquire()?

> > +ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
> 
> And should we rename this to be smp_mb__after_release_acquire()?
> 
> (I kind of hope not, as it is a bit more typing.)

you could name it: smp_mb__after_rel_acq(), which is shorter again :-)

But in both cases, I would propose to wait until we have a use-case
where we need these barriers on !spinlocks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-17  9:14     ` Peter Zijlstra
@ 2013-12-17 13:58       ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-17 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck, Will Deacon

On Tue, Dec 17, 2013 at 10:14:30AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2013 at 01:37:20PM -0800, Paul E. McKenney wrote:
> > rcu: Define rcu_assign_pointer() in terms of smp_store_release()
> > 
> > The new smp_store_release() function provides better guarantees than did
> > rcu_assign_pointer(), and potentially less overhead on some architectures.
> > The guarantee that smp_store_release() provides, but that rcu_assign_pointer()
> > previously did not, is obscure, yet its lack could cause considerable confusion.
> > This guarantee is illustrated by the following code fragment:
> >     
> >     	struct foo {
> >     		int a;
> >     		int b;
> >     		int c;
> >     		struct foo *next;
> >     	};
> >     	struct foo foo1;
> >     	struct foo foo2;
> >     	struct foo __rcu *foop;
> >     
> >     	...
> >     
> >     	foo2.a = 1;
> >     	foo2.b = 2;
> >     	BUG_ON(foo2.c);
> >     	rcu_assign_pointer(foop, &foo2);
> >     
> >     	...
> >     
> >     	fp = rcu_dereference(foop);
> >     	fp->c = 3;
> > 
> > The current rcu_assign_pointer() semantics permit the BUG_ON() to
> > trigger because rcu_assign_pointer()'s smp_wmb() is not guaranteed to
> > order prior reads against later writes.  This commit therefore upgrades
> > rcu_assign_pointer() from smp_wmb() to smp_store_release() to avoid this
> > counter-intuitive outcome.
> 
> This of course raises the obvious question; why doesn't
> rcu_dereference() need to be a smp_load_acquire?
> 
> And I think I know the answer; which would be that the data dependency
> is still sufficient. But the situation does create an unpleasant
> asymmetry in things.

Yep, rcu_dereference() does rely on data dependency rather than explicit
barriers.  That said, for TSO architectures, the code emitted for
data dependency and for the acquire memory barrier is often identical.
And no, many of the C/C++11 guys didn't like this asymmetry much either.  ;-)
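
To put the two reader variants side by side -- purely an illustration with
made-up names, omitting the __rcu annotation and error handling for brevity:

	struct foo {
		int a;
	};

	static struct foo *gp;

	/* updater */
	static void publish(struct foo *p)
	{
		p->a = 1;
		rcu_assign_pointer(gp, p);	/* now a store-release */
	}

	/* reader relying on the address dependency */
	static int read_dep(void)
	{
		struct foo *q;
		int val = 0;

		rcu_read_lock();
		q = rcu_dereference(gp);
		if (q)
			val = q->a;		/* ordered by the dependency alone */
		rcu_read_unlock();
		return val;
	}

	/* the same read with an explicit acquire instead */
	static int read_acq(void)
	{
		struct foo *q = smp_load_acquire(&gp);

		return q ? q->a : 0;		/* ordered by the acquire barrier */
	}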

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-17 10:13     ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
@ 2013-12-17 13:59       ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-17 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Tue, Dec 17, 2013 at 11:13:23AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2013 at 12:11:57PM -0800, Paul E. McKenney wrote:
> > Should we rename smp_mb__before_spinlock() to be smp_mb__before_acquire()?
> 
> > > +ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
> > 
> > And should we rename this to be smp_mb__after_release_acquire()?
> > 
> > (I kind of hope not, as it is a bit more typing.)
> 
> you could name it: smp_mb__after_rel_acq(), which is shorter again :-)
> 
> But in both cases, I would propose to wait until we have a use-case
> where we need these barriers on !spinlocks.

Indeed, the number of uses will be small enough for quite some time to
allow renaming (or not) at our leisure.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-17  9:24     ` Peter Zijlstra
@ 2013-12-17 15:31       ` Paul E. McKenney
  2014-01-12 18:43       ` [tip:core/locking] locking/doc: Rename LOCK/UNLOCK to ACQUIRE/ RELEASE tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-17 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Tue, Dec 17, 2013 at 10:24:35AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2013 at 12:11:57PM -0800, Paul E. McKenney wrote:
> > Still OK with my Reviewed-by, but some nits below.
> 
> Ok, those were a few silly things indeed. I hand-typed and rushed the
> entire document rebase on top of your recent patches to the text...
> clearly!
> 
> Still lacking the two renames (which would also affect the actual code),
> an updated version below.

It will be easy to change the name for some time.

This update looks good!

							Thanx, Paul

> ---
> Subject: doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 6 Nov 2013 14:57:36 +0100
> 
> The LOCK and UNLOCK barriers as described in our barrier document are
> generally known as ACQUIRE and RELEASE barriers in other literature.
> 
> Since we plan to introduce the acquire and release nomenclature in
> generic kernel primitives we should amend the document to avoid
> confusion as to what an acquire/release means.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Michael Ellerman <michael@ellerman.id.au>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Victor Kaplansky <VICTORK@il.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/n/tip-2f9kn2mrcdjofzke9szbgpoj@git.kernel.org
> ---
>  Documentation/memory-barriers.txt |  241 +++++++++++++++++++-------------------
>  1 file changed, 123 insertions(+), 118 deletions(-)
> 
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -381,39 +381,44 @@ VARIETIES OF MEMORY BARRIER
> 
>  And a couple of implicit varieties:
> 
> - (5) LOCK operations.
> + (5) ACQUIRE operations.
> 
>       This acts as a one-way permeable barrier.  It guarantees that all memory
> -     operations after the LOCK operation will appear to happen after the LOCK
> -     operation with respect to the other components of the system.
> +     operations after the ACQUIRE operation will appear to happen after the
> +     ACQUIRE operation with respect to the other components of the system.
> +     ACQUIRE operations include LOCK operations and smp_load_acquire()
> +     operations.
> 
> -     Memory operations that occur before a LOCK operation may appear to happen
> -     after it completes.
> +     Memory operations that occur before an ACQUIRE operation may appear to
> +     happen after it completes.
> 
> -     A LOCK operation should almost always be paired with an UNLOCK operation.
> +     An ACQUIRE operation should almost always be paired with a RELEASE
> +     operation.
> 
> 
> - (6) UNLOCK operations.
> + (6) RELEASE operations.
> 
>       This also acts as a one-way permeable barrier.  It guarantees that all
> -     memory operations before the UNLOCK operation will appear to happen before
> -     the UNLOCK operation with respect to the other components of the system.
> +     memory operations before the RELEASE operation will appear to happen
> +     before the RELEASE operation with respect to the other components of the
> +     system. RELEASE operations include UNLOCK operations and
> +     smp_store_release() operations.
> 
> -     Memory operations that occur after an UNLOCK operation may appear to
> +     Memory operations that occur after a RELEASE operation may appear to
>       happen before it completes.
> 
> -     The use of LOCK and UNLOCK operations generally precludes the need for
> -     other sorts of memory barrier (but note the exceptions mentioned in the
> -     subsection "MMIO write barrier").  In addition, an UNLOCK+LOCK pair
> -     is -not- guaranteed to act as a full memory barrier.  However,
> -     after a LOCK on a given lock variable, all memory accesses preceding any
> -     prior UNLOCK on that same variable are guaranteed to be visible.
> -     In other words, within a given lock variable's critical section,
> -     all accesses of all previous critical sections for that lock variable
> -     are guaranteed to have completed.
> +     The use of ACQUIRE and RELEASE operations generally precludes the need
> +     for other sorts of memory barrier (but note the exceptions mentioned in
> +     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
> +     pair is -not- guaranteed to act as a full memory barrier.  However, after
> +     an ACQUIRE on a given variable, all memory accesses preceding any prior
> +     RELEASE on that same variable are guaranteed to be visible.  In other
> +     words, within a given variable's critical section, all accesses of all
> +     previous critical sections for that variable are guaranteed to have
> +     completed.
> 
> -     This means that LOCK acts as a minimal "acquire" operation and
> -     UNLOCK acts as a minimal "release" operation.
> +     This means that ACQUIRE acts as a minimal "acquire" operation and
> +     RELEASE acts as a minimal "release" operation.
> 
> 
>  Memory barriers are only required where there's a possibility of interaction
> @@ -1585,7 +1590,7 @@ CPU from reordering them.
>  	clear_bit( ... );
> 
>       This prevents memory operations before the clear leaking to after it.  See
> -     the subsection on "Locking Functions" with reference to UNLOCK operation
> +     the subsection on "Locking Functions" with reference to RELEASE operation
>       implications.
> 
>       See Documentation/atomic_ops.txt for more information.  See the "Atomic
> @@ -1619,8 +1624,8 @@ provide more substantial guarantees, but
>  of arch specific code.
> 
> 
> -LOCKING FUNCTIONS
> ------------------
> +ACQUIRING FUNCTIONS
> +-------------------
> 
>  The Linux kernel has a number of locking constructs:
> 
> @@ -1631,106 +1636,106 @@ LOCKING FUNCTIONS
>   (*) R/W semaphores
>   (*) RCU
> 
> -In all cases there are variants on "LOCK" operations and "UNLOCK" operations
> +In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
>  for each construct.  These operations all imply certain barriers:
> 
> - (1) LOCK operation implication:
> + (1) ACQUIRE operation implication:
> 
> -     Memory operations issued after the LOCK will be completed after the LOCK
> -     operation has completed.
> +     Memory operations issued after the ACQUIRE will be completed after the
> +     ACQUIRE operation has completed.
> 
> -     Memory operations issued before the LOCK may be completed after the
> -     LOCK operation has completed.  An smp_mb__before_spinlock(), combined
> -     with a following LOCK, orders prior loads against subsequent stores
> -     and stores and prior stores against subsequent stores.  Note that
> -     this is weaker than smp_mb()!  The smp_mb__before_spinlock()
> -     primitive is free on many architectures.
> +     Memory operations issued before the ACQUIRE may be completed after the
> +     ACQUIRE operation has completed.  An smp_mb__before_spinlock(), combined
> +     with a following ACQUIRE, orders prior loads against subsequent stores and
> +     stores and prior stores against subsequent stores.  Note that this is
> +     weaker than smp_mb()!  The smp_mb__before_spinlock() primitive is free on
> +     many architectures.
> 
> - (2) UNLOCK operation implication:
> + (2) RELEASE operation implication:
> 
> -     Memory operations issued before the UNLOCK will be completed before the
> -     UNLOCK operation has completed.
> +     Memory operations issued before the RELEASE will be completed before the
> +     RELEASE operation has completed.
> 
> -     Memory operations issued after the UNLOCK may be completed before the
> -     UNLOCK operation has completed.
> +     Memory operations issued after the RELEASE may be completed before the
> +     RELEASE operation has completed.
> 
> - (3) LOCK vs LOCK implication:
> + (3) ACQUIRE vs ACQUIRE implication:
> 
> -     All LOCK operations issued before another LOCK operation will be completed
> -     before that LOCK operation.
> +     All ACQUIRE operations issued before another ACQUIRE operation will be
> +     completed before that ACQUIRE operation.
> 
> - (4) LOCK vs UNLOCK implication:
> + (4) ACQUIRE vs RELEASE implication:
> 
> -     All LOCK operations issued before an UNLOCK operation will be completed
> -     before the UNLOCK operation.
> +     All ACQUIRE operations issued before a RELEASE operation will be
> +     completed before the RELEASE operation.
> 
> - (5) Failed conditional LOCK implication:
> + (5) Failed conditional ACQUIRE implication:
> 
> -     Certain variants of the LOCK operation may fail, either due to being
> -     unable to get the lock immediately, or due to receiving an unblocked
> +     Certain locking variants of the ACQUIRE operation may fail, either due to
> +     being unable to get the lock immediately, or due to receiving an unblocked
>       signal whilst asleep waiting for the lock to become available.  Failed
>       locks do not imply any sort of barrier.
> 
> -[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
> -    barriers is that the effects of instructions outside of a critical section
> -    may seep into the inside of the critical section.
> -
> -A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
> -because it is possible for an access preceding the LOCK to happen after the
> -LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
> -two accesses can themselves then cross:
> +[!] Note: one of the consequences of lock ACQUIREs and RELEASEs being only
> +one-way barriers is that the effects of instructions outside of a critical
> +section may seep into the inside of the critical section.
> +
> +An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier
> +because it is possible for an access preceding the ACQUIRE to happen after the
> +ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
> +the two accesses can themselves then cross:
> 
>  	*A = a;
> -	LOCK M
> -	UNLOCK M
> +	ACQUIRE M
> +	RELEASE M
>  	*B = b;
> 
>  may occur as:
> 
> -	LOCK M, STORE *B, STORE *A, UNLOCK M
> +	ACQUIRE M, STORE *B, STORE *A, RELEASE M
> 
> -This same reordering can of course occur if the LOCK and UNLOCK are
> -to the same lock variable, but only from the perspective of another
> -CPU not holding that lock.
> -
> -In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
> -memory barrier because it is possible for a preceding UNLOCK to pass a
> -later LOCK from the viewpoint of the CPU, but not from the viewpoint
> +This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
> +to the same lock variable, but only from the perspective of another CPU not
> +holding that lock.
> +
> +In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
> +memory barrier because it is possible for a preceding RELEASE to pass a
> +later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
>  of the compiler.  Note that deadlocks cannot be introduced by this
> -interchange because if such a deadlock threatened, the UNLOCK would
> +interchange because if such a deadlock threatened, the RELEASE would
>  simply complete.
> 
> -If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
> -the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
> -This will produce a full barrier if either (a) the UNLOCK and the LOCK
> -are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
> -on the same lock variable.  The smp_mb__after_unlock_lock() primitive
> -is free on many architectures.  Without smp_mb__after_unlock_lock(),
> -the critical sections corresponding to the UNLOCK and the LOCK can cross:
> +If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
> +ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
> +will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
> +executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
> +same variable.  The smp_mb__after_unlock_lock() primitive is free on many
> +architectures.  Without smp_mb__after_unlock_lock(), the critical sections
> +corresponding to the RELEASE and the ACQUIRE can cross:
> 
>  	*A = a;
> -	UNLOCK M
> -	LOCK N
> +	RELEASE M
> +	ACQUIRE N
>  	*B = b;
> 
>  could occur as:
> 
> -	LOCK N, STORE *B, STORE *A, UNLOCK M
> +	ACQUIRE N, STORE *B, STORE *A, RELEASE M
> 
>  With smp_mb__after_unlock_lock(), they cannot, so that:
> 
>  	*A = a;
> -	UNLOCK M
> -	LOCK N
> +	RELEASE M
> +	ACQUIRE N
>  	smp_mb__after_unlock_lock();
>  	*B = b;
> 
>  will always occur as either of the following:
> 
> -	STORE *A, UNLOCK, LOCK, STORE *B
> -	STORE *A, LOCK, UNLOCK, STORE *B
> +	STORE *A, RELEASE, ACQUIRE, STORE *B
> +	STORE *A, ACQUIRE, RELEASE, STORE *B
> 
> -If the UNLOCK and LOCK were instead both operating on the same lock
> +If the RELEASE and ACQUIRE were instead both operating on the same lock
>  variable, only the first of these two alternatives can occur.
> 
>  Locks and semaphores may not provide any guarantee of ordering on UP compiled
> @@ -1745,33 +1750,33 @@ See also the section on "Inter-CPU locki
> 
>  	*A = a;
>  	*B = b;
> -	LOCK
> +	ACQUIRE
>  	*C = c;
>  	*D = d;
> -	UNLOCK
> +	RELEASE
>  	*E = e;
>  	*F = f;
> 
>  The following sequence of events is acceptable:
> 
> -	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
> +	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
> 
>  	[+] Note that {*F,*A} indicates a combined access.
> 
>  But none of the following are:
> 
> -	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
> -	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
> -	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
> -	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
> +	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
> +	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
> +	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
> +	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
> 
> 
> 
>  INTERRUPT DISABLING FUNCTIONS
>  -----------------------------
> 
> -Functions that disable interrupts (LOCK equivalent) and enable interrupts
> -(UNLOCK equivalent) will act as compiler barriers only.  So if memory or I/O
> +Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
> +(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
>  barriers are required in such a situation, they must be provided from some
>  other means.
> 
> @@ -1910,17 +1915,17 @@ MISCELLANEOUS FUNCTIONS
>   (*) schedule() and similar imply full memory barriers.
> 
> 
> -=================================
> -INTER-CPU LOCKING BARRIER EFFECTS
> -=================================
> +===================================
> +INTER-CPU ACQUIRING BARRIER EFFECTS
> +===================================
> 
>  On SMP systems locking primitives give a more substantial form of barrier: one
>  that does affect memory access ordering on other CPUs, within the context of
>  conflict on any particular lock.
> 
> 
> -LOCKS VS MEMORY ACCESSES
> -------------------------
> +ACQUIRES VS MEMORY ACCESSES
> +---------------------------
> 
>  Consider the following: the system has a pair of spinlocks (M) and (Q), and
>  three CPUs; then should the following sequence of events occur:
> @@ -1928,24 +1933,24 @@ Consider the following: the system has a
>  	CPU 1				CPU 2
>  	===============================	===============================
>  	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
> -	LOCK M				LOCK Q
> +	ACQUIRE M			ACQUIRE Q
>  	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
>  	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
> -	UNLOCK M			UNLOCK Q
> +	RELEASE M			RELEASE Q
>  	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
> 
>  Then there is no guarantee as to what order CPU 3 will see the accesses to *A
>  through *H occur in, other than the constraints imposed by the separate locks
>  on the separate CPUs. It might, for example, see:
> 
> -	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
> +	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
> 
>  But it won't see any of:
> 
> -	*B, *C or *D preceding LOCK M
> -	*A, *B or *C following UNLOCK M
> -	*F, *G or *H preceding LOCK Q
> -	*E, *F or *G following UNLOCK Q
> +	*B, *C or *D preceding ACQUIRE M
> +	*A, *B or *C following RELEASE M
> +	*F, *G or *H preceding ACQUIRE Q
> +	*E, *F or *G following RELEASE Q
> 
> 
>  However, if the following occurs:
> @@ -1953,29 +1958,29 @@ through *H occur in, other than the cons
>  	CPU 1				CPU 2
>  	===============================	===============================
>  	ACCESS_ONCE(*A) = a;
> -	LOCK M		     [1]
> +	ACQUIRE M		     [1]
>  	ACCESS_ONCE(*B) = b;
>  	ACCESS_ONCE(*C) = c;
> -	UNLOCK M	     [1]
> +	RELEASE M	     [1]
>  	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
> -					LOCK M		     [2]
> +					ACQUIRE M		     [2]
>  					smp_mb__after_unlock_lock();
>  					ACCESS_ONCE(*F) = f;
>  					ACCESS_ONCE(*G) = g;
> -					UNLOCK M	     [2]
> +					RELEASE M	     [2]
>  					ACCESS_ONCE(*H) = h;
> 
>  CPU 3 might see:
> 
> -	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
> -		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
> +	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
> +		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
> 
>  But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
> 
> -	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
> -	*A, *B or *C following UNLOCK M [1]
> -	*F, *G or *H preceding LOCK M [2]
> -	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
> +	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
> +	*A, *B or *C following RELEASE M [1]
> +	*F, *G or *H preceding ACQUIRE M [2]
> +	*A, *B, *C, *E, *F or *G following RELEASE M [2]
> 
>  Note that the smp_mb__after_unlock_lock() is critically important
>  here: Without it CPU 3 might see some of the above orderings.
> @@ -1983,8 +1988,8 @@ Without smp_mb__after_unlock_lock(), the
>  to be seen in order unless CPU 3 holds lock M.
> 
> 
> -LOCKS VS I/O ACCESSES
> ----------------------
> +ACQUIRES VS I/O ACCESSES
> +------------------------
> 
>  Under certain circumstances (especially involving NUMA), I/O accesses within
>  two spinlocked sections on two different CPUs may be seen as interleaved by the
> @@ -2202,13 +2207,13 @@ about the state (old or new) implies an
>  	/* when succeeds (returns 1) */
>  	atomic_add_unless();		atomic_long_add_unless();
> 
> -These are used for such things as implementing LOCK-class and UNLOCK-class
> +These are used for such things as implementing ACQUIRE-class and RELEASE-class
>  operations and adjusting reference counters towards object destruction, and as
>  such the implicit memory barrier effects are necessary.
> 
> 
>  The following operations are potential problems as they do _not_ imply memory
> -barriers, but might be used for implementing such things as UNLOCK-class
> +barriers, but might be used for implementing such things as RELEASE-class
>  operations:
> 
>  	atomic_set();
> @@ -2250,7 +2255,7 @@ barriers are needed or not.
>  	clear_bit_unlock();
>  	__clear_bit_unlock();
> 
> -These implement LOCK-class and UNLOCK-class operations. These should be used in
> +These implement ACQUIRE-class and RELEASE-class operations. These should be used in
>  preference to other operations when implementing locking primitives, because
>  their implementations can be optimised on many architectures.
> 
> 
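As a concrete illustration of the ACQUIRE-class and RELEASE-class bit
operations listed at the end of the hunk above, here is a minimal sketch of
the usual pairing (the structure, field and function names are invented for
illustration; this is not part of the patch):

	#include <linux/types.h>
	#include <linux/bitops.h>

	struct foo {
		unsigned long flags;	/* bit 0 doubles as a lock bit */
		int data;
	};

	static bool foo_trylock(struct foo *f)
	{
		/* ACQUIRE-class on success; a failed attempt implies no barrier */
		return !test_and_set_bit_lock(0, &f->flags);
	}

	static void foo_unlock(struct foo *f)
	{
		/* RELEASE-class: stores to f->data cannot leak past the clear */
		clear_bit_unlock(0, &f->flags);
	}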

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
  2013-12-16 20:13   ` Paul E. McKenney
@ 2013-12-18 13:40   ` Vineet Gupta
  2014-01-12 18:43   ` [tip:core/locking] arch: Move smp_mb__{before,after}_atomic_{inc, dec}.h " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: Vineet Gupta @ 2013-12-18 13:40 UTC (permalink / raw)
  To: Peter Zijlstra, linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck

On 12/13/2013 08:26 PM, Peter Zijlstra wrote:
> Move the barriers functions that depend on the atomic implementation
> into the atomic implementation.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/arc/include/asm/atomic.h      |    5 +++++
>  arch/arc/include/asm/barrier.h     |    5 -----
>  arch/hexagon/include/asm/atomic.h  |    6 +++++-
>  arch/hexagon/include/asm/barrier.h |    4 ----
>  4 files changed, 10 insertions(+), 10 deletions(-)

Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits]

Thx,
-Vineet
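
For context, the barriers being moved only matter on architectures where the
atomic operation itself is not a full barrier; a hedged caller-side sketch of
the usual pattern (names invented, not taken from the patch):

	#include <linux/atomic.h>

	static atomic_t pending = ATOMIC_INIT(1);
	static int results_ready;

	static void publish_and_drop_ref(void)
	{
		results_ready = 1;		/* plain store ...                     */
		smp_mb__before_atomic_dec();	/* ... made visible before the dec     */
		atomic_dec(&pending);		/* the dec alone need not be a barrier */
	}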

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [tip:core/locking] locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  2013-12-17  9:24     ` Peter Zijlstra
  2013-12-17 15:31       ` Paul E. McKenney
@ 2014-01-12 18:43       ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-01-12 18:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mathieu.desnoyers, mingo, torvalds, schwidefsky, peterz,
	tony.luck, fweisbec, linux, heiko.carstens, oleg, VICTORK, tglx,
	linux-kernel, hpa, michael, paulmck, benh, geert, mikey

Commit-ID:  2e4f5382d12a441b5cccfdde00308df15c2ce300
Gitweb:     http://git.kernel.org/tip/2e4f5382d12a441b5cccfdde00308df15c2ce300
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 6 Nov 2013 14:57:36 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 12 Jan 2014 10:37:13 +0100

locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE

The LOCK and UNLOCK barriers as described in our barrier document are
generally known as ACQUIRE and RELEASE barriers in other literature.

Since we plan to introduce the acquire and release nomenclature in
generic kernel primitives we should amend the document to avoid
confusion as to what an acquire/release means.

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131217092435.GC21999@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/memory-barriers.txt | 237 +++++++++++++++++++-------------------
 1 file changed, 121 insertions(+), 116 deletions(-)
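
For readers new to the terminology, the pairing that the ACQUIRE/RELEASE
naming anticipates looks roughly like this; a sketch using the primitives
this series goes on to add, with invented structure and function names:

	#include <asm/barrier.h>	/* smp_store_release()/smp_load_acquire(),
					   introduced later in this series */

	struct msg {
		int payload;
		int ready;
	};

	static void produce(struct msg *m)
	{
		m->payload = 42;			/* ordinary store                 */
		smp_store_release(&m->ready, 1);	/* RELEASE: payload visible first */
	}

	static int consume(struct msg *m)
	{
		if (!smp_load_acquire(&m->ready))	/* ACQUIRE: read 'ready' first    */
			return -1;
		return m->payload;			/* guaranteed to observe 42       */
	}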

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index cb753c8..102dc19 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -381,39 +381,44 @@ Memory barriers come in four basic varieties:
 
 And a couple of implicit varieties:
 
- (5) LOCK operations.
+ (5) ACQUIRE operations.
 
      This acts as a one-way permeable barrier.  It guarantees that all memory
-     operations after the LOCK operation will appear to happen after the LOCK
-     operation with respect to the other components of the system.
+     operations after the ACQUIRE operation will appear to happen after the
+     ACQUIRE operation with respect to the other components of the system.
+     ACQUIRE operations include LOCK operations and smp_load_acquire()
+     operations.
 
-     Memory operations that occur before a LOCK operation may appear to happen
-     after it completes.
+     Memory operations that occur before an ACQUIRE operation may appear to
+     happen after it completes.
 
-     A LOCK operation should almost always be paired with an UNLOCK operation.
+     An ACQUIRE operation should almost always be paired with a RELEASE
+     operation.
 
 
- (6) UNLOCK operations.
+ (6) RELEASE operations.
 
      This also acts as a one-way permeable barrier.  It guarantees that all
-     memory operations before the UNLOCK operation will appear to happen before
-     the UNLOCK operation with respect to the other components of the system.
+     memory operations before the RELEASE operation will appear to happen
+     before the RELEASE operation with respect to the other components of the
+     system. RELEASE operations include UNLOCK operations and
+     smp_store_release() operations.
 
-     Memory operations that occur after an UNLOCK operation may appear to
+     Memory operations that occur after a RELEASE operation may appear to
      happen before it completes.
 
-     The use of LOCK and UNLOCK operations generally precludes the need for
-     other sorts of memory barrier (but note the exceptions mentioned in the
-     subsection "MMIO write barrier").  In addition, an UNLOCK+LOCK pair
-     is -not- guaranteed to act as a full memory barrier.  However,
-     after a LOCK on a given lock variable, all memory accesses preceding any
-     prior UNLOCK on that same variable are guaranteed to be visible.
-     In other words, within a given lock variable's critical section,
-     all accesses of all previous critical sections for that lock variable
-     are guaranteed to have completed.
+     The use of ACQUIRE and RELEASE operations generally precludes the need
+     for other sorts of memory barrier (but note the exceptions mentioned in
+     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
+     pair is -not- guaranteed to act as a full memory barrier.  However, after
+     an ACQUIRE on a given variable, all memory accesses preceding any prior
+     RELEASE on that same variable are guaranteed to be visible.  In other
+     words, within a given variable's critical section, all accesses of all
+     previous critical sections for that variable are guaranteed to have
+     completed.
 
-     This means that LOCK acts as a minimal "acquire" operation and
-     UNLOCK acts as a minimal "release" operation.
+     This means that ACQUIRE acts as a minimal "acquire" operation and
+     RELEASE acts as a minimal "release" operation.
 
 
 Memory barriers are only required where there's a possibility of interaction
@@ -1585,7 +1590,7 @@ There are some more advanced barrier functions:
 	clear_bit( ... );
 
      This prevents memory operations before the clear leaking to after it.  See
-     the subsection on "Locking Functions" with reference to UNLOCK operation
+     the subsection on "Locking Functions" with reference to RELEASE operation
      implications.
 
      See Documentation/atomic_ops.txt for more information.  See the "Atomic
@@ -1619,8 +1624,8 @@ provide more substantial guarantees, but these may not be relied upon outside
 of arch specific code.
 
 
-LOCKING FUNCTIONS
------------------
+ACQUIRING FUNCTIONS
+-------------------
 
 The Linux kernel has a number of locking constructs:
 
@@ -1631,106 +1636,106 @@ The Linux kernel has a number of locking constructs:
  (*) R/W semaphores
  (*) RCU
 
-In all cases there are variants on "LOCK" operations and "UNLOCK" operations
+In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
 for each construct.  These operations all imply certain barriers:
 
- (1) LOCK operation implication:
+ (1) ACQUIRE operation implication:
 
-     Memory operations issued after the LOCK will be completed after the LOCK
-     operation has completed.
+     Memory operations issued after the ACQUIRE will be completed after the
+     ACQUIRE operation has completed.
 
-     Memory operations issued before the LOCK may be completed after the
-     LOCK operation has completed.  An smp_mb__before_spinlock(), combined
-     with a following LOCK, orders prior loads against subsequent stores
-     and stores and prior stores against subsequent stores.  Note that
-     this is weaker than smp_mb()!  The smp_mb__before_spinlock()
-     primitive is free on many architectures.
+     Memory operations issued before the ACQUIRE may be completed after the
+     ACQUIRE operation has completed.  An smp_mb__before_spinlock(), combined
+     with a following ACQUIRE, orders prior loads against subsequent stores and
+     stores and prior stores against subsequent stores.  Note that this is
+     weaker than smp_mb()!  The smp_mb__before_spinlock() primitive is free on
+     many architectures.
 
- (2) UNLOCK operation implication:
+ (2) RELEASE operation implication:
 
-     Memory operations issued before the UNLOCK will be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued before the RELEASE will be completed before the
+     RELEASE operation has completed.
 
-     Memory operations issued after the UNLOCK may be completed before the
-     UNLOCK operation has completed.
+     Memory operations issued after the RELEASE may be completed before the
+     RELEASE operation has completed.
 
- (3) LOCK vs LOCK implication:
+ (3) ACQUIRE vs ACQUIRE implication:
 
-     All LOCK operations issued before another LOCK operation will be completed
-     before that LOCK operation.
+     All ACQUIRE operations issued before another ACQUIRE operation will be
+     completed before that ACQUIRE operation.
 
- (4) LOCK vs UNLOCK implication:
+ (4) ACQUIRE vs RELEASE implication:
 
-     All LOCK operations issued before an UNLOCK operation will be completed
-     before the UNLOCK operation.
+     All ACQUIRE operations issued before a RELEASE operation will be
+     completed before the RELEASE operation.
 
- (5) Failed conditional LOCK implication:
+ (5) Failed conditional ACQUIRE implication:
 
-     Certain variants of the LOCK operation may fail, either due to being
-     unable to get the lock immediately, or due to receiving an unblocked
+     Certain locking variants of the ACQUIRE operation may fail, either due to
+     being unable to get the lock immediately, or due to receiving an unblocked
      signal whilst asleep waiting for the lock to become available.  Failed
      locks do not imply any sort of barrier.
 
-[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
-    barriers is that the effects of instructions outside of a critical section
-    may seep into the inside of the critical section.
+[!] Note: one of the consequences of lock ACQUIREs and RELEASEs being only
+one-way barriers is that the effects of instructions outside of a critical
+section may seep into the inside of the critical section.
 
-A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
-because it is possible for an access preceding the LOCK to happen after the
-LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
-two accesses can themselves then cross:
+An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier
+because it is possible for an access preceding the ACQUIRE to happen after the
+ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
+the two accesses can themselves then cross:
 
 	*A = a;
-	LOCK M
-	UNLOCK M
+	ACQUIRE M
+	RELEASE M
 	*B = b;
 
 may occur as:
 
-	LOCK M, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE M, STORE *B, STORE *A, RELEASE M
 
-This same reordering can of course occur if the LOCK and UNLOCK are
-to the same lock variable, but only from the perspective of another
-CPU not holding that lock.
+This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
+to the same lock variable, but only from the perspective of another CPU not
+holding that lock.
 
-In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
-memory barrier because it is possible for a preceding UNLOCK to pass a
-later LOCK from the viewpoint of the CPU, but not from the viewpoint
+In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
+memory barrier because it is possible for a preceding RELEASE to pass a
+later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
 of the compiler.  Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the UNLOCK would
+interchange because if such a deadlock threatened, the RELEASE would
 simply complete.
 
-If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
-the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
-This will produce a full barrier if either (a) the UNLOCK and the LOCK
-are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
-on the same lock variable.  The smp_mb__after_unlock_lock() primitive
-is free on many architectures.  Without smp_mb__after_unlock_lock(),
-the critical sections corresponding to the UNLOCK and the LOCK can cross:
+If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
+ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation.  This
+will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
+same variable.  The smp_mb__after_unlock_lock() primitive is free on many
+architectures.  Without smp_mb__after_unlock_lock(), the critical sections
+corresponding to the RELEASE and the ACQUIRE can cross:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	*B = b;
 
 could occur as:
 
-	LOCK N, STORE *B, STORE *A, UNLOCK M
+	ACQUIRE N, STORE *B, STORE *A, RELEASE M
 
 With smp_mb__after_unlock_lock(), they cannot, so that:
 
 	*A = a;
-	UNLOCK M
-	LOCK N
+	RELEASE M
+	ACQUIRE N
 	smp_mb__after_unlock_lock();
 	*B = b;
 
 will always occur as either of the following:
 
-	STORE *A, UNLOCK, LOCK, STORE *B
-	STORE *A, LOCK, UNLOCK, STORE *B
+	STORE *A, RELEASE, ACQUIRE, STORE *B
+	STORE *A, ACQUIRE, RELEASE, STORE *B
 
-If the UNLOCK and LOCK were instead both operating on the same lock
+If the RELEASE and ACQUIRE were instead both operating on the same lock
 variable, only the first of these two alternatives can occur.
 
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
@@ -1745,33 +1750,33 @@ As an example, consider the following:
 
 	*A = a;
 	*B = b;
-	LOCK
+	ACQUIRE
 	*C = c;
 	*D = d;
-	UNLOCK
+	RELEASE
 	*E = e;
 	*F = f;
 
 The following sequence of events is acceptable:
 
-	LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
+	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
 
 	[+] Note that {*F,*A} indicates a combined access.
 
 But none of the following are:
 
-	{*F,*A}, *B,	LOCK, *C, *D,	UNLOCK, *E
-	*A, *B, *C,	LOCK, *D,	UNLOCK, *E, *F
-	*A, *B,		LOCK, *C,	UNLOCK, *D, *E, *F
-	*B,		LOCK, *C, *D,	UNLOCK, {*F,*A}, *E
+	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
+	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
+	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
+	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
 
 
 
 INTERRUPT DISABLING FUNCTIONS
 -----------------------------
 
-Functions that disable interrupts (LOCK equivalent) and enable interrupts
-(UNLOCK equivalent) will act as compiler barriers only.  So if memory or I/O
+Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
+(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
 barriers are required in such a situation, they must be provided from some
 other means.
 
@@ -1910,17 +1915,17 @@ Other functions that imply barriers:
  (*) schedule() and similar imply full memory barriers.
 
 
-=================================
-INTER-CPU LOCKING BARRIER EFFECTS
-=================================
+===================================
+INTER-CPU ACQUIRING BARRIER EFFECTS
+===================================
 
 On SMP systems locking primitives give a more substantial form of barrier: one
 that does affect memory access ordering on other CPUs, within the context of
 conflict on any particular lock.
 
 
-LOCKS VS MEMORY ACCESSES
-------------------------
+ACQUIRES VS MEMORY ACCESSES
+---------------------------
 
 Consider the following: the system has a pair of spinlocks (M) and (Q), and
 three CPUs; then should the following sequence of events occur:
@@ -1928,24 +1933,24 @@ three CPUs; then should the following sequence of events occur:
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
-	LOCK M				LOCK Q
+	ACQUIRE M			ACQUIRE Q
 	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
 	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
-	UNLOCK M			UNLOCK Q
+	RELEASE M			RELEASE Q
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs. It might, for example, see:
 
-	*E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
+	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
 
 But it won't see any of:
 
-	*B, *C or *D preceding LOCK M
-	*A, *B or *C following UNLOCK M
-	*F, *G or *H preceding LOCK Q
-	*E, *F or *G following UNLOCK Q
+	*B, *C or *D preceding ACQUIRE M
+	*A, *B or *C following RELEASE M
+	*F, *G or *H preceding ACQUIRE Q
+	*E, *F or *G following RELEASE Q
 
 
 However, if the following occurs:
@@ -1953,29 +1958,29 @@ However, if the following occurs:
 	CPU 1				CPU 2
 	===============================	===============================
 	ACCESS_ONCE(*A) = a;
-	LOCK M		     [1]
+	ACQUIRE M		     [1]
 	ACCESS_ONCE(*B) = b;
 	ACCESS_ONCE(*C) = c;
-	UNLOCK M	     [1]
+	RELEASE M	     [1]
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
-					LOCK M		     [2]
+					ACQUIRE M		     [2]
 					smp_mb__after_unlock_lock();
 					ACCESS_ONCE(*F) = f;
 					ACCESS_ONCE(*G) = g;
-					UNLOCK M	     [2]
+					RELEASE M	     [2]
 					ACCESS_ONCE(*H) = h;
 
 CPU 3 might see:
 
-	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
-		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
+	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
+		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
 
 But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
-	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
-	*A, *B or *C following UNLOCK M [1]
-	*F, *G or *H preceding LOCK M [2]
-	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
+	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
+	*A, *B or *C following RELEASE M [1]
+	*F, *G or *H preceding ACQUIRE M [2]
+	*A, *B, *C, *E, *F or *G following RELEASE M [2]
 
 Note that the smp_mb__after_unlock_lock() is critically important
 here: Without it CPU 3 might see some of the above orderings.
@@ -1983,8 +1988,8 @@ Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
 to be seen in order unless CPU 3 holds lock M.
 
 
-LOCKS VS I/O ACCESSES
----------------------
+ACQUIRES VS I/O ACCESSES
+------------------------
 
 Under certain circumstances (especially involving NUMA), I/O accesses within
 two spinlocked sections on two different CPUs may be seen as interleaved by the
@@ -2202,13 +2207,13 @@ explicit lock operations, described later).  These include:
 	/* when succeeds (returns 1) */
 	atomic_add_unless();		atomic_long_add_unless();
 
-These are used for such things as implementing LOCK-class and UNLOCK-class
+These are used for such things as implementing ACQUIRE-class and RELEASE-class
 operations and adjusting reference counters towards object destruction, and as
 such the implicit memory barrier effects are necessary.
 
 
 The following operations are potential problems as they do _not_ imply memory
-barriers, but might be used for implementing such things as UNLOCK-class
+barriers, but might be used for implementing such things as RELEASE-class
 operations:
 
 	atomic_set();
@@ -2250,7 +2255,7 @@ The following operations are special locking primitives:
 	clear_bit_unlock();
 	__clear_bit_unlock();
 
-These implement LOCK-class and UNLOCK-class operations. These should be used in
+These implement ACQUIRE-class and RELEASE-class operations. These should be used in
 preference to other operations when implementing locking primitives, because
 their implementations can be optimised on many architectures.
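
Mapping the abstract notation above onto the real locking API, CPU 2's side
of the smp_mb__after_unlock_lock() example would look roughly like this (a
sketch only; the lock and pointer names are invented):

	static DEFINE_SPINLOCK(m);	/* plays the role of lock M */

	ACCESS_ONCE(*E) = e;
	spin_lock(&m);			/* ACQUIRE M [2]                               */
	smp_mb__after_unlock_lock();	/* full barrier against CPU 1's RELEASE M [1]  */
	ACCESS_ONCE(*F) = f;
	ACCESS_ONCE(*G) = g;
	spin_unlock(&m);		/* RELEASE M [2]                               */
	ACCESS_ONCE(*H) = h;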
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:core/locking] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
  2013-12-16 20:13   ` Paul E. McKenney
  2013-12-18 13:40   ` Vineet Gupta
@ 2014-01-12 18:43   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-01-12 18:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, paulmck, akpm, tglx

Commit-ID:  1de7da377bd880ff23917f78924d0e908329d978
Gitweb:     http://git.kernel.org/tip/1de7da377bd880ff23917f78924d0e908329d978
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 8 Nov 2013 12:01:59 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 12 Jan 2014 10:37:14 +0100

arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h

Move the barriers functions that depend on the atomic implementation
into the atomic implementation.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20131213150640.786183683@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/arc/include/asm/atomic.h      | 5 +++++
 arch/arc/include/asm/barrier.h     | 5 -----
 arch/hexagon/include/asm/atomic.h  | 6 +++++-
 arch/hexagon/include/asm/barrier.h | 4 ----
 4 files changed, 10 insertions(+), 10 deletions(-)
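
The reason these definitions belong next to the atomics is that their correct
strength depends on how atomic_inc()/atomic_dec() are implemented; a hedged
sketch of the two typical cases (illustrative, not lifted from any particular
architecture):

	/* atomics are already full barriers (e.g. a locked RMW instruction): */
	#define smp_mb__before_atomic_dec()	barrier()
	#define smp_mb__after_atomic_dec()	barrier()

	/* atomics are unordered ll/sc loops, so a real barrier is needed: */
	#define smp_mb__before_atomic_dec()	smp_mb()
	#define smp_mb__after_atomic_dec()	smp_mb()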

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 83f03ca..03e494f 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -190,6 +190,11 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
 
 #endif /* !CONFIG_ARC_HAS_LLSC */
 
+#define smp_mb__before_atomic_dec()	barrier()
+#define smp_mb__after_atomic_dec()	barrier()
+#define smp_mb__before_atomic_inc()	barrier()
+#define smp_mb__after_atomic_inc()	barrier()
+
 /**
  * __atomic_add_unless - add unless the number is a given value
  * @v: pointer of type atomic_t
diff --git a/arch/arc/include/asm/barrier.h b/arch/arc/include/asm/barrier.h
index f6cb7c4..c32245c 100644
--- a/arch/arc/include/asm/barrier.h
+++ b/arch/arc/include/asm/barrier.h
@@ -30,11 +30,6 @@
 #define smp_wmb()       barrier()
 #endif
 
-#define smp_mb__before_atomic_dec()	barrier()
-#define smp_mb__after_atomic_dec()	barrier()
-#define smp_mb__before_atomic_inc()	barrier()
-#define smp_mb__after_atomic_inc()	barrier()
-
 #define smp_read_barrier_depends()      do { } while (0)
 
 #endif
diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
index 8a64ff2..7aae4cb 100644
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -160,8 +160,12 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 #define atomic_sub_and_test(i, v) (atomic_sub_return(i, (v)) == 0)
 #define atomic_add_negative(i, v) (atomic_add_return(i, (v)) < 0)
 
-
 #define atomic_inc_return(v) (atomic_add_return(1, v))
 #define atomic_dec_return(v) (atomic_sub_return(1, v))
 
+#define smp_mb__before_atomic_dec()	barrier()
+#define smp_mb__after_atomic_dec()	barrier()
+#define smp_mb__before_atomic_inc()	barrier()
+#define smp_mb__after_atomic_inc()	barrier()
+
 #endif
diff --git a/arch/hexagon/include/asm/barrier.h b/arch/hexagon/include/asm/barrier.h
index 1041a8e..4e863da 100644
--- a/arch/hexagon/include/asm/barrier.h
+++ b/arch/hexagon/include/asm/barrier.h
@@ -29,10 +29,6 @@
 #define smp_read_barrier_depends()	barrier()
 #define smp_wmb()			barrier()
 #define smp_mb()			barrier()
-#define smp_mb__before_atomic_dec()	barrier()
-#define smp_mb__after_atomic_dec()	barrier()
-#define smp_mb__before_atomic_inc()	barrier()
-#define smp_mb__after_atomic_inc()	barrier()
 
 /*  Set a value and use a memory barrier.  Used by the scheduler somewhere.  */
 #define set_mb(var, value) \

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:core/locking] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
  2013-12-13 19:16   ` Geert Uytterhoeven
  2013-12-16 20:14   ` Paul E. McKenney
@ 2014-01-12 18:43   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-01-12 18:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mathieu.desnoyers, mingo, torvalds, schwidefsky, peterz,
	tony.luck, fweisbec, linux, heiko.carstens, oleg, VICTORK, tglx,
	linux-kernel, hpa, michael, paulmck, benh, geert, mikey

Commit-ID:  93ea02bb84354370e51de803a9405f171f3edf88
Gitweb:     http://git.kernel.org/tip/93ea02bb84354370e51de803a9405f171f3edf88
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 6 Nov 2013 14:57:36 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 12 Jan 2014 10:37:15 +0100

arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h

We're going to be adding a few new barrier primitives, and in order to
avoid endless duplication make more aggressive use of
asm-generic/barrier.h.

Change the asm-generic/barrier.h such that it allows partial barrier
definitions and fills out the rest with defaults.

There are a few architectures (m32r, m68k) that could probably
do away with their barrier.h file entirely but are kept for now due to
their unconventional nop() implementation.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/alpha/include/asm/barrier.h      | 25 +++--------
 arch/arc/include/asm/Kbuild           |  1 +
 arch/avr32/include/asm/barrier.h      | 17 +++-----
 arch/blackfin/include/asm/barrier.h   | 18 +-------
 arch/cris/include/asm/Kbuild          |  1 +
 arch/cris/include/asm/barrier.h       | 25 -----------
 arch/frv/include/asm/barrier.h        |  8 +---
 arch/hexagon/include/asm/Kbuild       |  1 +
 arch/m32r/include/asm/barrier.h       | 80 +----------------------------------
 arch/m68k/include/asm/barrier.h       | 14 +-----
 arch/microblaze/include/asm/Kbuild    |  1 +
 arch/microblaze/include/asm/barrier.h | 27 ------------
 arch/mn10300/include/asm/Kbuild       |  1 +
 arch/mn10300/include/asm/barrier.h    | 37 ----------------
 arch/parisc/include/asm/Kbuild        |  1 +
 arch/parisc/include/asm/barrier.h     | 35 ---------------
 arch/score/include/asm/Kbuild         |  1 +
 arch/score/include/asm/barrier.h      | 16 -------
 arch/sh/include/asm/barrier.h         | 21 ++-------
 arch/sparc/include/asm/barrier_32.h   | 12 +-----
 arch/tile/include/asm/barrier.h       | 68 +----------------------------
 arch/unicore32/include/asm/barrier.h  | 11 +----
 arch/xtensa/include/asm/barrier.h     |  9 +---
 include/asm-generic/barrier.h         | 42 ++++++++++++------
 24 files changed, 58 insertions(+), 414 deletions(-)
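
The "partial barrier definitions plus defaults" mechanism boils down to the
usual #ifndef pattern; roughly (a simplified sketch of the idea, not the
literal file contents):

	/* arch's asm/barrier.h: define only what the CPU actually needs ...  */
	#define wmb()	asm volatile("sync 0" : : : "memory")
	#include <asm-generic/barrier.h>

	/* ... and asm-generic/barrier.h supplies compiler-barrier defaults:  */
	#ifndef mb
	#define mb()	barrier()
	#endif
	#ifndef rmb
	#define rmb()	mb()
	#endif
	#ifndef wmb
	#define wmb()	mb()
	#endif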

diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index ce8860a..3832bdb 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -3,33 +3,18 @@
 
 #include <asm/compiler.h>
 
-#define mb() \
-__asm__ __volatile__("mb": : :"memory")
+#define mb()	__asm__ __volatile__("mb": : :"memory")
+#define rmb()	__asm__ __volatile__("mb": : :"memory")
+#define wmb()	__asm__ __volatile__("wmb": : :"memory")
 
-#define rmb() \
-__asm__ __volatile__("mb": : :"memory")
-
-#define wmb() \
-__asm__ __volatile__("wmb": : :"memory")
-
-#define read_barrier_depends() \
-__asm__ __volatile__("mb": : :"memory")
+#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
 #define __ASM_SMP_MB	"\tmb\n"
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define __ASM_SMP_MB
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #endif
 
-#define set_mb(var, value) \
-do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif		/* __BARRIER_H */
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 5943f7f..e07c786 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -47,3 +47,4 @@ generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/avr32/include/asm/barrier.h b/arch/avr32/include/asm/barrier.h
index 0961275..7151007 100644
--- a/arch/avr32/include/asm/barrier.h
+++ b/arch/avr32/include/asm/barrier.h
@@ -8,22 +8,15 @@
 #ifndef __ASM_AVR32_BARRIER_H
 #define __ASM_AVR32_BARRIER_H
 
-#define nop()			asm volatile("nop")
-
-#define mb()			asm volatile("" : : : "memory")
-#define rmb()			mb()
-#define wmb()			asm volatile("sync 0" : : : "memory")
-#define read_barrier_depends()  do { } while(0)
-#define set_mb(var, value)      do { var = value; mb(); } while(0)
+/*
+ * Weirdest thing ever.. no full barrier, but it has a write barrier!
+ */
+#define wmb()	asm volatile("sync 0" : : : "memory")
 
 #ifdef CONFIG_SMP
 # error "The AVR32 port does not support SMP"
-#else
-# define smp_mb()		barrier()
-# define smp_rmb()		barrier()
-# define smp_wmb()		barrier()
-# define smp_read_barrier_depends() do { } while(0)
 #endif
 
+#include <asm-generic/barrier.h>
 
 #endif /* __ASM_AVR32_BARRIER_H */
diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index ebb1895..19283a1 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -23,26 +23,10 @@
 # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
 # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
 # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
-#else
-# define mb()	barrier()
-# define rmb()	barrier()
-# define wmb()	barrier()
-# define read_barrier_depends()	do { } while (0)
 #endif
 
-#else /* !CONFIG_SMP */
-
-#define mb()	barrier()
-#define rmb()	barrier()
-#define wmb()	barrier()
-#define read_barrier_depends()	do { } while (0)
-
 #endif /* !CONFIG_SMP */
 
-#define smp_mb()  mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
-#define set_mb(var, value) do { var = value; mb(); } while (0)
-#define smp_read_barrier_depends()	read_barrier_depends()
+#include <asm-generic/barrier.h>
 
 #endif /* _BLACKFIN_BARRIER_H */
diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild
index b06caf6..35ec2e5 100644
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -12,3 +12,4 @@ generic-y += trace_clock.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/cris/include/asm/barrier.h b/arch/cris/include/asm/barrier.h
deleted file mode 100644
index 198ad7f..0000000
--- a/arch/cris/include/asm/barrier.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef __ASM_CRIS_BARRIER_H
-#define __ASM_CRIS_BARRIER_H
-
-#define nop() __asm__ __volatile__ ("nop");
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-#define mb() barrier()
-#define rmb() mb()
-#define wmb() mb()
-#define read_barrier_depends() do { } while(0)
-#define set_mb(var, value)  do { var = value; mb(); } while (0)
-
-#ifdef CONFIG_SMP
-#define smp_mb()        mb()
-#define smp_rmb()       rmb()
-#define smp_wmb()       wmb()
-#define smp_read_barrier_depends()     read_barrier_depends()
-#else
-#define smp_mb()        barrier()
-#define smp_rmb()       barrier()
-#define smp_wmb()       barrier()
-#define smp_read_barrier_depends()     do { } while(0)
-#endif
-
-#endif /* __ASM_CRIS_BARRIER_H */
diff --git a/arch/frv/include/asm/barrier.h b/arch/frv/include/asm/barrier.h
index 06776ad..abbef47 100644
--- a/arch/frv/include/asm/barrier.h
+++ b/arch/frv/include/asm/barrier.h
@@ -17,13 +17,7 @@
 #define mb()			asm volatile ("membar" : : :"memory")
 #define rmb()			asm volatile ("membar" : : :"memory")
 #define wmb()			asm volatile ("membar" : : :"memory")
-#define read_barrier_depends()	do { } while (0)
 
-#define smp_mb()			barrier()
-#define smp_rmb()			barrier()
-#define smp_wmb()			barrier()
-#define smp_read_barrier_depends()	do {} while(0)
-#define set_mb(var, value) \
-	do { var = (value); barrier(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* _ASM_BARRIER_H */
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 67c3450..a614ec9 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -54,3 +54,4 @@ generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/m32r/include/asm/barrier.h b/arch/m32r/include/asm/barrier.h
index 6976621..1a40265 100644
--- a/arch/m32r/include/asm/barrier.h
+++ b/arch/m32r/include/asm/barrier.h
@@ -11,84 +11,6 @@
 
 #define nop()  __asm__ __volatile__ ("nop" : : )
 
-/*
- * Memory barrier.
- *
- * mb() prevents loads and stores being reordered across this point.
- * rmb() prevents loads being reordered across this point.
- * wmb() prevents stores being reordered across this point.
- */
-#define mb()   barrier()
-#define rmb()  mb()
-#define wmb()  mb()
-
-/**
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *      CPU 0                           CPU 1
- *
- *      b = 2;
- *      memory_barrier();
- *      p = &b;                         q = p;
- *                                      read_barrier_depends();
- *                                      d = *q;
- * </programlisting>
- *
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *      CPU 0                           CPU 1
- *
- *      a = 2;
- *      memory_barrier();
- *      b = 3;                          y = b;
- *                                      read_barrier_depends();
- *                                      x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- **/
-
-#define read_barrier_depends()	do { } while (0)
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
-#define set_mb(var, value) do { var = value; barrier(); } while (0)
-#endif
+#include <asm-generic/barrier.h>
 
 #endif /* _ASM_M32R_BARRIER_H */
diff --git a/arch/m68k/include/asm/barrier.h b/arch/m68k/include/asm/barrier.h
index 445ce22..15c5f77 100644
--- a/arch/m68k/include/asm/barrier.h
+++ b/arch/m68k/include/asm/barrier.h
@@ -1,20 +1,8 @@
 #ifndef _M68K_BARRIER_H
 #define _M68K_BARRIER_H
 
-/*
- * Force strict CPU ordering.
- * Not really required on m68k...
- */
 #define nop()		do { asm volatile ("nop"); barrier(); } while (0)
-#define mb()		barrier()
-#define rmb()		barrier()
-#define wmb()		barrier()
-#define read_barrier_depends()	((void)0)
-#define set_mb(var, value)	({ (var) = (value); wmb(); })
 
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	((void)0)
+#include <asm-generic/barrier.h>
 
 #endif /* _M68K_BARRIER_H */
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index ce0bbf8..f77fb66 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += syscalls.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/microblaze/include/asm/barrier.h b/arch/microblaze/include/asm/barrier.h
deleted file mode 100644
index df5be3e8..0000000
--- a/arch/microblaze/include/asm/barrier.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * Copyright (C) 2006 Atmark Techno, Inc.
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- */
-
-#ifndef _ASM_MICROBLAZE_BARRIER_H
-#define _ASM_MICROBLAZE_BARRIER_H
-
-#define nop()                  asm volatile ("nop")
-
-#define smp_read_barrier_depends()	do {} while (0)
-#define read_barrier_depends()		do {} while (0)
-
-#define mb()			barrier()
-#define rmb()			mb()
-#define wmb()			mb()
-#define set_mb(var, value)	do { var = value; mb(); } while (0)
-#define set_wmb(var, value)	do { var = value; wmb(); } while (0)
-
-#define smp_mb()		mb()
-#define smp_rmb()		rmb()
-#define smp_wmb()		wmb()
-
-#endif /* _ASM_MICROBLAZE_BARRIER_H */
diff --git a/arch/mn10300/include/asm/Kbuild b/arch/mn10300/include/asm/Kbuild
index 74742dc..367ef39 100644
--- a/arch/mn10300/include/asm/Kbuild
+++ b/arch/mn10300/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/mn10300/include/asm/barrier.h b/arch/mn10300/include/asm/barrier.h
deleted file mode 100644
index 2bd97a5..0000000
--- a/arch/mn10300/include/asm/barrier.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* MN10300 memory barrier definitions
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-#ifndef _ASM_BARRIER_H
-#define _ASM_BARRIER_H
-
-#define nop()	asm volatile ("nop")
-
-#define mb()	asm volatile ("": : :"memory")
-#define rmb()	mb()
-#define wmb()	asm volatile ("": : :"memory")
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define set_mb(var, value)  do { xchg(&var, value); } while (0)
-#else  /* CONFIG_SMP */
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define set_mb(var, value)  do { var = value;  mb(); } while (0)
-#endif /* CONFIG_SMP */
-
-#define set_wmb(var, value) do { var = value; wmb(); } while (0)
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
-
-#endif /* _ASM_BARRIER_H */
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index a603b9e..8df06d0 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += word-at-a-time.h auxvec.h user.h cputime.h emergency-restart.h \
 	  poll.h xor.h clkdev.h exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/parisc/include/asm/barrier.h b/arch/parisc/include/asm/barrier.h
deleted file mode 100644
index e77d834..0000000
--- a/arch/parisc/include/asm/barrier.h
+++ /dev/null
@@ -1,35 +0,0 @@
-#ifndef __PARISC_BARRIER_H
-#define __PARISC_BARRIER_H
-
-/*
-** This is simply the barrier() macro from linux/kernel.h but when serial.c
-** uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h
-** hasn't yet been included yet so it fails, thus repeating the macro here.
-**
-** PA-RISC architecture allows for weakly ordered memory accesses although
-** none of the processors use it. There is a strong ordered bit that is
-** set in the O-bit of the page directory entry. Operating systems that
-** can not tolerate out of order accesses should set this bit when mapping
-** pages. The O-bit of the PSW should also be set to 1 (I don't believe any
-** of the processor implemented the PSW O-bit). The PCX-W ERS states that
-** the TLB O-bit is not implemented so the page directory does not need to
-** have the O-bit set when mapping pages (section 3.1). This section also
-** states that the PSW Y, Z, G, and O bits are not implemented.
-** So it looks like nothing needs to be done for parisc-linux (yet).
-** (thanks to chada for the above comment -ggg)
-**
-** The __asm__ op below simple prevents gcc/ld from reordering
-** instructions across the mb() "call".
-*/
-#define mb()		__asm__ __volatile__("":::"memory")	/* barrier() */
-#define rmb()		mb()
-#define wmb()		mb()
-#define smp_mb()	mb()
-#define smp_rmb()	mb()
-#define smp_wmb()	mb()
-#define smp_read_barrier_depends()	do { } while(0)
-#define read_barrier_depends()		do { } while(0)
-
-#define set_mb(var, value)		do { var = value; mb(); } while (0)
-
-#endif /* __PARISC_BARRIER_H */
diff --git a/arch/score/include/asm/Kbuild b/arch/score/include/asm/Kbuild
index f3414ad..ee2993b 100644
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += clkdev.h
 generic-y += trace_clock.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += barrier.h
diff --git a/arch/score/include/asm/barrier.h b/arch/score/include/asm/barrier.h
deleted file mode 100644
index 0eacb64..0000000
--- a/arch/score/include/asm/barrier.h
+++ /dev/null
@@ -1,16 +0,0 @@
-#ifndef _ASM_SCORE_BARRIER_H
-#define _ASM_SCORE_BARRIER_H
-
-#define mb()		barrier()
-#define rmb()		barrier()
-#define wmb()		barrier()
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
-
-#define set_mb(var, value) 		do {var = value; wmb(); } while (0)
-
-#endif /* _ASM_SCORE_BARRIER_H */
diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
index 72c103d..4371530 100644
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -26,29 +26,14 @@
 #if defined(CONFIG_CPU_SH4A) || defined(CONFIG_CPU_SH5)
 #define mb()		__asm__ __volatile__ ("synco": : :"memory")
 #define rmb()		mb()
-#define wmb()		__asm__ __volatile__ ("synco": : :"memory")
+#define wmb()		mb()
 #define ctrl_barrier()	__icbi(PAGE_OFFSET)
-#define read_barrier_depends()	do { } while(0)
 #else
-#define mb()		__asm__ __volatile__ ("": : :"memory")
-#define rmb()		mb()
-#define wmb()		__asm__ __volatile__ ("": : :"memory")
 #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
-#define read_barrier_depends()	do { } while(0)
-#endif
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while(0)
 #endif
 
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
+#include <asm-generic/barrier.h>
+
 #endif /* __ASM_SH_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_32.h b/arch/sparc/include/asm/barrier_32.h
index c1b7665..ae69eda 100644
--- a/arch/sparc/include/asm/barrier_32.h
+++ b/arch/sparc/include/asm/barrier_32.h
@@ -1,15 +1,7 @@
 #ifndef __SPARC_BARRIER_H
 #define __SPARC_BARRIER_H
 
-/* XXX Change this if we ever use a PSO mode kernel. */
-#define mb()	__asm__ __volatile__ ("" : : : "memory")
-#define rmb()	mb()
-#define wmb()	mb()
-#define read_barrier_depends()	do { } while(0)
-#define set_mb(__var, __value)  do { __var = __value; mb(); } while(0)
-#define smp_mb()	__asm__ __volatile__("":::"memory")
-#define smp_rmb()	__asm__ __volatile__("":::"memory")
-#define smp_wmb()	__asm__ __volatile__("":::"memory")
-#define smp_read_barrier_depends()	do { } while(0)
+#include <asm/processor.h> /* for nop() */
+#include <asm-generic/barrier.h>
 
 #endif /* !(__SPARC_BARRIER_H) */
diff --git a/arch/tile/include/asm/barrier.h b/arch/tile/include/asm/barrier.h
index a9a73da..b5a05d0 100644
--- a/arch/tile/include/asm/barrier.h
+++ b/arch/tile/include/asm/barrier.h
@@ -22,59 +22,6 @@
 #include <arch/spr_def.h>
 #include <asm/timex.h>
 
-/*
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- */
-#define read_barrier_depends()	do { } while (0)
-
 #define __sync()	__insn_mf()
 
 #include <hv/syscall_public.h>
@@ -125,20 +72,7 @@ mb_incoherent(void)
 #define mb()		fast_mb()
 #define iob()		fast_iob()
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
-#endif
-
-#define set_mb(var, value) \
-	do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_TILE_BARRIER_H */
diff --git a/arch/unicore32/include/asm/barrier.h b/arch/unicore32/include/asm/barrier.h
index a6620e5..83d6a52 100644
--- a/arch/unicore32/include/asm/barrier.h
+++ b/arch/unicore32/include/asm/barrier.h
@@ -14,15 +14,6 @@
 #define dsb() __asm__ __volatile__ ("" : : : "memory")
 #define dmb() __asm__ __volatile__ ("" : : : "memory")
 
-#define mb()				barrier()
-#define rmb()				barrier()
-#define wmb()				barrier()
-#define smp_mb()			barrier()
-#define smp_rmb()			barrier()
-#define smp_wmb()			barrier()
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
-#define set_mb(var, value)		do { var = value; smp_mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* __UNICORE_BARRIER_H__ */
diff --git a/arch/xtensa/include/asm/barrier.h b/arch/xtensa/include/asm/barrier.h
index ef02167..e1ee6b5 100644
--- a/arch/xtensa/include/asm/barrier.h
+++ b/arch/xtensa/include/asm/barrier.h
@@ -9,21 +9,14 @@
 #ifndef _XTENSA_SYSTEM_H
 #define _XTENSA_SYSTEM_H
 
-#define smp_read_barrier_depends() do { } while(0)
-#define read_barrier_depends() do { } while(0)
-
 #define mb()  ({ __asm__ __volatile__("memw" : : : "memory"); })
 #define rmb() barrier()
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #error smp_* not defined
-#else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
 #endif
 
-#define set_mb(var, value)	do { var = value; mb(); } while (0)
+#include <asm-generic/barrier.h>
 
 #endif /* _XTENSA_SYSTEM_H */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 639d7a4..d12a90f 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -1,4 +1,5 @@
-/* Generic barrier definitions, based on MN10300 definitions.
+/*
+ * Generic barrier definitions, originally based on MN10300 definitions.
  *
  * It should be possible to use these on really simple architectures,
  * but it serves more as a starting point for new ports.
@@ -16,35 +17,50 @@
 
 #ifndef __ASSEMBLY__
 
-#define nop() asm volatile ("nop")
+#include <linux/compiler.h>
+
+#ifndef nop
+#define nop()	asm volatile ("nop")
+#endif
 
 /*
- * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
- * to devices.
+ * Force strict CPU ordering. And yes, this is required on UP too when we're
+ * talking to devices.
  *
- * This implementation only contains a compiler barrier.
+ * Fall back to compiler barriers if nothing better is provided.
  */
 
-#define mb()	asm volatile ("": : :"memory")
+#ifndef mb
+#define mb()	barrier()
+#endif
+
+#ifndef rmb
 #define rmb()	mb()
-#define wmb()	asm volatile ("": : :"memory")
+#endif
+
+#ifndef wmb
+#define wmb()	mb()
+#endif
+
+#ifndef read_barrier_depends
+#define read_barrier_depends()		do { } while (0)
+#endif
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
 #define smp_rmb()	rmb()
 #define smp_wmb()	wmb()
+#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+#define smp_read_barrier_depends()	do { } while (0)
 #endif
 
-#define set_mb(var, value)  do { var = value;  mb(); } while (0)
-#define set_wmb(var, value) do { var = value; wmb(); } while (0)
-
-#define read_barrier_depends()		do {} while (0)
-#define smp_read_barrier_depends()	do {} while (0)
+#ifndef set_mb
+#define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
+#endif
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:core/locking] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
  2013-12-16 16:40   ` Will Deacon
  2013-12-16 21:37   ` Paul E. McKenney
@ 2014-01-12 18:43   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-01-12 18:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mathieu.desnoyers, mingo, torvalds, schwidefsky, peterz,
	tony.luck, fweisbec, linux, heiko.carstens, oleg, VICTORK, tglx,
	linux-kernel, hpa, will.deacon, michael, paulmck, benh, geert,
	mikey

Commit-ID:  47933ad41a86a4a9b50bed7c9b9bd2ba242aac63
Gitweb:     http://git.kernel.org/tip/47933ad41a86a4a9b50bed7c9b9bd2ba242aac63
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 6 Nov 2013 14:57:36 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 12 Jan 2014 10:37:17 +0100

arch: Introduce smp_load_acquire(), smp_store_release()

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specified store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively, resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]
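
For illustration only (not part of the commit), a minimal message-passing
sketch using the two primitives could look as follows; the structure and
function names are made up, and kernel context is assumed, i.e.
<linux/compiler.h> and <asm/barrier.h> supplying the definitions:

	struct message {
		int payload;
		int ready;				/* published flag */
	};

	static struct message msg;

	/* producer: make the payload visible before the flag */
	static void publish(int val)
	{
		msg.payload = val;			/* plain store */
		smp_store_release(&msg.ready, 1);	/* orders the store above */
	}

	/* consumer: read the payload only after the flag is observed */
	static int try_consume(int *val)
	{
		if (!smp_load_acquire(&msg.ready))	/* orders the load below */
			return 0;
		*val = msg.payload;			/* sees the producer's value */
		return 1;
	}

The release in the producer pairs with the acquire in the consumer, which
is exactly the ordering that previously required a pair of smp_mb() calls.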

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/arm/include/asm/barrier.h      | 15 +++++++++++
 arch/arm64/include/asm/barrier.h    | 50 +++++++++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     | 23 +++++++++++++++++
 arch/metag/include/asm/barrier.h    | 15 +++++++++++
 arch/mips/include/asm/barrier.h     | 15 +++++++++++
 arch/powerpc/include/asm/barrier.h  | 21 +++++++++++++++-
 arch/s390/include/asm/barrier.h     | 15 +++++++++++
 arch/sparc/include/asm/barrier_64.h | 15 +++++++++++
 arch/x86/include/asm/barrier.h      | 43 ++++++++++++++++++++++++++++++-
 include/asm-generic/barrier.h       | 15 +++++++++++
 include/linux/compiler.h            |  9 +++++++
 11 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 60f15e2..2f59f74 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
 #define smp_wmb()	dmb(ishst)
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a6333..78e20ba 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #else
+
 #define smp_mb()	asm volatile("dmb ish" : : : "memory")
 #define smp_rmb()	asm volatile("dmb ishld" : : : "memory")
 #define smp_wmb()	asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("stlr %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 8:								\
+		asm volatile ("stlr %1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	}								\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1;						\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("ldar %w0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	case 8:								\
+		asm volatile ("ldar %0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	}								\
+	___p1;								\
+})
+
 #endif
 
 #define read_barrier_depends()		do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 60576e0..d0a69aa 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
 # define smp_rmb()	rmb()
 # define smp_wmb()	wmb()
 # define smp_read_barrier_depends()	read_barrier_depends()
+
 #else
+
 # define smp_mb()	barrier()
 # define smp_rmb()	barrier()
 # define smp_wmb()	barrier()
 # define smp_read_barrier_depends()	do { } while(0)
+
 #endif
 
 /*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
+/*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
  * Linus just yet.  Grrr...
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c90bfc6..5d6b4b4 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -82,4 +82,19 @@ static inline void fence(void)
 #define smp_read_barrier_depends()     do { } while (0)
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index f26d8e1..e1aa4e4 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
 #define nudge_writes() mb()
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index ae78225..f89da80 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
 #    define SMPWMB      eieio
 #endif
 
+#define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
 #define smp_mb()	mb()
-#define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
+#define __lwsync()	barrier()
+
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
@@ -65,4 +69,19 @@
 #define data_barrier(x)	\
 	asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	___p1;								\
+})
+
 #endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 16760ee..578680f 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@
 
 #define set_mb(var, value)		do { var = value; mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 95d4598..b5aad96 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 
 #define smp_read_barrier_depends()	do { } while(0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index c6cd358..04a4890 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
 #endif
 #define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif
 
 /*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d12a90f..6f692f8 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -62,5 +62,20 @@
 #define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..fe7a686 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 #endif
 
+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
 # define __compiletime_object_size(obj) -1
@@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 #define compiletime_assert(condition, msg) \
 	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 
+#define compiletime_assert_atomic_type(t)				\
+	compiletime_assert(__native_word(t),				\
+		"Need native word sized stores/loads for atomicity.")
+
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release()
  2013-12-18 19:08 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
@ 2013-12-18 20:55 ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-12-18 20:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, geert, torvalds, VICTORK, oleg, anton,
	benh, fweisbec, mathieu.desnoyers, michael, mikey, linux,
	schwidefsky, heiko.carstens, tony.luck

On Wed, Dec 18, 2013 at 08:08:06PM +0100, Peter Zijlstra wrote:
> 
> This should hopefully be the last posting of this series -- people felt it
> needed one more mostly because last time I typoed the linux-kernel email
> address.
> 
> If there are no further comments, Ingo will merge these patches in the next few
> days.

They still look good to me!

							Thanx, Paul

> ---
> 
> These patches introduce 2 new barrier primitives:
> 
>   smp_load_acquire(p)
>   smp_store_release(p, v)
> 
> See the first patch, which changes Documentation/memory-barriers.txt, to find
> the exact definitions of what an ACQUIRE/RELEASE barrier is -- previously known
> as LOCK/UNLOCK barriers.
> 
> The second patch moves the smp_mb__{before,after}_atomic_{dec,inc}() barriers
> to asm/atomic.h for arc and hexagon -- they were already there for all other archs.
> 
> This cleans up asm/barrier.h, and the third patch makes more aggressive use
> asm-generic/barrier.h to implement the simple cases.
> 
> Then the fourth patch adds the new primitives.
> 
> Previous versions were widely build tested -- this version is not, but it is
> also not significantly different.
> 
> These patches apply to:
> 
>   tip/master
> 
> 
> ---
> Changes since the last version -- lkml.kernel.org/r/20131213145657.265414969@infradead.org
> 
>  - fixed linux-kernel email address
>  - updated the Documentation patch
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release()
@ 2013-12-18 19:08 Peter Zijlstra
  2013-12-18 20:55 ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2013-12-18 19:08 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra


This should hopefully be the last posting of this series -- people felt it
needed one more mostly because last time I typoed the linux-kernel email
address.

If there are no further comments, Ingo will merge these patches in the next few
days.

---

These patches introduce 2 new barrier primitives:

  smp_load_acquire(p)
  smp_store_release(p, v)

See the first patch, which changes Documentation/memory-barriers.txt, to find
the exact definitions of what an ACQUIRE/RELEASE barrier is -- previously known
as LOCK/UNLOCK barriers.

The second patch moves the smp_mb__{before,after}_atomic_{dec,inc}() barriers
to asm/atomic.h for arc and hexagon -- they were already there for all other archs.

This cleans up asm/barrier.h, and the third patch makes more aggressive use
asm-generic/barrier.h to implement the simple cases.

Then the fourth patch adds the new primitives.

Previous versions were widely build tested -- this version is not, but it is
also not significantly different.

These patches apply to:

  tip/master


---
Changes since the last version -- lkml.kernel.org/r/20131213145657.265414969@infradead.org

 - fixed linux-kernel email address
 - updated the Documentation patch


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release()
@ 2013-11-07 22:03 peterz
  0 siblings, 0 replies; 28+ messages in thread
From: peterz @ 2013-11-07 22:03 UTC (permalink / raw)
  To: linux-arch
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Peter Zijlstra

*** last posting didn't make it out to the lists ***

These patches introduce 2 new barrier primitives:

  smp_load_acquire(p)
  smp_store_release(p, v)

See the first patch, which changes Documentation/memory-barriers.txt, to find
the exact definitions of what an ACQUIRE/RELEASE barrier is -- previously known
as LOCK/UNLOCK barriers.

The second patch simplifies the asm/barrier.h implementation for many
architectures by using asm-generic/barrier.h, which avoids a lot of code
duplication later on.

The third patch adds the new barrier primitives.

The fourth adds the first user.
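
As a rough sketch of that kind of user (hypothetical names, not the actual
kernel/events/ring_buffer.c change), a single-producer/single-consumer
circular FIFO can pair the new primitives instead of using full barriers:

	#define FIFO_SIZE	256			/* hypothetical, power of two */

	struct fifo {
		unsigned long	head;			/* written by the producer */
		unsigned long	tail;			/* written by the consumer */
		int		buf[FIFO_SIZE];
	};

	/* producer */
	static int fifo_put(struct fifo *f, int item)
	{
		unsigned long head = f->head;
		unsigned long tail = smp_load_acquire(&f->tail);

		if (head - tail >= FIFO_SIZE)
			return 0;			/* full */
		f->buf[head & (FIFO_SIZE - 1)] = item;
		smp_store_release(&f->head, head + 1);	/* publish the new slot */
		return 1;
	}

	/* consumer */
	static int fifo_get(struct fifo *f, int *item)
	{
		unsigned long tail = f->tail;
		unsigned long head = smp_load_acquire(&f->head);

		if (tail == head)
			return 0;			/* empty */
		*item = f->buf[tail & (FIFO_SIZE - 1)];
		smp_store_release(&f->tail, tail + 1);	/* free the slot */
		return 1;
	}

Each side's smp_store_release() on its own index pairs with the other
side's smp_load_acquire() on that index, so no smp_mb() is needed to order
the slot accesses against the index updates.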

Build tested for:

alpha-defconfig - OK
ia64-defconfig - OK
m32r-defconfig - OK
frv-defconfig - OK
m68k-defconfig - OK
mips-defconfig - OK
mips-fuloong2e_defconfig - OK
arm-defconfig - OK
blackfin-defconfig - OK
mn10300-defconfig - OK
powerpc-ppc40x_defconfig - OK
powerpc-defconfig - OK
s390-defconfig - OK
sh-defconfig - OK
sparc-defconfig - OK
sparc64-defconfig - OK
i386-defconfig - OK
x86_64-defconfig - OK

Changes since the last posting that didn't make it out to lkml (and other lists)
due to an excessive Cc list:

 - added the fourth patch as a first user
 - changed the x86 implementation to not assume a TSO model 
   when OOSTORE or PPRO_FENCE

---
 Documentation/memory-barriers.txt     | 164 ++++++++++++++++++----------------
 arch/alpha/include/asm/barrier.h      |  25 ++----
 arch/arc/include/asm/Kbuild           |   1 +
 arch/arc/include/asm/atomic.h         |   5 ++
 arch/arc/include/asm/barrier.h        |  42 ---------
 arch/arm/include/asm/barrier.h        |  15 ++++
 arch/arm64/include/asm/barrier.h      |  50 +++++++++++
 arch/avr32/include/asm/barrier.h      |  17 ++--
 arch/blackfin/include/asm/barrier.h   |  18 +---
 arch/cris/include/asm/Kbuild          |   1 +
 arch/cris/include/asm/barrier.h       |  25 ------
 arch/frv/include/asm/barrier.h        |   8 +-
 arch/h8300/include/asm/barrier.h      |  21 +----
 arch/hexagon/include/asm/Kbuild       |   1 +
 arch/hexagon/include/asm/barrier.h    |  41 ---------
 arch/ia64/include/asm/barrier.h       |  49 ++++++++++
 arch/m32r/include/asm/barrier.h       |  80 +----------------
 arch/m68k/include/asm/barrier.h       |  14 +--
 arch/metag/include/asm/barrier.h      |  15 ++++
 arch/microblaze/include/asm/Kbuild    |   1 +
 arch/microblaze/include/asm/barrier.h |  27 ------
 arch/mips/include/asm/barrier.h       |  15 ++++
 arch/mn10300/include/asm/Kbuild       |   1 +
 arch/mn10300/include/asm/barrier.h    |  37 --------
 arch/parisc/include/asm/Kbuild        |   1 +
 arch/parisc/include/asm/barrier.h     |  35 --------
 arch/powerpc/include/asm/barrier.h    |  21 ++++-
 arch/s390/include/asm/barrier.h       |  15 ++++
 arch/score/include/asm/Kbuild         |   1 +
 arch/score/include/asm/barrier.h      |  16 ----
 arch/sh/include/asm/barrier.h         |  21 +----
 arch/sparc/include/asm/barrier_32.h   |  12 +--
 arch/sparc/include/asm/barrier_64.h   |  15 ++++
 arch/tile/include/asm/barrier.h       |  68 +-------------
 arch/unicore32/include/asm/barrier.h  |  11 +--
 arch/x86/include/asm/barrier.h        |  43 ++++++++-
 arch/xtensa/include/asm/barrier.h     |   9 +-
 include/asm-generic/barrier.h         |  55 +++++++++---
 include/linux/compiler.h              |   9 ++
 kernel/events/ring_buffer.c           |  62 +++++++------
 40 files changed, 444 insertions(+), 623 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-01-12 18:46 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-13 14:56 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
2013-12-13 14:56 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
2013-12-16 20:11   ` Paul E. McKenney
2013-12-17  9:24     ` Peter Zijlstra
2013-12-17 15:31       ` Paul E. McKenney
2014-01-12 18:43       ` [tip:core/locking] locking/doc: Rename LOCK/UNLOCK to ACQUIRE/ RELEASE tip-bot for Peter Zijlstra
2013-12-17 10:13     ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE Peter Zijlstra
2013-12-17 13:59       ` Paul E. McKenney
2013-12-13 14:56 ` [PATCH 2/4] arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h Peter Zijlstra
2013-12-16 20:13   ` Paul E. McKenney
2013-12-18 13:40   ` Vineet Gupta
2014-01-12 18:43   ` [tip:core/locking] arch: Move smp_mb__{before,after}_atomic_{inc, dec}.h " tip-bot for Peter Zijlstra
2013-12-13 14:57 ` [PATCH 3/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h Peter Zijlstra
2013-12-13 19:16   ` Geert Uytterhoeven
2013-12-13 19:17     ` Fwd: " Geert Uytterhoeven
2013-12-13 19:53     ` Peter Zijlstra
2013-12-16 20:14   ` Paul E. McKenney
2014-01-12 18:43   ` [tip:core/locking] arch: Clean up asm/ barrier.h " tip-bot for Peter Zijlstra
2013-12-13 14:57 ` [PATCH 4/4] arch: Introduce smp_load_acquire(), smp_store_release() Peter Zijlstra
2013-12-16 16:40   ` Will Deacon
2013-12-17  9:07     ` Peter Zijlstra
2013-12-16 21:37   ` Paul E. McKenney
2013-12-17  9:14     ` Peter Zijlstra
2013-12-17 13:58       ` Paul E. McKenney
2014-01-12 18:43   ` [tip:core/locking] " tip-bot for Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2013-12-18 19:08 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() Peter Zijlstra
2013-12-18 20:55 ` Paul E. McKenney
2013-11-07 22:03 peterz
