From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755236Ab3KFSs5 (ORCPT ); Wed, 6 Nov 2013 13:48:57 -0500 Received: from e31.co.us.ibm.com ([32.97.110.149]:36495 "EHLO e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754175Ab3KFSsz (ORCPT ); Wed, 6 Nov 2013 13:48:55 -0500 Date: Wed, 6 Nov 2013 10:48:48 -0800 From: "Paul E. McKenney" To: Peter Zijlstra Cc: Geert Uytterhoeven , Linus Torvalds , Victor Kaplansky , Oleg Nesterov , Anton Blanchard , Benjamin Herrenschmidt , Frederic Weisbecker , LKML , Linux PPC dev , Mathieu Desnoyers , Michael Ellerman , Michael Neuling , Russell King , Martin Schwidefsky , Heiko Carstens , Tony Luck Subject: Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb() Message-ID: <20131106184848.GM18245@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20131103224242.GF3947@linux.vnet.ibm.com> <20131104105059.GL3947@linux.vnet.ibm.com> <20131104112254.GK28601@twins.programming.kicks-ass.net> <20131104162732.GN3947@linux.vnet.ibm.com> <20131104191127.GW16117@laptop.programming.kicks-ass.net> <20131104205344.GW3947@linux.vnet.ibm.com> <20131106123946.GJ10651@twins.programming.kicks-ass.net> <20131106135736.GK10651@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131106135736.GK10651@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 13110618-8236-0000-0000-000003800E84 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 06, 2013 at 02:57:36PM +0100, Peter Zijlstra wrote: > On Wed, Nov 06, 2013 at 01:51:10PM +0100, Geert Uytterhoeven wrote: > > This is screaming for a default implementation in asm-generic. > > Right you are... how about a little something like this? > > There's a few archs I didn't fully merge with the generic one because of > weird nop implementations. > > asm volatile ("nop" :: ) vs asm volatile ("nop" ::: "memory") and the > like. They probably can (and should) use the regular asm volatile > ("nop") but I misplaced the toolchains for many of the weird archs so I > didn't attempt. > > Also fixed a silly mistake in the return type definition for most > smp_load_acquire() implementions: typeof(p) vs typeof(*p). > > --- > Subject: arch: Introduce smp_load_acquire(), smp_store_release() > From: Peter Zijlstra > Date: Mon, 4 Nov 2013 20:18:11 +0100 > > A number of situations currently require the heavyweight smp_mb(), > even though there is no need to order prior stores against later > loads. Many architectures have much cheaper ways to handle these > situations, but the Linux kernel currently has no portable way > to make use of them. > > This commit therefore supplies smp_load_acquire() and > smp_store_release() to remedy this situation. The new > smp_load_acquire() primitive orders the specified load against > any subsequent reads or writes, while the new smp_store_release() > primitive orders the specifed store against any prior reads or > writes. These primitives allow array-based circular FIFOs to be > implemented without an smp_mb(), and also allow a theoretical > hole in rcu_assign_pointer() to be closed at no additional > expense on most architectures. 
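[ Purely illustrative sketch of the array-based circular FIFO case mentioned above: a minimal single-producer/single-consumer ring. All names here (struct ring, RING_SIZE, ring_put(), ring_get()) are invented for the example and are not part of the patch. ]

	#define RING_SIZE	16	/* any power of two, for the masking below */

	struct ring {
		unsigned long	head;	/* written only by the producer */
		unsigned long	tail;	/* written only by the consumer */
		int		buf[RING_SIZE];
	};

	/* Producer: fill the slot, then publish it with a release store. */
	static int ring_put(struct ring *r, int v)
	{
		unsigned long h = r->head;

		if (h - smp_load_acquire(&r->tail) >= RING_SIZE)
			return -EAGAIN;			/* full */
		r->buf[h & (RING_SIZE - 1)] = v;
		smp_store_release(&r->head, h + 1);
		return 0;
	}

	/* Consumer: observe the publication, then read the slot. */
	static int ring_get(struct ring *r, int *v)
	{
		unsigned long t = r->tail;

		if (smp_load_acquire(&r->head) == t)
			return -EAGAIN;			/* empty */
		*v = r->buf[t & (RING_SIZE - 1)];
		smp_store_release(&r->tail, t + 1);	/* free the slot */
		return 0;
	}

[ The release in ring_put() pairs with the acquire in ring_get(), so a consumer that sees the new head index is also guaranteed to see the data written into the slot, with no smp_mb() anywhere on the fast path. ]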
> > In addition, the RCU experience transitioning from explicit > smp_read_barrier_depends() and smp_wmb() to rcu_dereference() > and rcu_assign_pointer(), respectively resulted in substantial > improvements in readability. It therefore seems likely that > replacing other explicit barriers with smp_load_acquire() and > smp_store_release() will provide similar benefits. It appears > that roughly half of the explicit barriers in core kernel code > might be so replaced. > > > Cc: Michael Ellerman > Cc: Michael Neuling > Cc: "Paul E. McKenney" > Cc: Linus Torvalds > Cc: Victor Kaplansky > Cc: Oleg Nesterov > Cc: Anton Blanchard > Cc: Benjamin Herrenschmidt > Cc: Frederic Weisbecker > Cc: Mathieu Desnoyers > Signed-off-by: Peter Zijlstra A few nits on Documentation/memory-barriers.txt and some pointless comments elsewhere. With the suggested Documentation/memory-barriers.txt fixes: Reviewed-by: Paul E. McKenney > --- > Documentation/memory-barriers.txt | 157 +++++++++++++++++----------------- > arch/alpha/include/asm/barrier.h | 25 +---- > arch/arc/include/asm/Kbuild | 1 > arch/arc/include/asm/atomic.h | 5 + > arch/arc/include/asm/barrier.h | 42 --------- > arch/arm/include/asm/barrier.h | 15 +++ > arch/arm64/include/asm/barrier.h | 50 ++++++++++ > arch/avr32/include/asm/barrier.h | 17 +-- > arch/blackfin/include/asm/barrier.h | 18 --- > arch/cris/include/asm/Kbuild | 1 > arch/cris/include/asm/barrier.h | 25 ----- > arch/frv/include/asm/barrier.h | 8 - > arch/h8300/include/asm/barrier.h | 21 ---- > arch/hexagon/include/asm/Kbuild | 1 > arch/hexagon/include/asm/barrier.h | 41 -------- > arch/ia64/include/asm/barrier.h | 49 ++++++++++ > arch/m32r/include/asm/barrier.h | 80 ----------------- > arch/m68k/include/asm/barrier.h | 14 --- > arch/metag/include/asm/barrier.h | 15 +++ > arch/microblaze/include/asm/Kbuild | 1 > arch/microblaze/include/asm/barrier.h | 27 ----- > arch/mips/include/asm/barrier.h | 15 +++ > arch/mn10300/include/asm/Kbuild | 1 > arch/mn10300/include/asm/barrier.h | 37 -------- > arch/parisc/include/asm/Kbuild | 1 > arch/parisc/include/asm/barrier.h | 35 ------- > arch/powerpc/include/asm/barrier.h | 21 ++++ > arch/s390/include/asm/barrier.h | 15 +++ > arch/score/include/asm/Kbuild | 1 > arch/score/include/asm/barrier.h | 16 --- > arch/sh/include/asm/barrier.h | 21 ---- > arch/sparc/include/asm/barrier_32.h | 11 -- > arch/sparc/include/asm/barrier_64.h | 15 +++ > arch/tile/include/asm/barrier.h | 68 -------------- > arch/unicore32/include/asm/barrier.h | 11 -- > arch/x86/include/asm/barrier.h | 15 +++ > arch/xtensa/include/asm/barrier.h | 9 - > include/asm-generic/barrier.h | 55 +++++++++-- > include/linux/compiler.h | 9 + > 39 files changed, 375 insertions(+), 594 deletions(-) > > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -371,33 +371,35 @@ VARIETIES OF MEMORY BARRIER > > And a couple of implicit varieties: > > - (5) LOCK operations. > + (5) ACQUIRE operations. > > This acts as a one-way permeable barrier. It guarantees that all memory > - operations after the LOCK operation will appear to happen after the LOCK > - operation with respect to the other components of the system. > + operations after the ACQUIRE operation will appear to happen after the > + ACQUIRE operation with respect to the other components of the system. ACQUIRE operations include LOCK operations and smp_load_acquire() operations. > > - Memory operations that occur before a LOCK operation may appear to happen > - after it completes. 
> + Memory operations that occur before a ACQUIRE operation may appear to > + happen after it completes. > > - A LOCK operation should almost always be paired with an UNLOCK operation. > + A ACQUIRE operation should almost always be paired with an RELEASE > + operation. > > > - (6) UNLOCK operations. > + (6) RELEASE operations. > > This also acts as a one-way permeable barrier. It guarantees that all > - memory operations before the UNLOCK operation will appear to happen before > - the UNLOCK operation with respect to the other components of the system. > + memory operations before the RELEASE operation will appear to happen > + before the RELEASE operation with respect to the other components of the > + system. Release operations include UNLOCK operations and smp_store_release() operations. > - Memory operations that occur after an UNLOCK operation may appear to > + Memory operations that occur after an RELEASE operation may appear to > happen before it completes. > > - LOCK and UNLOCK operations are guaranteed to appear with respect to each > - other strictly in the order specified. > + ACQUIRE and RELEASE operations are guaranteed to appear with respect to > + each other strictly in the order specified. > > - The use of LOCK and UNLOCK operations generally precludes the need for > - other sorts of memory barrier (but note the exceptions mentioned in the > - subsection "MMIO write barrier"). > + The use of ACQUIRE and RELEASE operations generally precludes the need > + for other sorts of memory barrier (but note the exceptions mentioned in > + the subsection "MMIO write barrier"). > > > Memory barriers are only required where there's a possibility of interaction > @@ -1135,7 +1137,7 @@ CPU from reordering them. > clear_bit( ... ); > > This prevents memory operations before the clear leaking to after it. See > - the subsection on "Locking Functions" with reference to UNLOCK operation > + the subsection on "Locking Functions" with reference to RELEASE operation > implications. > > See Documentation/atomic_ops.txt for more information. See the "Atomic > @@ -1181,65 +1183,66 @@ LOCKING FUNCTIONS > (*) R/W semaphores > (*) RCU > > -In all cases there are variants on "LOCK" operations and "UNLOCK" operations > +In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations > for each construct. These operations all imply certain barriers: > > - (1) LOCK operation implication: > + (1) ACQUIRE operation implication: > > - Memory operations issued after the LOCK will be completed after the LOCK > - operation has completed. > + Memory operations issued after the ACQUIRE will be completed after the > + ACQUIRE operation has completed. > > - Memory operations issued before the LOCK may be completed after the LOCK > - operation has completed. > + Memory operations issued before the ACQUIRE may be completed after the > + ACQUIRE operation has completed. > > - (2) UNLOCK operation implication: > + (2) RELEASE operation implication: > > - Memory operations issued before the UNLOCK will be completed before the > - UNLOCK operation has completed. > + Memory operations issued before the RELEASE will be completed before the > + RELEASE operation has completed. > > - Memory operations issued after the UNLOCK may be completed before the > - UNLOCK operation has completed. > + Memory operations issued after the RELEASE may be completed before the > + RELEASE operation has completed. 
> > - (3) LOCK vs LOCK implication: > + (3) ACQUIRE vs ACQUIRE implication: > > - All LOCK operations issued before another LOCK operation will be completed > - before that LOCK operation. > + All ACQUIRE operations issued before another ACQUIRE operation will be > + completed before that ACQUIRE operation. > > - (4) LOCK vs UNLOCK implication: > + (4) ACQUIRE vs RELEASE implication: > > - All LOCK operations issued before an UNLOCK operation will be completed > - before the UNLOCK operation. > + All ACQUIRE operations issued before an RELEASE operation will be > + completed before the RELEASE operation. > > - All UNLOCK operations issued before a LOCK operation will be completed > - before the LOCK operation. > + All RELEASE operations issued before a ACQUIRE operation will be > + completed before the ACQUIRE operation. > > - (5) Failed conditional LOCK implication: > + (5) Failed conditional ACQUIRE implication: > > - Certain variants of the LOCK operation may fail, either due to being > + Certain variants of the ACQUIRE operation may fail, either due to being > unable to get the lock immediately, or due to receiving an unblocked > signal whilst asleep waiting for the lock to become available. Failed > locks do not imply any sort of barrier. I suggest adding "For example" to the beginning of the last sentence: For example, failed lock acquisitions do not imply any sort of barrier. Otherwise, the transition from ACQUIRE to lock is strange. > -Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is > -equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. > +Therefore, from (1), (2) and (4) an RELEASE followed by an unconditional > +ACQUIRE is equivalent to a full barrier, but a ACQUIRE followed by an RELEASE > +is not. > > [!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way > barriers is that the effects of instructions outside of a critical section > may seep into the inside of the critical section. > > -A LOCK followed by an UNLOCK may not be assumed to be full memory barrier > -because it is possible for an access preceding the LOCK to happen after the > -LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the > -two accesses can themselves then cross: > +A ACQUIRE followed by an RELEASE may not be assumed to be full memory barrier > +because it is possible for an access preceding the ACQUIRE to happen after the > +ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and > +the two accesses can themselves then cross: > > *A = a; > - LOCK > - UNLOCK > + ACQUIRE > + RELEASE > *B = b; > > may occur as: > > - LOCK, STORE *B, STORE *A, UNLOCK > + ACQUIRE, STORE *B, STORE *A, RELEASE > > Locks and semaphores may not provide any guarantee of ordering on UP compiled > systems, and so cannot be counted on in such a situation to actually achieve > @@ -1253,33 +1256,33 @@ See also the section on "Inter-CPU locki > > *A = a; > *B = b; > - LOCK > + ACQUIRE > *C = c; > *D = d; > - UNLOCK > + RELEASE > *E = e; > *F = f; > > The following sequence of events is acceptable: > > - LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK > + ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE > > [+] Note that {*F,*A} indicates a combined access. 
> > But none of the following are: > > - {*F,*A}, *B, LOCK, *C, *D, UNLOCK, *E > - *A, *B, *C, LOCK, *D, UNLOCK, *E, *F > - *A, *B, LOCK, *C, UNLOCK, *D, *E, *F > - *B, LOCK, *C, *D, UNLOCK, {*F,*A}, *E > + {*F,*A}, *B, ACQUIRE, *C, *D, RELEASE, *E > + *A, *B, *C, ACQUIRE, *D, RELEASE, *E, *F > + *A, *B, ACQUIRE, *C, RELEASE, *D, *E, *F > + *B, ACQUIRE, *C, *D, RELEASE, {*F,*A}, *E > > > > INTERRUPT DISABLING FUNCTIONS > ----------------------------- > > -Functions that disable interrupts (LOCK equivalent) and enable interrupts > -(UNLOCK equivalent) will act as compiler barriers only. So if memory or I/O > +Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts > +(RELEASE equivalent) will act as compiler barriers only. So if memory or I/O > barriers are required in such a situation, they must be provided from some > other means. > > @@ -1436,24 +1439,24 @@ Consider the following: the system has a > CPU 1 CPU 2 > =============================== =============================== > *A = a; *E = e; > - LOCK M LOCK Q > + ACQUIRE M ACQUIRE Q > *B = b; *F = f; > *C = c; *G = g; > - UNLOCK M UNLOCK Q > + RELEASE M RELEASE Q > *D = d; *H = h; > > Then there is no guarantee as to what order CPU 3 will see the accesses to *A > through *H occur in, other than the constraints imposed by the separate locks > on the separate CPUs. It might, for example, see: > > - *E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M > + *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M > > But it won't see any of: > > - *B, *C or *D preceding LOCK M > - *A, *B or *C following UNLOCK M > - *F, *G or *H preceding LOCK Q > - *E, *F or *G following UNLOCK Q > + *B, *C or *D preceding ACQUIRE M > + *A, *B or *C following RELEASE M > + *F, *G or *H preceding ACQUIRE Q > + *E, *F or *G following RELEASE Q > > > However, if the following occurs: > @@ -1461,28 +1464,28 @@ through *H occur in, other than the cons > CPU 1 CPU 2 > =============================== =============================== > *A = a; > - LOCK M [1] > + ACQUIRE M [1] > *B = b; > *C = c; > - UNLOCK M [1] > + RELEASE M [1] > *D = d; *E = e; > - LOCK M [2] > + ACQUIRE M [2] > *F = f; > *G = g; > - UNLOCK M [2] > + RELEASE M [2] > *H = h; > > CPU 3 might see: > > - *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], > - LOCK M [2], *H, *F, *G, UNLOCK M [2], *D > + *E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1], > + ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D > > But assuming CPU 1 gets the lock first, CPU 3 won't see any of: > > - *B, *C, *D, *F, *G or *H preceding LOCK M [1] > - *A, *B or *C following UNLOCK M [1] > - *F, *G or *H preceding LOCK M [2] > - *A, *B, *C, *E, *F or *G following UNLOCK M [2] > + *B, *C, *D, *F, *G or *H preceding ACQUIRE M [1] > + *A, *B or *C following RELEASE M [1] > + *F, *G or *H preceding ACQUIRE M [2] > + *A, *B, *C, *E, *F or *G following RELEASE M [2] > > > LOCKS VS I/O ACCESSES > @@ -1702,13 +1705,13 @@ about the state (old or new) implies an > test_and_clear_bit(); > test_and_change_bit(); > > -These are used for such things as implementing LOCK-class and UNLOCK-class > +These are used for such things as implementing ACQUIRE-class and RELEASE-class > operations and adjusting reference counters towards object destruction, and as > such the implicit memory barrier effects are necessary. 
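[ A minimal illustration of the reference-count case mentioned just above, with invented names (struct my_obj, my_obj_put()); the point is the full barrier implied by a value-returning atomic such as atomic_dec_and_test(): ]

	struct my_obj {
		atomic_t	refcount;
		/* ... payload accessed by the users of the object ... */
	};

	static void my_obj_put(struct my_obj *obj)
	{
		/*
		 * atomic_dec_and_test() implies a full memory barrier on
		 * either side of the decrement, so all of this CPU's prior
		 * accesses to *obj are complete before the object is freed.
		 */
		if (atomic_dec_and_test(&obj->refcount))
			kfree(obj);
	}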
> > > The following operations are potential problems as they do _not_ imply memory > -barriers, but might be used for implementing such things as UNLOCK-class > +barriers, but might be used for implementing such things as RELEASE-class > operations: > > atomic_set(); > @@ -1750,9 +1753,9 @@ barriers are needed or not. > clear_bit_unlock(); > __clear_bit_unlock(); > > -These implement LOCK-class and UNLOCK-class operations. These should be used in > -preference to other operations when implementing locking primitives, because > -their implementations can be optimised on many architectures. > +These implement ACQUIRE-class and RELEASE-class operations. These should be > +used in preference to other operations when implementing locking primitives, > +because their implementations can be optimised on many architectures. > > [!] Note that special memory barrier primitives are available for these > situations because on some CPUs the atomic instructions used imply full memory > --- a/arch/alpha/include/asm/barrier.h > +++ b/arch/alpha/include/asm/barrier.h > @@ -3,33 +3,18 @@ > > #include > > -#define mb() \ > -__asm__ __volatile__("mb": : :"memory") > +#define mb() __asm__ __volatile__("mb": : :"memory") > +#define rmb() __asm__ __volatile__("mb": : :"memory") > +#define wmb() __asm__ __volatile__("wmb": : :"memory") > > -#define rmb() \ > -__asm__ __volatile__("mb": : :"memory") > - > -#define wmb() \ > -__asm__ __volatile__("wmb": : :"memory") > - > -#define read_barrier_depends() \ > -__asm__ __volatile__("mb": : :"memory") > +#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory") > > #ifdef CONFIG_SMP > #define __ASM_SMP_MB "\tmb\n" > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > #else > #define __ASM_SMP_MB > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > #endif > > -#define set_mb(var, value) \ > -do { var = value; mb(); } while (0) > +#include > > #endif /* __BARRIER_H */ > --- a/arch/arc/include/asm/Kbuild > +++ b/arch/arc/include/asm/Kbuild > @@ -47,3 +47,4 @@ generic-y += user.h > generic-y += vga.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/arc/include/asm/atomic.h > +++ b/arch/arc/include/asm/atomic.h > @@ -190,6 +190,11 @@ static inline void atomic_clear_mask(uns > > #endif /* !CONFIG_ARC_HAS_LLSC */ > > +#define smp_mb__before_atomic_dec() barrier() > +#define smp_mb__after_atomic_dec() barrier() > +#define smp_mb__before_atomic_inc() barrier() > +#define smp_mb__after_atomic_inc() barrier() > + > /** > * __atomic_add_unless - add unless the number is a given value > * @v: pointer of type atomic_t > --- a/arch/arc/include/asm/barrier.h > +++ /dev/null > @@ -1,42 +0,0 @@ > -/* > - * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com) > - * > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License version 2 as > - * published by the Free Software Foundation. 
> - */ > - > -#ifndef __ASM_BARRIER_H > -#define __ASM_BARRIER_H > - > -#ifndef __ASSEMBLY__ > - > -/* TODO-vineetg: Need to see what this does, don't we need sync anywhere */ > -#define mb() __asm__ __volatile__ ("" : : : "memory") > -#define rmb() mb() > -#define wmb() mb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > -#define read_barrier_depends() mb() > - > -/* TODO-vineetg verify the correctness of macros here */ > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#endif > - > -#define smp_mb__before_atomic_dec() barrier() > -#define smp_mb__after_atomic_dec() barrier() > -#define smp_mb__before_atomic_inc() barrier() > -#define smp_mb__after_atomic_inc() barrier() > - > -#define smp_read_barrier_depends() do { } while (0) > - > -#endif > - > -#endif I do like this take-no-prisoners approach! ;-) > --- a/arch/arm/include/asm/barrier.h > +++ b/arch/arm/include/asm/barrier.h > @@ -59,6 +59,21 @@ > #define smp_wmb() dmb(ishst) > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #define read_barrier_depends() do { } while(0) > #define smp_read_barrier_depends() do { } while(0) > > --- a/arch/arm64/include/asm/barrier.h > +++ b/arch/arm64/include/asm/barrier.h > @@ -35,11 +35,59 @@ > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #else > + > #define smp_mb() asm volatile("dmb ish" : : : "memory") > #define smp_rmb() asm volatile("dmb ishld" : : : "memory") > #define smp_wmb() asm volatile("dmb ishst" : : : "memory") > -#endif > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("stlr %w1, [%0]" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("stlr %1, [%0]" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + } \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1; \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("ldar %w0, [%1]" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("ldar %0, [%1]" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + } \ > + ___p1; \ > +}) > > #define read_barrier_depends() do { } while(0) > #define smp_read_barrier_depends() do { } while(0) > --- a/arch/avr32/include/asm/barrier.h > +++ b/arch/avr32/include/asm/barrier.h > @@ -8,22 +8,15 @@ > #ifndef __ASM_AVR32_BARRIER_H > #define __ASM_AVR32_BARRIER_H > > -#define nop() asm volatile("nop") > - > -#define mb() asm volatile("" : : : "memory") > -#define rmb() mb() > -#define wmb() asm volatile("sync 0" : : : "memory") > -#define read_barrier_depends() do { } 
while(0) > -#define set_mb(var, value) do { var = value; mb(); } while(0) > +/* > + * Weirdest thing ever.. no full barrier, but it has a write barrier! > + */ > +#define wmb() asm volatile("sync 0" : : : "memory") Doesn't this mean that asm-generic/barrier.h needs to check for definitions? Ah, I see below that you added these checks. > #ifdef CONFIG_SMP > # error "The AVR32 port does not support SMP" > -#else > -# define smp_mb() barrier() > -# define smp_rmb() barrier() > -# define smp_wmb() barrier() > -# define smp_read_barrier_depends() do { } while(0) > #endif > > +#include > > #endif /* __ASM_AVR32_BARRIER_H */ > --- a/arch/blackfin/include/asm/barrier.h > +++ b/arch/blackfin/include/asm/barrier.h > @@ -23,26 +23,10 @@ > # define rmb() do { barrier(); smp_check_barrier(); } while (0) > # define wmb() do { barrier(); smp_mark_barrier(); } while (0) > # define read_barrier_depends() do { barrier(); smp_check_barrier(); } while (0) > -#else > -# define mb() barrier() > -# define rmb() barrier() > -# define wmb() barrier() > -# define read_barrier_depends() do { } while (0) > #endif > > -#else /* !CONFIG_SMP */ > - > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define read_barrier_depends() do { } while (0) > - > #endif /* !CONFIG_SMP */ > > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define smp_read_barrier_depends() read_barrier_depends() > +#include > > #endif /* _BLACKFIN_BARRIER_H */ > --- a/arch/cris/include/asm/Kbuild > +++ b/arch/cris/include/asm/Kbuild > @@ -12,3 +12,4 @@ generic-y += trace_clock.h > generic-y += vga.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/cris/include/asm/barrier.h > +++ /dev/null > @@ -1,25 +0,0 @@ > -#ifndef __ASM_CRIS_BARRIER_H > -#define __ASM_CRIS_BARRIER_H > - > -#define nop() __asm__ __volatile__ ("nop"); > - > -#define barrier() __asm__ __volatile__("": : :"memory") > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > -#define read_barrier_depends() do { } while(0) > -#define set_mb(var, value) do { var = value; mb(); } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > -#endif > - > -#endif /* __ASM_CRIS_BARRIER_H */ > --- a/arch/frv/include/asm/barrier.h > +++ b/arch/frv/include/asm/barrier.h > @@ -17,13 +17,7 @@ > #define mb() asm volatile ("membar" : : :"memory") > #define rmb() asm volatile ("membar" : : :"memory") > #define wmb() asm volatile ("membar" : : :"memory") > -#define read_barrier_depends() do { } while (0) > > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do {} while(0) > -#define set_mb(var, value) \ > - do { var = (value); barrier(); } while (0) > +#include > > #endif /* _ASM_BARRIER_H */ > --- a/arch/h8300/include/asm/barrier.h > +++ b/arch/h8300/include/asm/barrier.h > @@ -3,27 +3,8 @@ > > #define nop() asm volatile ("nop"::) > > -/* > - * Force strict CPU ordering. > - * Not really required on H8... 
> - */ > -#define mb() asm volatile ("" : : :"memory") > -#define rmb() asm volatile ("" : : :"memory") > -#define wmb() asm volatile ("" : : :"memory") > #define set_mb(var, value) do { xchg(&var, value); } while (0) > > -#define read_barrier_depends() do { } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > -#endif > +#include > > #endif /* _H8300_BARRIER_H */ > --- a/arch/hexagon/include/asm/Kbuild > +++ b/arch/hexagon/include/asm/Kbuild > @@ -54,3 +54,4 @@ generic-y += ucontext.h > generic-y += unaligned.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/hexagon/include/asm/barrier.h > +++ /dev/null > @@ -1,41 +0,0 @@ > -/* > - * Memory barrier definitions for the Hexagon architecture > - * > - * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved. > - * > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License version 2 and > - * only version 2 as published by the Free Software Foundation. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > - * 02110-1301, USA. > - */ > - > -#ifndef _ASM_BARRIER_H > -#define _ASM_BARRIER_H > - > -#define rmb() barrier() > -#define read_barrier_depends() barrier() > -#define wmb() barrier() > -#define mb() barrier() > -#define smp_rmb() barrier() > -#define smp_read_barrier_depends() barrier() > -#define smp_wmb() barrier() > -#define smp_mb() barrier() > -#define smp_mb__before_atomic_dec() barrier() > -#define smp_mb__after_atomic_dec() barrier() > -#define smp_mb__before_atomic_inc() barrier() > -#define smp_mb__after_atomic_inc() barrier() > - > -/* Set a value and use a memory barrier. Used by the scheduler somewhere. 
*/ > -#define set_mb(var, value) \ > - do { var = value; mb(); } while (0) > - > -#endif /* _ASM_BARRIER_H */ > --- a/arch/ia64/include/asm/barrier.h > +++ b/arch/ia64/include/asm/barrier.h > @@ -45,11 +45,60 @@ > # define smp_rmb() rmb() > # define smp_wmb() wmb() > # define smp_read_barrier_depends() read_barrier_depends() > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("st4.rel [%0]=%1" \ > + : "=r" (p) : "r" (v) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("st8.rel [%0]=%1" \ > + : "=r" (p) : "r" (v) : "memory"); \ > + break; \ > + } \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1; \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("ld4.acq %0=[%1]" \ > + : "=r" (___p1) : "r" (p) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("ld8.acq %0=[%1]" \ > + : "=r" (___p1) : "r" (p) : "memory"); \ > + break; \ > + } \ > + ___p1; \ > +}) > + > #else > + > # define smp_mb() barrier() > # define smp_rmb() barrier() > # define smp_wmb() barrier() > # define smp_read_barrier_depends() do { } while(0) > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > #endif > > /* > --- a/arch/m32r/include/asm/barrier.h > +++ b/arch/m32r/include/asm/barrier.h > @@ -11,84 +11,6 @@ > > #define nop() __asm__ __volatile__ ("nop" : : ) > > -/* > - * Memory barrier. > - * > - * mb() prevents loads and stores being reordered across this point. > - * rmb() prevents loads being reordered across this point. > - * wmb() prevents stores being reordered across this point. > - */ > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > - > -/** > - * read_barrier_depends - Flush all pending reads that subsequents reads > - * depend on. > - * > - * No data-dependent reads from memory-like regions are ever reordered > - * over this barrier. All reads preceding this primitive are guaranteed > - * to access memory (but not necessarily other CPUs' caches) before any > - * reads following this primitive that depend on the data return by > - * any of the preceding reads. This primitive is much lighter weight than > - * rmb() on most CPUs, and is never heavier weight than is > - * rmb(). > - * > - * These ordering constraints are respected by both the local CPU > - * and the compiler. > - * > - * Ordering is not guaranteed by anything other than these primitives, > - * not even by data dependencies. See the documentation for > - * memory_barrier() for examples and URLs to more information. > - * > - * For example, the following code would force ordering (the initial > - * value of "a" is zero, "b" is one, and "p" is "&a"): > - * > - * > - * CPU 0 CPU 1 > - * > - * b = 2; > - * memory_barrier(); > - * p = &b; q = p; > - * read_barrier_depends(); > - * d = *q; > - * > - * > - * > - * because the read of "*q" depends on the read of "p" and these > - * two reads are separated by a read_barrier_depends(). 
However, > - * the following code, with the same initial values for "a" and "b": > - * > - * > - * CPU 0 CPU 1 > - * > - * a = 2; > - * memory_barrier(); > - * b = 3; y = b; > - * read_barrier_depends(); > - * x = a; > - * > - * > - * does not enforce ordering, since there is no data dependency between > - * the read of "a" and the read of "b". Therefore, on some CPUs, such > - * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() > - * in cases like this where there are no data dependencies. > - **/ > - > -#define read_barrier_depends() do { } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#define set_mb(var, value) do { (void) xchg(&var, value); } while (0) > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > -#define set_mb(var, value) do { var = value; barrier(); } while (0) > -#endif > +#include > > #endif /* _ASM_M32R_BARRIER_H */ > --- a/arch/m68k/include/asm/barrier.h > +++ b/arch/m68k/include/asm/barrier.h > @@ -1,20 +1,8 @@ > #ifndef _M68K_BARRIER_H > #define _M68K_BARRIER_H > > -/* > - * Force strict CPU ordering. > - * Not really required on m68k... > - */ > #define nop() do { asm volatile ("nop"); barrier(); } while (0) > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define read_barrier_depends() ((void)0) > -#define set_mb(var, value) ({ (var) = (value); wmb(); }) > > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() ((void)0) > +#include > > #endif /* _M68K_BARRIER_H */ > --- a/arch/metag/include/asm/barrier.h > +++ b/arch/metag/include/asm/barrier.h > @@ -82,4 +82,19 @@ static inline void fence(void) > #define smp_read_barrier_depends() do { } while (0) > #define set_mb(var, value) do { var = value; smp_mb(); } while (0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #endif /* _ASM_METAG_BARRIER_H */ > --- a/arch/microblaze/include/asm/Kbuild > +++ b/arch/microblaze/include/asm/Kbuild > @@ -4,3 +4,4 @@ generic-y += exec.h > generic-y += trace_clock.h > generic-y += syscalls.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/microblaze/include/asm/barrier.h > +++ /dev/null > @@ -1,27 +0,0 @@ > -/* > - * Copyright (C) 2006 Atmark Techno, Inc. > - * > - * This file is subject to the terms and conditions of the GNU General Public > - * License. See the file "COPYING" in the main directory of this archive > - * for more details. 
> - */ > - > -#ifndef _ASM_MICROBLAZE_BARRIER_H > -#define _ASM_MICROBLAZE_BARRIER_H > - > -#define nop() asm volatile ("nop") > - > -#define smp_read_barrier_depends() do {} while (0) > -#define read_barrier_depends() do {} while (0) > - > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > - > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > - > -#endif /* _ASM_MICROBLAZE_BARRIER_H */ > --- a/arch/mips/include/asm/barrier.h > +++ b/arch/mips/include/asm/barrier.h > @@ -180,4 +180,19 @@ > #define nudge_writes() mb() > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #endif /* __ASM_BARRIER_H */ > --- a/arch/mn10300/include/asm/Kbuild > +++ b/arch/mn10300/include/asm/Kbuild > @@ -3,3 +3,4 @@ generic-y += clkdev.h > generic-y += exec.h > generic-y += trace_clock.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/mn10300/include/asm/barrier.h > +++ /dev/null > @@ -1,37 +0,0 @@ > -/* MN10300 memory barrier definitions > - * > - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. > - * Written by David Howells (dhowells@redhat.com) > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public Licence > - * as published by the Free Software Foundation; either version > - * 2 of the Licence, or (at your option) any later version. > - */ > -#ifndef _ASM_BARRIER_H > -#define _ASM_BARRIER_H > - > -#define nop() asm volatile ("nop") > - > -#define mb() asm volatile ("": : :"memory") > -#define rmb() mb() > -#define wmb() asm volatile ("": : :"memory") > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define set_mb(var, value) do { xchg(&var, value); } while (0) > -#else /* CONFIG_SMP */ > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#endif /* CONFIG_SMP */ > - > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > - > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > - > -#endif /* _ASM_BARRIER_H */ > --- a/arch/parisc/include/asm/Kbuild > +++ b/arch/parisc/include/asm/Kbuild > @@ -5,3 +5,4 @@ generic-y += word-at-a-time.h auxvec.h u > poll.h xor.h clkdev.h exec.h > generic-y += trace_clock.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/parisc/include/asm/barrier.h > +++ /dev/null > @@ -1,35 +0,0 @@ > -#ifndef __PARISC_BARRIER_H > -#define __PARISC_BARRIER_H > - > -/* > -** This is simply the barrier() macro from linux/kernel.h but when serial.c > -** uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h > -** hasn't yet been included yet so it fails, thus repeating the macro here. > -** > -** PA-RISC architecture allows for weakly ordered memory accesses although > -** none of the processors use it. There is a strong ordered bit that is > -** set in the O-bit of the page directory entry. 
Operating systems that > -** can not tolerate out of order accesses should set this bit when mapping > -** pages. The O-bit of the PSW should also be set to 1 (I don't believe any > -** of the processor implemented the PSW O-bit). The PCX-W ERS states that > -** the TLB O-bit is not implemented so the page directory does not need to > -** have the O-bit set when mapping pages (section 3.1). This section also > -** states that the PSW Y, Z, G, and O bits are not implemented. > -** So it looks like nothing needs to be done for parisc-linux (yet). > -** (thanks to chada for the above comment -ggg) > -** > -** The __asm__ op below simple prevents gcc/ld from reordering > -** instructions across the mb() "call". > -*/ > -#define mb() __asm__ __volatile__("":::"memory") /* barrier() */ > -#define rmb() mb() > -#define wmb() mb() > -#define smp_mb() mb() > -#define smp_rmb() mb() > -#define smp_wmb() mb() > -#define smp_read_barrier_depends() do { } while(0) > -#define read_barrier_depends() do { } while(0) > - > -#define set_mb(var, value) do { var = value; mb(); } while (0) > - > -#endif /* __PARISC_BARRIER_H */ > --- a/arch/powerpc/include/asm/barrier.h > +++ b/arch/powerpc/include/asm/barrier.h > @@ -45,11 +45,15 @@ > # define SMPWMB eieio > #endif > > +#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > + > #define smp_mb() mb() > -#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define smp_rmb() __lwsync() > #define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > #define smp_read_barrier_depends() read_barrier_depends() > #else > +#define __lwsync() barrier() > + > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > @@ -65,4 +69,19 @@ > #define data_barrier(x) \ > asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + __lwsync(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + __lwsync(); \ > + ___p1; \ > +}) > + > #endif /* _ASM_POWERPC_BARRIER_H */ > --- a/arch/s390/include/asm/barrier.h > +++ b/arch/s390/include/asm/barrier.h > @@ -32,4 +32,19 @@ > > #define set_mb(var, value) do { var = value; mb(); } while (0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > #endif /* __ASM_BARRIER_H */ > --- a/arch/score/include/asm/Kbuild > +++ b/arch/score/include/asm/Kbuild > @@ -5,3 +5,4 @@ generic-y += clkdev.h > generic-y += trace_clock.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/score/include/asm/barrier.h > +++ /dev/null > @@ -1,16 +0,0 @@ > -#ifndef _ASM_SCORE_BARRIER_H > -#define _ASM_SCORE_BARRIER_H > - > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > - > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > - > -#define set_mb(var, value) do {var = value; wmb(); } while (0) > - > -#endif /* _ASM_SCORE_BARRIER_H */ > --- a/arch/sh/include/asm/barrier.h > +++ 
b/arch/sh/include/asm/barrier.h > @@ -26,29 +26,14 @@ > #if defined(CONFIG_CPU_SH4A) || defined(CONFIG_CPU_SH5) > #define mb() __asm__ __volatile__ ("synco": : :"memory") > #define rmb() mb() > -#define wmb() __asm__ __volatile__ ("synco": : :"memory") > +#define wmb() mb() > #define ctrl_barrier() __icbi(PAGE_OFFSET) > -#define read_barrier_depends() do { } while(0) > #else > -#define mb() __asm__ __volatile__ ("": : :"memory") > -#define rmb() mb() > -#define wmb() __asm__ __volatile__ ("": : :"memory") > #define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop") > -#define read_barrier_depends() do { } while(0) > -#endif > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > #endif > > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > > +#include > + > #endif /* __ASM_SH_BARRIER_H */ > --- a/arch/sparc/include/asm/barrier_32.h > +++ b/arch/sparc/include/asm/barrier_32.h > @@ -1,15 +1,6 @@ > #ifndef __SPARC_BARRIER_H > #define __SPARC_BARRIER_H > > -/* XXX Change this if we ever use a PSO mode kernel. */ > -#define mb() __asm__ __volatile__ ("" : : : "memory") > -#define rmb() mb() > -#define wmb() mb() > -#define read_barrier_depends() do { } while(0) > -#define set_mb(__var, __value) do { __var = __value; mb(); } while(0) > -#define smp_mb() __asm__ __volatile__("":::"memory") > -#define smp_rmb() __asm__ __volatile__("":::"memory") > -#define smp_wmb() __asm__ __volatile__("":::"memory") > -#define smp_read_barrier_depends() do { } while(0) > +#include > > #endif /* !(__SPARC_BARRIER_H) */ > --- a/arch/sparc/include/asm/barrier_64.h > +++ b/arch/sparc/include/asm/barrier_64.h > @@ -53,4 +53,19 @@ do { __asm__ __volatile__("ba,pt %%xcc, > > #define smp_read_barrier_depends() do { } while(0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > #endif /* !(__SPARC64_BARRIER_H) */ > --- a/arch/tile/include/asm/barrier.h > +++ b/arch/tile/include/asm/barrier.h > @@ -22,59 +22,6 @@ > #include > #include > > -/* > - * read_barrier_depends - Flush all pending reads that subsequents reads > - * depend on. > - * > - * No data-dependent reads from memory-like regions are ever reordered > - * over this barrier. All reads preceding this primitive are guaranteed > - * to access memory (but not necessarily other CPUs' caches) before any > - * reads following this primitive that depend on the data return by > - * any of the preceding reads. This primitive is much lighter weight than > - * rmb() on most CPUs, and is never heavier weight than is > - * rmb(). > - * > - * These ordering constraints are respected by both the local CPU > - * and the compiler. > - * > - * Ordering is not guaranteed by anything other than these primitives, > - * not even by data dependencies. See the documentation for > - * memory_barrier() for examples and URLs to more information. 
> - * > - * For example, the following code would force ordering (the initial > - * value of "a" is zero, "b" is one, and "p" is "&a"): > - * > - * > - * CPU 0 CPU 1 > - * > - * b = 2; > - * memory_barrier(); > - * p = &b; q = p; > - * read_barrier_depends(); > - * d = *q; > - * > - * > - * because the read of "*q" depends on the read of "p" and these > - * two reads are separated by a read_barrier_depends(). However, > - * the following code, with the same initial values for "a" and "b": > - * > - * > - * CPU 0 CPU 1 > - * > - * a = 2; > - * memory_barrier(); > - * b = 3; y = b; > - * read_barrier_depends(); > - * x = a; > - * > - * > - * does not enforce ordering, since there is no data dependency between > - * the read of "a" and the read of "b". Therefore, on some CPUs, such > - * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() > - * in cases like this where there are no data dependencies. > - */ > -#define read_barrier_depends() do { } while (0) > - > #define __sync() __insn_mf() > > #include > @@ -125,20 +72,7 @@ mb_incoherent(void) > #define mb() fast_mb() > #define iob() fast_iob() > > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > -#endif > - > -#define set_mb(var, value) \ > - do { var = value; mb(); } while (0) > +#include > > #endif /* !__ASSEMBLY__ */ > #endif /* _ASM_TILE_BARRIER_H */ > --- a/arch/unicore32/include/asm/barrier.h > +++ b/arch/unicore32/include/asm/barrier.h > @@ -14,15 +14,6 @@ > #define dsb() __asm__ __volatile__ ("" : : : "memory") > #define dmb() __asm__ __volatile__ ("" : : : "memory") > > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define read_barrier_depends() do { } while (0) > -#define smp_read_barrier_depends() do { } while (0) > - > -#define set_mb(var, value) do { var = value; smp_mb(); } while (0) > +#include > > #endif /* __UNICORE_BARRIER_H__ */ > --- a/arch/x86/include/asm/barrier.h > +++ b/arch/x86/include/asm/barrier.h > @@ -100,6 +100,21 @@ > #define set_mb(var, value) do { var = value; barrier(); } while (0) > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > /* > * Stop RDTSC speculation. 
This is needed when you need to use RDTSC > * (or get_cycles or vread that possibly accesses the TSC) in a defined > --- a/arch/xtensa/include/asm/barrier.h > +++ b/arch/xtensa/include/asm/barrier.h > @@ -9,21 +9,14 @@ > #ifndef _XTENSA_SYSTEM_H > #define _XTENSA_SYSTEM_H > > -#define smp_read_barrier_depends() do { } while(0) > -#define read_barrier_depends() do { } while(0) > - > #define mb() ({ __asm__ __volatile__("memw" : : : "memory"); }) > #define rmb() barrier() > #define wmb() mb() > > #ifdef CONFIG_SMP > #error smp_* not defined > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > #endif > > -#define set_mb(var, value) do { var = value; mb(); } while (0) > +#include > > #endif /* _XTENSA_SYSTEM_H */ > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -1,4 +1,5 @@ > -/* Generic barrier definitions, based on MN10300 definitions. > +/* > + * Generic barrier definitions, based on MN10300 definitions. > * > * It should be possible to use these on really simple architectures, > * but it serves more as a starting point for new ports. > @@ -16,35 +17,67 @@ > > #ifndef __ASSEMBLY__ > > -#define nop() asm volatile ("nop") > +#include > + > +#ifndef nop > +#define nop() asm volatile ("nop") > +#endif > > /* > - * Force strict CPU ordering. > - * And yes, this is required on UP too when we're talking > - * to devices. > + * Force strict CPU ordering. And yes, this is required on UP too when we're > + * talking to devices. > * > - * This implementation only contains a compiler barrier. > + * Fall back to compiler barriers if nothing better is provided. > */ > > -#define mb() asm volatile ("": : :"memory") > -#define rmb() mb() > -#define wmb() asm volatile ("": : :"memory") > +#ifndef mb > +#define mb() barrier() > +#endif > + > +#ifndef rmb > +#define rmb() barrier() > +#endif > + > +#ifndef wmb > +#define wmb() barrier() > +#endif > + > +#ifndef read_barrier_depends > +#define read_barrier_depends() do {} while (0) > +#endif > > #ifdef CONFIG_SMP > #define smp_mb() mb() > #define smp_rmb() rmb() > #define smp_wmb() wmb() > +#define smp_read_barrier_depends() read_barrier_depends() > #else > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > +#define smp_read_barrier_depends() do {} while (0) > #endif > > +#ifndef set_mb > #define set_mb(var, value) do { var = value; mb(); } while (0) > +#endif > + > #define set_wmb(var, value) do { var = value; wmb(); } while (0) > > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > > #endif /* !__ASSEMBLY__ */ > #endif /* __ASM_GENERIC_BARRIER_H */ > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_ > # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) > #endif > > +/* Is this type a native word size -- useful for atomic operations */ > +#ifndef __native_word > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) > +#endif > + > /* Compile time object size, -1 for unknown */ > #ifndef __compiletime_object_size > # define 
__compiletime_object_size(obj) -1 > @@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_ > #define compiletime_assert(condition, msg) \ > _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) > > +#define compiletime_assert_atomic_type(t) \ > + compiletime_assert(__native_word(t), \ > + "Need native word sized stores/loads for atomicity.") > + > /* > * Prevent the compiler from merging or refetching accesses. The compiler > * is also forbidden from reordering successive instances of ACCESS_ONCE(), > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.153]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "e35.co.us.ibm.com", Issuer "GeoTrust SSL CA" (not verified)) by ozlabs.org (Postfix) with ESMTPS id B02DD2C007A for ; Thu, 7 Nov 2013 05:48:56 +1100 (EST) Received: from /spool/local by e35.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 6 Nov 2013 11:48:54 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id E8F2319D804C for ; Wed, 6 Nov 2013 11:48:47 -0700 (MST) Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id rA6Imp84331352 for ; Wed, 6 Nov 2013 11:48:51 -0700 Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id rA6Ipdwk024824 for ; Wed, 6 Nov 2013 11:51:40 -0700 Date: Wed, 6 Nov 2013 10:48:48 -0800 From: "Paul E. McKenney" To: Peter Zijlstra Subject: Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb() Message-ID: <20131106184848.GM18245@linux.vnet.ibm.com> References: <20131103224242.GF3947@linux.vnet.ibm.com> <20131104105059.GL3947@linux.vnet.ibm.com> <20131104112254.GK28601@twins.programming.kicks-ass.net> <20131104162732.GN3947@linux.vnet.ibm.com> <20131104191127.GW16117@laptop.programming.kicks-ass.net> <20131104205344.GW3947@linux.vnet.ibm.com> <20131106123946.GJ10651@twins.programming.kicks-ass.net> <20131106135736.GK10651@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20131106135736.GK10651@twins.programming.kicks-ass.net> Cc: Michael Neuling , Tony Luck , Mathieu Desnoyers , Heiko Carstens , Oleg Nesterov , LKML , Linux PPC dev , Geert Uytterhoeven , Anton Blanchard , Frederic Weisbecker , Victor Kaplansky , Russell King , Linus Torvalds , Martin Schwidefsky Reply-To: paulmck@linux.vnet.ibm.com List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Wed, Nov 06, 2013 at 02:57:36PM +0100, Peter Zijlstra wrote: > On Wed, Nov 06, 2013 at 01:51:10PM +0100, Geert Uytterhoeven wrote: > > This is screaming for a default implementation in asm-generic. > > Right you are... how about a little something like this? > > There's a few archs I didn't fully merge with the generic one because of > weird nop implementations. > > asm volatile ("nop" :: ) vs asm volatile ("nop" ::: "memory") and the > like. They probably can (and should) use the regular asm volatile > ("nop") but I misplaced the toolchains for many of the weird archs so I > didn't attempt. > > Also fixed a silly mistake in the return type definition for most > smp_load_acquire() implementions: typeof(p) vs typeof(*p). 
> > --- > Subject: arch: Introduce smp_load_acquire(), smp_store_release() > From: Peter Zijlstra > Date: Mon, 4 Nov 2013 20:18:11 +0100 > > A number of situations currently require the heavyweight smp_mb(), > even though there is no need to order prior stores against later > loads. Many architectures have much cheaper ways to handle these > situations, but the Linux kernel currently has no portable way > to make use of them. > > This commit therefore supplies smp_load_acquire() and > smp_store_release() to remedy this situation. The new > smp_load_acquire() primitive orders the specified load against > any subsequent reads or writes, while the new smp_store_release() > primitive orders the specifed store against any prior reads or > writes. These primitives allow array-based circular FIFOs to be > implemented without an smp_mb(), and also allow a theoretical > hole in rcu_assign_pointer() to be closed at no additional > expense on most architectures. > > In addition, the RCU experience transitioning from explicit > smp_read_barrier_depends() and smp_wmb() to rcu_dereference() > and rcu_assign_pointer(), respectively resulted in substantial > improvements in readability. It therefore seems likely that > replacing other explicit barriers with smp_load_acquire() and > smp_store_release() will provide similar benefits. It appears > that roughly half of the explicit barriers in core kernel code > might be so replaced. > > > Cc: Michael Ellerman > Cc: Michael Neuling > Cc: "Paul E. McKenney" > Cc: Linus Torvalds > Cc: Victor Kaplansky > Cc: Oleg Nesterov > Cc: Anton Blanchard > Cc: Benjamin Herrenschmidt > Cc: Frederic Weisbecker > Cc: Mathieu Desnoyers > Signed-off-by: Peter Zijlstra A few nits on Documentation/memory-barriers.txt and some pointless comments elsewhere. With the suggested Documentation/memory-barriers.txt fixes: Reviewed-by: Paul E. 
McKenney > --- > Documentation/memory-barriers.txt | 157 +++++++++++++++++----------------- > arch/alpha/include/asm/barrier.h | 25 +---- > arch/arc/include/asm/Kbuild | 1 > arch/arc/include/asm/atomic.h | 5 + > arch/arc/include/asm/barrier.h | 42 --------- > arch/arm/include/asm/barrier.h | 15 +++ > arch/arm64/include/asm/barrier.h | 50 ++++++++++ > arch/avr32/include/asm/barrier.h | 17 +-- > arch/blackfin/include/asm/barrier.h | 18 --- > arch/cris/include/asm/Kbuild | 1 > arch/cris/include/asm/barrier.h | 25 ----- > arch/frv/include/asm/barrier.h | 8 - > arch/h8300/include/asm/barrier.h | 21 ---- > arch/hexagon/include/asm/Kbuild | 1 > arch/hexagon/include/asm/barrier.h | 41 -------- > arch/ia64/include/asm/barrier.h | 49 ++++++++++ > arch/m32r/include/asm/barrier.h | 80 ----------------- > arch/m68k/include/asm/barrier.h | 14 --- > arch/metag/include/asm/barrier.h | 15 +++ > arch/microblaze/include/asm/Kbuild | 1 > arch/microblaze/include/asm/barrier.h | 27 ----- > arch/mips/include/asm/barrier.h | 15 +++ > arch/mn10300/include/asm/Kbuild | 1 > arch/mn10300/include/asm/barrier.h | 37 -------- > arch/parisc/include/asm/Kbuild | 1 > arch/parisc/include/asm/barrier.h | 35 ------- > arch/powerpc/include/asm/barrier.h | 21 ++++ > arch/s390/include/asm/barrier.h | 15 +++ > arch/score/include/asm/Kbuild | 1 > arch/score/include/asm/barrier.h | 16 --- > arch/sh/include/asm/barrier.h | 21 ---- > arch/sparc/include/asm/barrier_32.h | 11 -- > arch/sparc/include/asm/barrier_64.h | 15 +++ > arch/tile/include/asm/barrier.h | 68 -------------- > arch/unicore32/include/asm/barrier.h | 11 -- > arch/x86/include/asm/barrier.h | 15 +++ > arch/xtensa/include/asm/barrier.h | 9 - > include/asm-generic/barrier.h | 55 +++++++++-- > include/linux/compiler.h | 9 + > 39 files changed, 375 insertions(+), 594 deletions(-) > > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -371,33 +371,35 @@ VARIETIES OF MEMORY BARRIER > > And a couple of implicit varieties: > > - (5) LOCK operations. > + (5) ACQUIRE operations. > > This acts as a one-way permeable barrier. It guarantees that all memory > - operations after the LOCK operation will appear to happen after the LOCK > - operation with respect to the other components of the system. > + operations after the ACQUIRE operation will appear to happen after the > + ACQUIRE operation with respect to the other components of the system. ACQUIRE operations include LOCK operations and smp_load_acquire() operations. > > - Memory operations that occur before a LOCK operation may appear to happen > - after it completes. > + Memory operations that occur before a ACQUIRE operation may appear to > + happen after it completes. > > - A LOCK operation should almost always be paired with an UNLOCK operation. > + A ACQUIRE operation should almost always be paired with an RELEASE > + operation. > > > - (6) UNLOCK operations. > + (6) RELEASE operations. > > This also acts as a one-way permeable barrier. It guarantees that all > - memory operations before the UNLOCK operation will appear to happen before > - the UNLOCK operation with respect to the other components of the system. > + memory operations before the RELEASE operation will appear to happen > + before the RELEASE operation with respect to the other components of the > + system. Release operations include UNLOCK operations and smp_store_release() operations. 
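For readers who want to map the ACQUIRE/RELEASE terminology onto something runnable outside the kernel: the semantics described here line up closely with C11's memory_order_acquire and memory_order_release. A small two-thread message-passing sketch, plain C11 plus POSIX threads, purely illustrative:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;             /* ordinary data */
static atomic_int ready;        /* the flag that carries the ordering */

static void *producer(void *arg)
{
        (void)arg;
        payload = 42;                                   /* plain store */
        atomic_store_explicit(&ready, 1,
                              memory_order_release);    /* RELEASE: prior accesses stay before it */
        return NULL;
}

static void *consumer(void *arg)
{
        (void)arg;
        while (!atomic_load_explicit(&ready,
                                     memory_order_acquire)) /* ACQUIRE: later accesses stay after it */
                ;
        printf("payload = %d\n", payload);              /* guaranteed to print 42 */
        return NULL;
}

int main(void)
{
        pthread_t p, c;

        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
}

The release store keeps the payload write from sinking below it, and the acquire load keeps the payload read from hoisting above it, which is exactly the one-way permeability described above.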
> - Memory operations that occur after an UNLOCK operation may appear to > + Memory operations that occur after an RELEASE operation may appear to > happen before it completes. > > - LOCK and UNLOCK operations are guaranteed to appear with respect to each > - other strictly in the order specified. > + ACQUIRE and RELEASE operations are guaranteed to appear with respect to > + each other strictly in the order specified. > > - The use of LOCK and UNLOCK operations generally precludes the need for > - other sorts of memory barrier (but note the exceptions mentioned in the > - subsection "MMIO write barrier"). > + The use of ACQUIRE and RELEASE operations generally precludes the need > + for other sorts of memory barrier (but note the exceptions mentioned in > + the subsection "MMIO write barrier"). > > > Memory barriers are only required where there's a possibility of interaction > @@ -1135,7 +1137,7 @@ CPU from reordering them. > clear_bit( ... ); > > This prevents memory operations before the clear leaking to after it. See > - the subsection on "Locking Functions" with reference to UNLOCK operation > + the subsection on "Locking Functions" with reference to RELEASE operation > implications. > > See Documentation/atomic_ops.txt for more information. See the "Atomic > @@ -1181,65 +1183,66 @@ LOCKING FUNCTIONS > (*) R/W semaphores > (*) RCU > > -In all cases there are variants on "LOCK" operations and "UNLOCK" operations > +In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations > for each construct. These operations all imply certain barriers: > > - (1) LOCK operation implication: > + (1) ACQUIRE operation implication: > > - Memory operations issued after the LOCK will be completed after the LOCK > - operation has completed. > + Memory operations issued after the ACQUIRE will be completed after the > + ACQUIRE operation has completed. > > - Memory operations issued before the LOCK may be completed after the LOCK > - operation has completed. > + Memory operations issued before the ACQUIRE may be completed after the > + ACQUIRE operation has completed. > > - (2) UNLOCK operation implication: > + (2) RELEASE operation implication: > > - Memory operations issued before the UNLOCK will be completed before the > - UNLOCK operation has completed. > + Memory operations issued before the RELEASE will be completed before the > + RELEASE operation has completed. > > - Memory operations issued after the UNLOCK may be completed before the > - UNLOCK operation has completed. > + Memory operations issued after the RELEASE may be completed before the > + RELEASE operation has completed. > > - (3) LOCK vs LOCK implication: > + (3) ACQUIRE vs ACQUIRE implication: > > - All LOCK operations issued before another LOCK operation will be completed > - before that LOCK operation. > + All ACQUIRE operations issued before another ACQUIRE operation will be > + completed before that ACQUIRE operation. > > - (4) LOCK vs UNLOCK implication: > + (4) ACQUIRE vs RELEASE implication: > > - All LOCK operations issued before an UNLOCK operation will be completed > - before the UNLOCK operation. > + All ACQUIRE operations issued before an RELEASE operation will be > + completed before the RELEASE operation. > > - All UNLOCK operations issued before a LOCK operation will be completed > - before the LOCK operation. > + All RELEASE operations issued before a ACQUIRE operation will be > + completed before the ACQUIRE operation. 
> > - (5) Failed conditional LOCK implication: > + (5) Failed conditional ACQUIRE implication: > > - Certain variants of the LOCK operation may fail, either due to being > + Certain variants of the ACQUIRE operation may fail, either due to being > unable to get the lock immediately, or due to receiving an unblocked > signal whilst asleep waiting for the lock to become available. Failed > locks do not imply any sort of barrier. I suggest adding "For example" to the beginning of the last sentence: For example, failed lock acquisitions do not imply any sort of barrier. Otherwise, the transition from ACQUIRE to lock is strange. > -Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is > -equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. > +Therefore, from (1), (2) and (4) a RELEASE followed by an unconditional > +ACQUIRE is equivalent to a full barrier, but an ACQUIRE followed by a RELEASE > +is not. > > [!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way > barriers is that the effects of instructions outside of a critical section > may seep into the inside of the critical section. > > -A LOCK followed by an UNLOCK may not be assumed to be full memory barrier > -because it is possible for an access preceding the LOCK to happen after the > -LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the > -two accesses can themselves then cross: > +An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier > +because it is possible for an access preceding the ACQUIRE to happen after the > +ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and > +the two accesses can themselves then cross: > > *A = a; > - LOCK > - UNLOCK > + ACQUIRE > + RELEASE > *B = b; > > may occur as: > > - LOCK, STORE *B, STORE *A, UNLOCK > + ACQUIRE, STORE *B, STORE *A, RELEASE > > Locks and semaphores may not provide any guarantee of ordering on UP compiled > systems, and so cannot be counted on in such a situation to actually achieve > @@ -1253,33 +1256,33 @@ See also the section on "Inter-CPU locki > > *A = a; > *B = b; > - LOCK > + ACQUIRE > *C = c; > *D = d; > - UNLOCK > + RELEASE > *E = e; > *F = f; > > The following sequence of events is acceptable: > > - LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK > + ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE > > [+] Note that {*F,*A} indicates a combined access. > > But none of the following are: > > - {*F,*A}, *B, LOCK, *C, *D, UNLOCK, *E > - *A, *B, *C, LOCK, *D, UNLOCK, *E, *F > - *A, *B, LOCK, *C, UNLOCK, *D, *E, *F > - *B, LOCK, *C, *D, UNLOCK, {*F,*A}, *E > + {*F,*A}, *B, ACQUIRE, *C, *D, RELEASE, *E > + *A, *B, *C, ACQUIRE, *D, RELEASE, *E, *F > + *A, *B, ACQUIRE, *C, RELEASE, *D, *E, *F > + *B, ACQUIRE, *C, *D, RELEASE, {*F,*A}, *E > > > > INTERRUPT DISABLING FUNCTIONS > ----------------------------- > > -Functions that disable interrupts (LOCK equivalent) and enable interrupts > -(UNLOCK equivalent) will act as compiler barriers only. So if memory or I/O > +Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts > +(RELEASE equivalent) will act as compiler barriers only. So if memory or I/O > barriers are required in such a situation, they must be provided from some > other means.
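The changelog's array-based circular FIFO motivation is easy to see in code. Here is a simplified single-producer/single-consumer ring built on the two new primitives, again using user-space stand-ins for the barriers; a sketch of the idiom rather than the kernel's circ_buf/kfifo code:

#include <stdio.h>

/* User-space stand-ins for the primitives added by this patch. */
#define smp_mb()        __sync_synchronize()
#define ACCESS_ONCE(x)  (*(volatile typeof(x) *)&(x))
#define smp_store_release(p, v) \
        do { smp_mb(); ACCESS_ONCE(*(p)) = (v); } while (0)
#define smp_load_acquire(p) \
        ({ typeof(*(p)) ___p1 = ACCESS_ONCE(*(p)); smp_mb(); ___p1; })

#define RING_SIZE 16    /* power of two; indices are free-running and masked */

struct ring {
        int buf[RING_SIZE];
        unsigned int head;      /* written only by the producer */
        unsigned int tail;      /* written only by the consumer */
};

/* Producer: fill the slot first, then publish the new head. */
static int ring_put(struct ring *r, int item)
{
        unsigned int head = r->head;
        unsigned int tail = smp_load_acquire(&r->tail);

        if (head - tail >= RING_SIZE)
                return 0;                               /* full */
        r->buf[head & (RING_SIZE - 1)] = item;
        smp_store_release(&r->head, head + 1);          /* makes the slot visible */
        return 1;
}

/* Consumer: observe the head, read the slot, then hand the slot back. */
static int ring_get(struct ring *r, int *item)
{
        unsigned int head = smp_load_acquire(&r->head);
        unsigned int tail = r->tail;

        if (head == tail)
                return 0;                               /* empty */
        *item = r->buf[tail & (RING_SIZE - 1)];
        smp_store_release(&r->tail, tail + 1);          /* frees the slot */
        return 1;
}

int main(void)
{
        struct ring r = { .head = 0, .tail = 0 };
        int v;

        ring_put(&r, 1);
        ring_put(&r, 2);
        while (ring_get(&r, &v))
                printf("%d\n", v);
        return 0;
}

The producer's store-release publishes the filled slot together with the new head; the consumer's load-acquire guarantees the slot contents are visible before they are read, and its own store-release on tail hands the slot back, so neither side needs an smp_mb().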
> > @@ -1436,24 +1439,24 @@ Consider the following: the system has a > CPU 1 CPU 2 > =============================== =============================== > *A = a; *E = e; > - LOCK M LOCK Q > + ACQUIRE M ACQUIRE Q > *B = b; *F = f; > *C = c; *G = g; > - UNLOCK M UNLOCK Q > + RELEASE M RELEASE Q > *D = d; *H = h; > > Then there is no guarantee as to what order CPU 3 will see the accesses to *A > through *H occur in, other than the constraints imposed by the separate locks > on the separate CPUs. It might, for example, see: > > - *E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M > + *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M > > But it won't see any of: > > - *B, *C or *D preceding LOCK M > - *A, *B or *C following UNLOCK M > - *F, *G or *H preceding LOCK Q > - *E, *F or *G following UNLOCK Q > + *B, *C or *D preceding ACQUIRE M > + *A, *B or *C following RELEASE M > + *F, *G or *H preceding ACQUIRE Q > + *E, *F or *G following RELEASE Q > > > However, if the following occurs: > @@ -1461,28 +1464,28 @@ through *H occur in, other than the cons > CPU 1 CPU 2 > =============================== =============================== > *A = a; > - LOCK M [1] > + ACQUIRE M [1] > *B = b; > *C = c; > - UNLOCK M [1] > + RELEASE M [1] > *D = d; *E = e; > - LOCK M [2] > + ACQUIRE M [2] > *F = f; > *G = g; > - UNLOCK M [2] > + RELEASE M [2] > *H = h; > > CPU 3 might see: > > - *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], > - LOCK M [2], *H, *F, *G, UNLOCK M [2], *D > + *E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1], > + ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D > > But assuming CPU 1 gets the lock first, CPU 3 won't see any of: > > - *B, *C, *D, *F, *G or *H preceding LOCK M [1] > - *A, *B or *C following UNLOCK M [1] > - *F, *G or *H preceding LOCK M [2] > - *A, *B, *C, *E, *F or *G following UNLOCK M [2] > + *B, *C, *D, *F, *G or *H preceding ACQUIRE M [1] > + *A, *B or *C following RELEASE M [1] > + *F, *G or *H preceding ACQUIRE M [2] > + *A, *B, *C, *E, *F or *G following RELEASE M [2] > > > LOCKS VS I/O ACCESSES > @@ -1702,13 +1705,13 @@ about the state (old or new) implies an > test_and_clear_bit(); > test_and_change_bit(); > > -These are used for such things as implementing LOCK-class and UNLOCK-class > +These are used for such things as implementing ACQUIRE-class and RELEASE-class > operations and adjusting reference counters towards object destruction, and as > such the implicit memory barrier effects are necessary. > > > The following operations are potential problems as they do _not_ imply memory > -barriers, but might be used for implementing such things as UNLOCK-class > +barriers, but might be used for implementing such things as RELEASE-class > operations: > > atomic_set(); > @@ -1750,9 +1753,9 @@ barriers are needed or not. > clear_bit_unlock(); > __clear_bit_unlock(); > > -These implement LOCK-class and UNLOCK-class operations. These should be used in > -preference to other operations when implementing locking primitives, because > -their implementations can be optimised on many architectures. > +These implement ACQUIRE-class and RELEASE-class operations. These should be > +used in preference to other operations when implementing locking primitives, > +because their implementations can be optimised on many architectures. > > [!] 
Note that special memory barrier primitives are available for these > situations because on some CPUs the atomic instructions used imply full memory > --- a/arch/alpha/include/asm/barrier.h > +++ b/arch/alpha/include/asm/barrier.h > @@ -3,33 +3,18 @@ > > #include > > -#define mb() \ > -__asm__ __volatile__("mb": : :"memory") > +#define mb() __asm__ __volatile__("mb": : :"memory") > +#define rmb() __asm__ __volatile__("mb": : :"memory") > +#define wmb() __asm__ __volatile__("wmb": : :"memory") > > -#define rmb() \ > -__asm__ __volatile__("mb": : :"memory") > - > -#define wmb() \ > -__asm__ __volatile__("wmb": : :"memory") > - > -#define read_barrier_depends() \ > -__asm__ __volatile__("mb": : :"memory") > +#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory") > > #ifdef CONFIG_SMP > #define __ASM_SMP_MB "\tmb\n" > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > #else > #define __ASM_SMP_MB > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > #endif > > -#define set_mb(var, value) \ > -do { var = value; mb(); } while (0) > +#include > > #endif /* __BARRIER_H */ > --- a/arch/arc/include/asm/Kbuild > +++ b/arch/arc/include/asm/Kbuild > @@ -47,3 +47,4 @@ generic-y += user.h > generic-y += vga.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/arc/include/asm/atomic.h > +++ b/arch/arc/include/asm/atomic.h > @@ -190,6 +190,11 @@ static inline void atomic_clear_mask(uns > > #endif /* !CONFIG_ARC_HAS_LLSC */ > > +#define smp_mb__before_atomic_dec() barrier() > +#define smp_mb__after_atomic_dec() barrier() > +#define smp_mb__before_atomic_inc() barrier() > +#define smp_mb__after_atomic_inc() barrier() > + > /** > * __atomic_add_unless - add unless the number is a given value > * @v: pointer of type atomic_t > --- a/arch/arc/include/asm/barrier.h > +++ /dev/null > @@ -1,42 +0,0 @@ > -/* > - * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com) > - * > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License version 2 as > - * published by the Free Software Foundation. > - */ > - > -#ifndef __ASM_BARRIER_H > -#define __ASM_BARRIER_H > - > -#ifndef __ASSEMBLY__ > - > -/* TODO-vineetg: Need to see what this does, don't we need sync anywhere */ > -#define mb() __asm__ __volatile__ ("" : : : "memory") > -#define rmb() mb() > -#define wmb() mb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > -#define read_barrier_depends() mb() > - > -/* TODO-vineetg verify the correctness of macros here */ > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#endif > - > -#define smp_mb__before_atomic_dec() barrier() > -#define smp_mb__after_atomic_dec() barrier() > -#define smp_mb__before_atomic_inc() barrier() > -#define smp_mb__after_atomic_inc() barrier() > - > -#define smp_read_barrier_depends() do { } while (0) > - > -#endif > - > -#endif I do like this take-no-prisoners approach! 
;-) > --- a/arch/arm/include/asm/barrier.h > +++ b/arch/arm/include/asm/barrier.h > @@ -59,6 +59,21 @@ > #define smp_wmb() dmb(ishst) > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #define read_barrier_depends() do { } while(0) > #define smp_read_barrier_depends() do { } while(0) > > --- a/arch/arm64/include/asm/barrier.h > +++ b/arch/arm64/include/asm/barrier.h > @@ -35,11 +35,59 @@ > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #else > + > #define smp_mb() asm volatile("dmb ish" : : : "memory") > #define smp_rmb() asm volatile("dmb ishld" : : : "memory") > #define smp_wmb() asm volatile("dmb ishst" : : : "memory") > -#endif > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("stlr %w1, [%0]" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("stlr %1, [%0]" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + } \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1; \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("ldar %w0, [%1]" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("ldar %0, [%1]" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + } \ > + ___p1; \ > +}) > > #define read_barrier_depends() do { } while(0) > #define smp_read_barrier_depends() do { } while(0) > --- a/arch/avr32/include/asm/barrier.h > +++ b/arch/avr32/include/asm/barrier.h > @@ -8,22 +8,15 @@ > #ifndef __ASM_AVR32_BARRIER_H > #define __ASM_AVR32_BARRIER_H > > -#define nop() asm volatile("nop") > - > -#define mb() asm volatile("" : : : "memory") > -#define rmb() mb() > -#define wmb() asm volatile("sync 0" : : : "memory") > -#define read_barrier_depends() do { } while(0) > -#define set_mb(var, value) do { var = value; mb(); } while(0) > +/* > + * Weirdest thing ever.. no full barrier, but it has a write barrier! > + */ > +#define wmb() asm volatile("sync 0" : : : "memory") Doesn't this mean that asm-generic/barrier.h needs to check for definitions? Ah, I see below that you added these checks. 
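The check Paul is asking about is the usual override idiom: the arch header defines whatever it implements natively (avr32's wmb() above), then includes asm-generic/barrier.h, whose fallbacks are all wrapped in #ifndef. A toy stand-alone demonstration of how the idiom resolves, with names and strings purely illustrative:

#include <stdio.h>

/* What an arch header would provide (avr32-style: only wmb is special). */
#define wmb()   puts("arch-specific wmb")

/* What the reworked asm-generic/barrier.h does: fall back only if unset. */
#ifndef mb
#define mb()    puts("generic mb (compiler barrier fallback)")
#endif
#ifndef wmb
#define wmb()   puts("generic wmb")     /* skipped: the arch already defined wmb */
#endif

int main(void)
{
        mb();   /* prints the generic fallback */
        wmb();  /* prints the arch-specific version */
        return 0;
}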
> #ifdef CONFIG_SMP > # error "The AVR32 port does not support SMP" > -#else > -# define smp_mb() barrier() > -# define smp_rmb() barrier() > -# define smp_wmb() barrier() > -# define smp_read_barrier_depends() do { } while(0) > #endif > > +#include > > #endif /* __ASM_AVR32_BARRIER_H */ > --- a/arch/blackfin/include/asm/barrier.h > +++ b/arch/blackfin/include/asm/barrier.h > @@ -23,26 +23,10 @@ > # define rmb() do { barrier(); smp_check_barrier(); } while (0) > # define wmb() do { barrier(); smp_mark_barrier(); } while (0) > # define read_barrier_depends() do { barrier(); smp_check_barrier(); } while (0) > -#else > -# define mb() barrier() > -# define rmb() barrier() > -# define wmb() barrier() > -# define read_barrier_depends() do { } while (0) > #endif > > -#else /* !CONFIG_SMP */ > - > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define read_barrier_depends() do { } while (0) > - > #endif /* !CONFIG_SMP */ > > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define smp_read_barrier_depends() read_barrier_depends() > +#include > > #endif /* _BLACKFIN_BARRIER_H */ > --- a/arch/cris/include/asm/Kbuild > +++ b/arch/cris/include/asm/Kbuild > @@ -12,3 +12,4 @@ generic-y += trace_clock.h > generic-y += vga.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/cris/include/asm/barrier.h > +++ /dev/null > @@ -1,25 +0,0 @@ > -#ifndef __ASM_CRIS_BARRIER_H > -#define __ASM_CRIS_BARRIER_H > - > -#define nop() __asm__ __volatile__ ("nop"); > - > -#define barrier() __asm__ __volatile__("": : :"memory") > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > -#define read_barrier_depends() do { } while(0) > -#define set_mb(var, value) do { var = value; mb(); } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > -#endif > - > -#endif /* __ASM_CRIS_BARRIER_H */ > --- a/arch/frv/include/asm/barrier.h > +++ b/arch/frv/include/asm/barrier.h > @@ -17,13 +17,7 @@ > #define mb() asm volatile ("membar" : : :"memory") > #define rmb() asm volatile ("membar" : : :"memory") > #define wmb() asm volatile ("membar" : : :"memory") > -#define read_barrier_depends() do { } while (0) > > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do {} while(0) > -#define set_mb(var, value) \ > - do { var = (value); barrier(); } while (0) > +#include > > #endif /* _ASM_BARRIER_H */ > --- a/arch/h8300/include/asm/barrier.h > +++ b/arch/h8300/include/asm/barrier.h > @@ -3,27 +3,8 @@ > > #define nop() asm volatile ("nop"::) > > -/* > - * Force strict CPU ordering. > - * Not really required on H8... 
> - */ > -#define mb() asm volatile ("" : : :"memory") > -#define rmb() asm volatile ("" : : :"memory") > -#define wmb() asm volatile ("" : : :"memory") > #define set_mb(var, value) do { xchg(&var, value); } while (0) > > -#define read_barrier_depends() do { } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > -#endif > +#include > > #endif /* _H8300_BARRIER_H */ > --- a/arch/hexagon/include/asm/Kbuild > +++ b/arch/hexagon/include/asm/Kbuild > @@ -54,3 +54,4 @@ generic-y += ucontext.h > generic-y += unaligned.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/hexagon/include/asm/barrier.h > +++ /dev/null > @@ -1,41 +0,0 @@ > -/* > - * Memory barrier definitions for the Hexagon architecture > - * > - * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved. > - * > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License version 2 and > - * only version 2 as published by the Free Software Foundation. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > - * 02110-1301, USA. > - */ > - > -#ifndef _ASM_BARRIER_H > -#define _ASM_BARRIER_H > - > -#define rmb() barrier() > -#define read_barrier_depends() barrier() > -#define wmb() barrier() > -#define mb() barrier() > -#define smp_rmb() barrier() > -#define smp_read_barrier_depends() barrier() > -#define smp_wmb() barrier() > -#define smp_mb() barrier() > -#define smp_mb__before_atomic_dec() barrier() > -#define smp_mb__after_atomic_dec() barrier() > -#define smp_mb__before_atomic_inc() barrier() > -#define smp_mb__after_atomic_inc() barrier() > - > -/* Set a value and use a memory barrier. Used by the scheduler somewhere. 
*/ > -#define set_mb(var, value) \ > - do { var = value; mb(); } while (0) > - > -#endif /* _ASM_BARRIER_H */ > --- a/arch/ia64/include/asm/barrier.h > +++ b/arch/ia64/include/asm/barrier.h > @@ -45,11 +45,60 @@ > # define smp_rmb() rmb() > # define smp_wmb() wmb() > # define smp_read_barrier_depends() read_barrier_depends() > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("st4.rel [%0]=%1" \ > + : "=r" (p) : "r" (v) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("st8.rel [%0]=%1" \ > + : "=r" (p) : "r" (v) : "memory"); \ > + break; \ > + } \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1; \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("ld4.acq %0=[%1]" \ > + : "=r" (___p1) : "r" (p) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("ld8.acq %0=[%1]" \ > + : "=r" (___p1) : "r" (p) : "memory"); \ > + break; \ > + } \ > + ___p1; \ > +}) > + > #else > + > # define smp_mb() barrier() > # define smp_rmb() barrier() > # define smp_wmb() barrier() > # define smp_read_barrier_depends() do { } while(0) > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > #endif > > /* > --- a/arch/m32r/include/asm/barrier.h > +++ b/arch/m32r/include/asm/barrier.h > @@ -11,84 +11,6 @@ > > #define nop() __asm__ __volatile__ ("nop" : : ) > > -/* > - * Memory barrier. > - * > - * mb() prevents loads and stores being reordered across this point. > - * rmb() prevents loads being reordered across this point. > - * wmb() prevents stores being reordered across this point. > - */ > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > - > -/** > - * read_barrier_depends - Flush all pending reads that subsequents reads > - * depend on. > - * > - * No data-dependent reads from memory-like regions are ever reordered > - * over this barrier. All reads preceding this primitive are guaranteed > - * to access memory (but not necessarily other CPUs' caches) before any > - * reads following this primitive that depend on the data return by > - * any of the preceding reads. This primitive is much lighter weight than > - * rmb() on most CPUs, and is never heavier weight than is > - * rmb(). > - * > - * These ordering constraints are respected by both the local CPU > - * and the compiler. > - * > - * Ordering is not guaranteed by anything other than these primitives, > - * not even by data dependencies. See the documentation for > - * memory_barrier() for examples and URLs to more information. > - * > - * For example, the following code would force ordering (the initial > - * value of "a" is zero, "b" is one, and "p" is "&a"): > - * > - * > - * CPU 0 CPU 1 > - * > - * b = 2; > - * memory_barrier(); > - * p = &b; q = p; > - * read_barrier_depends(); > - * d = *q; > - * > - * > - * > - * because the read of "*q" depends on the read of "p" and these > - * two reads are separated by a read_barrier_depends(). 
However, > - * the following code, with the same initial values for "a" and "b": > - * > - * > - * CPU 0 CPU 1 > - * > - * a = 2; > - * memory_barrier(); > - * b = 3; y = b; > - * read_barrier_depends(); > - * x = a; > - * > - * > - * does not enforce ordering, since there is no data dependency between > - * the read of "a" and the read of "b". Therefore, on some CPUs, such > - * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() > - * in cases like this where there are no data dependencies. > - **/ > - > -#define read_barrier_depends() do { } while (0) > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#define set_mb(var, value) do { (void) xchg(&var, value); } while (0) > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > -#define set_mb(var, value) do { var = value; barrier(); } while (0) > -#endif > +#include > > #endif /* _ASM_M32R_BARRIER_H */ > --- a/arch/m68k/include/asm/barrier.h > +++ b/arch/m68k/include/asm/barrier.h > @@ -1,20 +1,8 @@ > #ifndef _M68K_BARRIER_H > #define _M68K_BARRIER_H > > -/* > - * Force strict CPU ordering. > - * Not really required on m68k... > - */ > #define nop() do { asm volatile ("nop"); barrier(); } while (0) > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define read_barrier_depends() ((void)0) > -#define set_mb(var, value) ({ (var) = (value); wmb(); }) > > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() ((void)0) > +#include > > #endif /* _M68K_BARRIER_H */ > --- a/arch/metag/include/asm/barrier.h > +++ b/arch/metag/include/asm/barrier.h > @@ -82,4 +82,19 @@ static inline void fence(void) > #define smp_read_barrier_depends() do { } while (0) > #define set_mb(var, value) do { var = value; smp_mb(); } while (0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #endif /* _ASM_METAG_BARRIER_H */ > --- a/arch/microblaze/include/asm/Kbuild > +++ b/arch/microblaze/include/asm/Kbuild > @@ -4,3 +4,4 @@ generic-y += exec.h > generic-y += trace_clock.h > generic-y += syscalls.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/microblaze/include/asm/barrier.h > +++ /dev/null > @@ -1,27 +0,0 @@ > -/* > - * Copyright (C) 2006 Atmark Techno, Inc. > - * > - * This file is subject to the terms and conditions of the GNU General Public > - * License. See the file "COPYING" in the main directory of this archive > - * for more details. 
> - */ > - > -#ifndef _ASM_MICROBLAZE_BARRIER_H > -#define _ASM_MICROBLAZE_BARRIER_H > - > -#define nop() asm volatile ("nop") > - > -#define smp_read_barrier_depends() do {} while (0) > -#define read_barrier_depends() do {} while (0) > - > -#define mb() barrier() > -#define rmb() mb() > -#define wmb() mb() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > - > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > - > -#endif /* _ASM_MICROBLAZE_BARRIER_H */ > --- a/arch/mips/include/asm/barrier.h > +++ b/arch/mips/include/asm/barrier.h > @@ -180,4 +180,19 @@ > #define nudge_writes() mb() > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #endif /* __ASM_BARRIER_H */ > --- a/arch/mn10300/include/asm/Kbuild > +++ b/arch/mn10300/include/asm/Kbuild > @@ -3,3 +3,4 @@ generic-y += clkdev.h > generic-y += exec.h > generic-y += trace_clock.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/mn10300/include/asm/barrier.h > +++ /dev/null > @@ -1,37 +0,0 @@ > -/* MN10300 memory barrier definitions > - * > - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. > - * Written by David Howells (dhowells@redhat.com) > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public Licence > - * as published by the Free Software Foundation; either version > - * 2 of the Licence, or (at your option) any later version. > - */ > -#ifndef _ASM_BARRIER_H > -#define _ASM_BARRIER_H > - > -#define nop() asm volatile ("nop") > - > -#define mb() asm volatile ("": : :"memory") > -#define rmb() mb() > -#define wmb() asm volatile ("": : :"memory") > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define set_mb(var, value) do { xchg(&var, value); } while (0) > -#else /* CONFIG_SMP */ > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define set_mb(var, value) do { var = value; mb(); } while (0) > -#endif /* CONFIG_SMP */ > - > -#define set_wmb(var, value) do { var = value; wmb(); } while (0) > - > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > - > -#endif /* _ASM_BARRIER_H */ > --- a/arch/parisc/include/asm/Kbuild > +++ b/arch/parisc/include/asm/Kbuild > @@ -5,3 +5,4 @@ generic-y += word-at-a-time.h auxvec.h u > poll.h xor.h clkdev.h exec.h > generic-y += trace_clock.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/parisc/include/asm/barrier.h > +++ /dev/null > @@ -1,35 +0,0 @@ > -#ifndef __PARISC_BARRIER_H > -#define __PARISC_BARRIER_H > - > -/* > -** This is simply the barrier() macro from linux/kernel.h but when serial.c > -** uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h > -** hasn't yet been included yet so it fails, thus repeating the macro here. > -** > -** PA-RISC architecture allows for weakly ordered memory accesses although > -** none of the processors use it. There is a strong ordered bit that is > -** set in the O-bit of the page directory entry. 
Operating systems that > -** can not tolerate out of order accesses should set this bit when mapping > -** pages. The O-bit of the PSW should also be set to 1 (I don't believe any > -** of the processor implemented the PSW O-bit). The PCX-W ERS states that > -** the TLB O-bit is not implemented so the page directory does not need to > -** have the O-bit set when mapping pages (section 3.1). This section also > -** states that the PSW Y, Z, G, and O bits are not implemented. > -** So it looks like nothing needs to be done for parisc-linux (yet). > -** (thanks to chada for the above comment -ggg) > -** > -** The __asm__ op below simple prevents gcc/ld from reordering > -** instructions across the mb() "call". > -*/ > -#define mb() __asm__ __volatile__("":::"memory") /* barrier() */ > -#define rmb() mb() > -#define wmb() mb() > -#define smp_mb() mb() > -#define smp_rmb() mb() > -#define smp_wmb() mb() > -#define smp_read_barrier_depends() do { } while(0) > -#define read_barrier_depends() do { } while(0) > - > -#define set_mb(var, value) do { var = value; mb(); } while (0) > - > -#endif /* __PARISC_BARRIER_H */ > --- a/arch/powerpc/include/asm/barrier.h > +++ b/arch/powerpc/include/asm/barrier.h > @@ -45,11 +45,15 @@ > # define SMPWMB eieio > #endif > > +#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > + > #define smp_mb() mb() > -#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define smp_rmb() __lwsync() > #define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > #define smp_read_barrier_depends() read_barrier_depends() > #else > +#define __lwsync() barrier() > + > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > @@ -65,4 +69,19 @@ > #define data_barrier(x) \ > asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + __lwsync(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + __lwsync(); \ > + ___p1; \ > +}) > + > #endif /* _ASM_POWERPC_BARRIER_H */ > --- a/arch/s390/include/asm/barrier.h > +++ b/arch/s390/include/asm/barrier.h > @@ -32,4 +32,19 @@ > > #define set_mb(var, value) do { var = value; mb(); } while (0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > #endif /* __ASM_BARRIER_H */ > --- a/arch/score/include/asm/Kbuild > +++ b/arch/score/include/asm/Kbuild > @@ -5,3 +5,4 @@ generic-y += clkdev.h > generic-y += trace_clock.h > generic-y += xor.h > generic-y += preempt.h > +generic-y += barrier.h > --- a/arch/score/include/asm/barrier.h > +++ /dev/null > @@ -1,16 +0,0 @@ > -#ifndef _ASM_SCORE_BARRIER_H > -#define _ASM_SCORE_BARRIER_H > - > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > - > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > - > -#define set_mb(var, value) do {var = value; wmb(); } while (0) > - > -#endif /* _ASM_SCORE_BARRIER_H */ > --- a/arch/sh/include/asm/barrier.h > +++ 
b/arch/sh/include/asm/barrier.h > @@ -26,29 +26,14 @@ > #if defined(CONFIG_CPU_SH4A) || defined(CONFIG_CPU_SH5) > #define mb() __asm__ __volatile__ ("synco": : :"memory") > #define rmb() mb() > -#define wmb() __asm__ __volatile__ ("synco": : :"memory") > +#define wmb() mb() > #define ctrl_barrier() __icbi(PAGE_OFFSET) > -#define read_barrier_depends() do { } while(0) > #else > -#define mb() __asm__ __volatile__ ("": : :"memory") > -#define rmb() mb() > -#define wmb() __asm__ __volatile__ ("": : :"memory") > #define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop") > -#define read_barrier_depends() do { } while(0) > -#endif > - > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while(0) > #endif > > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > > +#include > + > #endif /* __ASM_SH_BARRIER_H */ > --- a/arch/sparc/include/asm/barrier_32.h > +++ b/arch/sparc/include/asm/barrier_32.h > @@ -1,15 +1,6 @@ > #ifndef __SPARC_BARRIER_H > #define __SPARC_BARRIER_H > > -/* XXX Change this if we ever use a PSO mode kernel. */ > -#define mb() __asm__ __volatile__ ("" : : : "memory") > -#define rmb() mb() > -#define wmb() mb() > -#define read_barrier_depends() do { } while(0) > -#define set_mb(__var, __value) do { __var = __value; mb(); } while(0) > -#define smp_mb() __asm__ __volatile__("":::"memory") > -#define smp_rmb() __asm__ __volatile__("":::"memory") > -#define smp_wmb() __asm__ __volatile__("":::"memory") > -#define smp_read_barrier_depends() do { } while(0) > +#include > > #endif /* !(__SPARC_BARRIER_H) */ > --- a/arch/sparc/include/asm/barrier_64.h > +++ b/arch/sparc/include/asm/barrier_64.h > @@ -53,4 +53,19 @@ do { __asm__ __volatile__("ba,pt %%xcc, > > #define smp_read_barrier_depends() do { } while(0) > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > #endif /* !(__SPARC64_BARRIER_H) */ > --- a/arch/tile/include/asm/barrier.h > +++ b/arch/tile/include/asm/barrier.h > @@ -22,59 +22,6 @@ > #include > #include > > -/* > - * read_barrier_depends - Flush all pending reads that subsequents reads > - * depend on. > - * > - * No data-dependent reads from memory-like regions are ever reordered > - * over this barrier. All reads preceding this primitive are guaranteed > - * to access memory (but not necessarily other CPUs' caches) before any > - * reads following this primitive that depend on the data return by > - * any of the preceding reads. This primitive is much lighter weight than > - * rmb() on most CPUs, and is never heavier weight than is > - * rmb(). > - * > - * These ordering constraints are respected by both the local CPU > - * and the compiler. > - * > - * Ordering is not guaranteed by anything other than these primitives, > - * not even by data dependencies. See the documentation for > - * memory_barrier() for examples and URLs to more information. 
> - * > - * For example, the following code would force ordering (the initial > - * value of "a" is zero, "b" is one, and "p" is "&a"): > - * > - * > - * CPU 0 CPU 1 > - * > - * b = 2; > - * memory_barrier(); > - * p = &b; q = p; > - * read_barrier_depends(); > - * d = *q; > - * > - * > - * because the read of "*q" depends on the read of "p" and these > - * two reads are separated by a read_barrier_depends(). However, > - * the following code, with the same initial values for "a" and "b": > - * > - * > - * CPU 0 CPU 1 > - * > - * a = 2; > - * memory_barrier(); > - * b = 3; y = b; > - * read_barrier_depends(); > - * x = a; > - * > - * > - * does not enforce ordering, since there is no data dependency between > - * the read of "a" and the read of "b". Therefore, on some CPUs, such > - * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() > - * in cases like this where there are no data dependencies. > - */ > -#define read_barrier_depends() do { } while (0) > - > #define __sync() __insn_mf() > > #include > @@ -125,20 +72,7 @@ mb_incoherent(void) > #define mb() fast_mb() > #define iob() fast_iob() > > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > -#define smp_rmb() rmb() > -#define smp_wmb() wmb() > -#define smp_read_barrier_depends() read_barrier_depends() > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define smp_read_barrier_depends() do { } while (0) > -#endif > - > -#define set_mb(var, value) \ > - do { var = value; mb(); } while (0) > +#include > > #endif /* !__ASSEMBLY__ */ > #endif /* _ASM_TILE_BARRIER_H */ > --- a/arch/unicore32/include/asm/barrier.h > +++ b/arch/unicore32/include/asm/barrier.h > @@ -14,15 +14,6 @@ > #define dsb() __asm__ __volatile__ ("" : : : "memory") > #define dmb() __asm__ __volatile__ ("" : : : "memory") > > -#define mb() barrier() > -#define rmb() barrier() > -#define wmb() barrier() > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > -#define read_barrier_depends() do { } while (0) > -#define smp_read_barrier_depends() do { } while (0) > - > -#define set_mb(var, value) do { var = value; smp_mb(); } while (0) > +#include > > #endif /* __UNICORE_BARRIER_H__ */ > --- a/arch/x86/include/asm/barrier.h > +++ b/arch/x86/include/asm/barrier.h > @@ -100,6 +100,21 @@ > #define set_mb(var, value) do { var = value; barrier(); } while (0) > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + barrier(); \ > + ___p1; \ > +}) > + > /* > * Stop RDTSC speculation. 
This is needed when you need to use RDTSC > * (or get_cycles or vread that possibly accesses the TSC) in a defined > --- a/arch/xtensa/include/asm/barrier.h > +++ b/arch/xtensa/include/asm/barrier.h > @@ -9,21 +9,14 @@ > #ifndef _XTENSA_SYSTEM_H > #define _XTENSA_SYSTEM_H > > -#define smp_read_barrier_depends() do { } while(0) > -#define read_barrier_depends() do { } while(0) > - > #define mb() ({ __asm__ __volatile__("memw" : : : "memory"); }) > #define rmb() barrier() > #define wmb() mb() > > #ifdef CONFIG_SMP > #error smp_* not defined > -#else > -#define smp_mb() barrier() > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > #endif > > -#define set_mb(var, value) do { var = value; mb(); } while (0) > +#include > > #endif /* _XTENSA_SYSTEM_H */ > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -1,4 +1,5 @@ > -/* Generic barrier definitions, based on MN10300 definitions. > +/* > + * Generic barrier definitions, based on MN10300 definitions. > * > * It should be possible to use these on really simple architectures, > * but it serves more as a starting point for new ports. > @@ -16,35 +17,67 @@ > > #ifndef __ASSEMBLY__ > > -#define nop() asm volatile ("nop") > +#include > + > +#ifndef nop > +#define nop() asm volatile ("nop") > +#endif > > /* > - * Force strict CPU ordering. > - * And yes, this is required on UP too when we're talking > - * to devices. > + * Force strict CPU ordering. And yes, this is required on UP too when we're > + * talking to devices. > * > - * This implementation only contains a compiler barrier. > + * Fall back to compiler barriers if nothing better is provided. > */ > > -#define mb() asm volatile ("": : :"memory") > -#define rmb() mb() > -#define wmb() asm volatile ("": : :"memory") > +#ifndef mb > +#define mb() barrier() > +#endif > + > +#ifndef rmb > +#define rmb() barrier() > +#endif > + > +#ifndef wmb > +#define wmb() barrier() > +#endif > + > +#ifndef read_barrier_depends > +#define read_barrier_depends() do {} while (0) > +#endif > > #ifdef CONFIG_SMP > #define smp_mb() mb() > #define smp_rmb() rmb() > #define smp_wmb() wmb() > +#define smp_read_barrier_depends() read_barrier_depends() > #else > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > +#define smp_read_barrier_depends() do {} while (0) > #endif > > +#ifndef set_mb > #define set_mb(var, value) do { var = value; mb(); } while (0) > +#endif > + > #define set_wmb(var, value) do { var = value; wmb(); } while (0) > > -#define read_barrier_depends() do {} while (0) > -#define smp_read_barrier_depends() do {} while (0) > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > > #endif /* !__ASSEMBLY__ */ > #endif /* __ASM_GENERIC_BARRIER_H */ > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -298,6 +298,11 @@ void ftrace_likely_update(struct ftrace_ > # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) > #endif > > +/* Is this type a native word size -- useful for atomic operations */ > +#ifndef __native_word > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) > +#endif > + > /* Compile time object size, -1 for unknown */ > #ifndef __compiletime_object_size > # define 
__compiletime_object_size(obj) -1 > @@ -337,6 +342,10 @@ void ftrace_likely_update(struct ftrace_ > #define compiletime_assert(condition, msg) \ > _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) > > +#define compiletime_assert_atomic_type(t) \ > + compiletime_assert(__native_word(t), \ > + "Need native word sized stores/loads for atomicity.") > + > /* > * Prevent the compiler from merging or refetching accesses. The compiler > * is also forbidden from reordering successive instances of ACCESS_ONCE(), >
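Finally, the compiletime_assert_atomic_type()/__native_word() additions are what keep smp_load_acquire()/smp_store_release() from being silently used on objects that cannot be loaded or stored atomically. A stand-alone approximation that swaps the kernel's compiletime_assert() machinery for C11 _Static_assert while keeping the same size check as the patch:

#include <stdio.h>

/* Same check as the patch: only int-sized or long-sized objects qualify. */
#define __native_word(t) \
        (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

/* C11 stand-in for the kernel's compiletime_assert()-based version. */
#define compiletime_assert_atomic_type(t) \
        _Static_assert(__native_word(t), \
                       "Need native word sized stores/loads for atomicity.")

int main(void)
{
        long counter = 0;
        short flag = 0;

        compiletime_assert_atomic_type(counter);        /* fine: long is a native word */
        /* compiletime_assert_atomic_type(flag); */     /* un-commenting breaks the build:
                                                           sizeof(short) matches neither
                                                           sizeof(int) nor sizeof(long) */

        printf("counter=%ld flag=%d\n", counter, flag);
        return 0;
}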