From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932191AbbKLS7H (ORCPT );
	Thu, 12 Nov 2015 13:59:07 -0500
Received: from e39.co.us.ibm.com ([32.97.110.160]:50145 "EHLO e39.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754446AbbKLS7C (ORCPT );
	Thu, 12 Nov 2015 13:59:02 -0500
X-IBM-Helo: d03dlp03.boulder.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Date: Thu, 12 Nov 2015 10:59:06 -0800
From: "Paul E. McKenney" 
To: Oleg Nesterov 
Cc: Peter Zijlstra , Boqun Feng , mingo@kernel.org,
	linux-kernel@vger.kernel.org, corbet@lwn.net, mhocko@kernel.org,
	dhowells@redhat.com, torvalds@linux-foundation.org, will.deacon@arm.com,
	Michael Ellerman , Benjamin Herrenschmidt , Paul Mackerras 
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Message-ID: <20151112185906.GK3972@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151103175958.GA4800@redhat.com>
 <20151111093939.GA6314@fixme-laptop.cn.ibm.com>
 <20151111121232.GN17308@twins.programming.kicks-ass.net>
 <20151111193953.GA23515@redhat.com>
 <20151112070915.GC6314@fixme-laptop.cn.ibm.com>
 <20151112150058.GA30321@redhat.com>
 <20151112151839.GE6314@fixme-laptop.cn.ibm.com>
 <20151112183807.GA7538@redhat.com>
 <20151112180203.GF17308@twins.programming.kicks-ass.net>
 <20151112193302.GA9988@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151112193302.GA9988@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15111218-0033-0000-0000-000006D0383B
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 12, 2015 at 08:33:02PM +0100, Oleg Nesterov wrote:
> On 11/12, Peter Zijlstra wrote:
> >
> > On Thu, Nov 12, 2015 at 07:38:07PM +0100, Oleg Nesterov wrote:
> > > It seems that PPC needs to define smp_mb__before_spinlock() as full mb(),
> > > no?
> >
> > It does:
> >
> > arch/powerpc/include/asm/barrier.h:#define smp_mb__before_spinlock()   smp_mb()
>
> Ah, indeed, thanks.
>
> And given that it also defines smp_mb__after_unlock_lock() as smp_mb(),
> I am starting to understand how it can help to avoid the races with
> spin_unlock_wait() in (for example) do_exit().
>
> But as Boqun has already mentioned, this means that mb__after_unlock_lock()
> has the new meaning which should be documented.
>
> Hmm. And 12d560f4 "Privatize smp_mb__after_unlock_lock()" should be reverted
> then ;)

Surprisingly, this reverts cleanly against today's mainline, please see
the patch below.  Against my -rcu stack, not so much, but so it goes.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit eff0632b4181f91f2596d56f7c73194e1a869aff
Author: Paul E. McKenney 
Date:   Thu Nov 12 10:54:23 2015 -0800

    Revert "rcu,locking: Privatize smp_mb__after_unlock_lock()"

    This reverts commit 12d560f4ea87030667438a169912380be00cea4b.

    The reason for this revert is that smp_mb__after_unlock_lock() might
    prove useful outside of RCU after all for interactions between the
    locking primitives and spin_unlock_wait().
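
For the spin_unlock_wait() interaction mentioned above, the intended
usage would look roughly like the sketch below.  This is illustrative
only and not part of the patch: the lock, flag, and function names are
made up.  The point is simply that placing smp_mb__after_unlock_lock()
immediately after the lock acquisition makes the preceding UNLOCK plus
this LOCK act as a full memory barrier, which is the property a
spin_unlock_wait()-based handoff on the other side would build on.

        #include <linux/compiler.h>
        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(demo_lock);      /* hypothetical lock */
        static int demo_flag;                   /* hypothetical shared state */

        void demo_locker(void)
        {
                spin_lock(&demo_lock);
                smp_mb__after_unlock_lock();    /* prior UNLOCK + this LOCK == full barrier */
                WRITE_ONCE(demo_flag, 1);       /* publish state under the lock */
                spin_unlock(&demo_lock);
        }

        void demo_waiter(void)
        {
                spin_unlock_wait(&demo_lock);   /* wait for any current lock holder */
                smp_mb();                       /* order later accesses after the wait */
                if (READ_ONCE(demo_flag)) {
                        /* safe to act on state published under demo_lock */
                }
        }
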
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index aef9487303d0..d4501664d49f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1855,10 +1855,16 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock.  In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
-not imply a full memory barrier.  Therefore, the CPU's execution of the
-critical sections corresponding to the RELEASE and the ACQUIRE can cross,
-so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
+imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
+pair to produce a full barrier, the ACQUIRE can be followed by an
+smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
+(including transitivity) if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
+the same variable.  The smp_mb__after_unlock_lock() primitive is free
+on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
+execution of the critical sections corresponding to the RELEASE and the
+ACQUIRE can cross, so that:
 
         *A = a;
         RELEASE M
@@ -1896,6 +1902,29 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
         a sleep-unlock race, but the locking primitive needs to resolve
         such races properly in any case.
 
+With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
+For example, with the following code, the store to *A will always be
+seen by other CPUs before the store to *B:
+
+        *A = a;
+        RELEASE M
+        ACQUIRE N
+        smp_mb__after_unlock_lock();
+        *B = b;
+
+The operations will always occur in one of the following orders:
+
+        STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
+        STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+        ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+
+If the RELEASE and ACQUIRE were instead both operating on the same lock
+variable, only the first of these alternatives can occur.  In addition,
+the more strongly ordered systems may rule out some of the above orders.
+But in any case, as noted earlier, the smp_mb__after_unlock_lock()
+ensures that the store to *A will always be seen as happening before
+the store to *B.
+
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
 anything at all - especially with respect to I/O accesses - unless combined
@@ -2126,6 +2155,40 @@ But it won't see any of:
         *E, *F or *G following RELEASE Q
 
+However, if the following occurs:
+
+        CPU 1                           CPU 2
+        =============================== ===============================
+        WRITE_ONCE(*A, a);
+        ACQUIRE M                  [1]
+        WRITE_ONCE(*B, b);
+        WRITE_ONCE(*C, c);
+        RELEASE M                  [1]
+        WRITE_ONCE(*D, d);              WRITE_ONCE(*E, e);
+                                        ACQUIRE M                  [2]
+                                        smp_mb__after_unlock_lock();
+                                        WRITE_ONCE(*F, f);
+                                        WRITE_ONCE(*G, g);
+                                        RELEASE M                  [2]
+                                        WRITE_ONCE(*H, h);
+
+CPU 3 might see:
+
+        *E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
+                ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
+
+But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
+
+        *B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
+        *A, *B or *C following RELEASE M [1]
+        *F, *G or *H preceding ACQUIRE M [2]
+        *A, *B, *C, *E, *F or *G following RELEASE M [2]
+
+Note that the smp_mb__after_unlock_lock() is critically important
+here: Without it CPU 3 might see some of the above orderings.
+Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
+to be seen in order unless CPU 3 holds lock M.
+
 
 ACQUIRES VS I/O ACCESSES
 ------------------------
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d7583c..4dbe072eecbe 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#define smp_mb__after_unlock_lock()     smp_mb()  /* Full ordering for lock. */
+
 #ifdef CONFIG_PPC64
 /* use 0x800000yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0cebd204..ffcd053ca89a 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,6 +130,16 @@ do { \
 #define smp_mb__before_spinlock()       smp_wmb()
 #endif
 
+/*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifndef smp_mb__after_unlock_lock
+#define smp_mb__after_unlock_lock()     do { } while (0)
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 9fb4e238d4dc..8c6753d903ec 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -652,15 +652,3 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 }
 #endif /* #ifdef CONFIG_RCU_TRACE */
-
-/*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()     smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()     do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
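
To make the restored documentation guarantee concrete, here is a rough
C rendering of the *A/*B example from the memory-barriers.txt hunk
above.  It is a sketch only, not part of the patch, and the lock and
variable names are invented.  Because the RELEASE of m and the ACQUIRE
of n run on the same CPU and the ACQUIRE is followed by
smp_mb__after_unlock_lock(), all CPUs are guaranteed to see the store
to a before the store to b:

        #include <linux/compiler.h>
        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(m);      /* hypothetical lock M */
        static DEFINE_SPINLOCK(n);      /* hypothetical lock N */
        static int a, b;                /* hypothetical shared variables */

        void unlock_lock_example(void)
        {
                spin_lock(&m);
                WRITE_ONCE(a, 1);               /* *A = a; */
                spin_unlock(&m);                /* RELEASE M */
                spin_lock(&n);                  /* ACQUIRE N */
                smp_mb__after_unlock_lock();    /* RELEASE M + ACQUIRE N -> full barrier */
                WRITE_ONCE(b, 1);               /* *B = b; seen only after the store to a */
                spin_unlock(&n);
        }

On powerpc the primitive expands to smp_mb(), which is what pays for
this ordering; with the generic definition it is a no-op, so the
pattern costs nothing on architectures whose UNLOCK+LOCK is already a
full barrier.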