Date: Mon, 5 Sep 2016 03:37:14 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Linus Torvalds, Will Deacon, Oleg Nesterov, Benjamin Herrenschmidt,
    Michael Ellerman, linux-kernel@vger.kernel.org, Nicholas Piggin,
    Ingo Molnar, Alan Stern
Subject: Re: Question on smp_mb__before_spinlock
Message-Id: <20160905103714.GZ3663@linux.vnet.ibm.com>
In-Reply-To: <20160905093753.GN10138@twins.programming.kicks-ass.net>

On Mon, Sep 05, 2016 at 11:37:53AM +0200, Peter Zijlstra wrote:
> Hi all,
> 
> So recently I've had two separate issues that touched upon
> smp_mb__before_spinlock().
> 
> Since its inception, our understanding of ACQUIRE, esp. as applied to
> spinlocks, has changed somewhat. Also, I wonder if, with a simple
> change, we cannot make it provide more.
> 
> The problem with the comment is that the STORE done by spin_lock isn't
> itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
> it and cross with any prior STORE, rendering the default WMB
> insufficient (pointed out by Alan).
> 
> Now, this is only really a problem on PowerPC and ARM64; the former
> already defines smp_mb__before_spinlock() as smp_mb(), the latter does
> not. Will?
> 
> The second issue I wondered about is spinlock transitivity. All except
> powerpc have RCsc locks, and since Power already does a full mb, would
> it not make sense to put it _after_ the spin_lock(), which would provide
> the same guarantee but also upgrade the section to RCsc?
> 
> That would make all schedule() calls fully transitive against one
> another.
> 
> That is, would something like the below make sense?

Looks to me like you have reinvented smp_mb__after_unlock_lock()...

							Thanx, Paul

> (does not deal with mm_types.h and overlayfs using
> smp_mb__before_spinlock().)
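
For reference, a minimal sketch of how that existing primitive is meant to be
used (the locks, variables and function name below are made up, this is not
from the patch): placed right after a lock acquisition, it promotes the prior
UNLOCK plus this LOCK to a full barrier, expanding to smp_mb() on powerpc and
to nothing elsewhere; IIRC it is private to RCU these days, and it only gives
full ordering when the UNLOCK and LOCK are on the same CPU or the same lock,
which holds here:

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);
static int shared_a, shared_b;

static void unlock_lock_example(void)
{
	int r;

	spin_lock(&lock_a);
	WRITE_ONCE(shared_a, 1);
	spin_unlock(&lock_a);

	spin_lock(&lock_b);
	smp_mb__after_unlock_lock();	/* UNLOCK + LOCK now acts as smp_mb() */
	r = READ_ONCE(shared_b);	/* ordered after the store to shared_a */
	spin_unlock(&lock_b);
	(void)r;
}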
> 
> ---
>  arch/arm64/include/asm/barrier.h   |  2 ++
>  arch/powerpc/include/asm/barrier.h |  2 +-
>  include/linux/spinlock.h           | 41 +++++++++++++++++++++++++++++---------
>  kernel/sched/core.c                |  5 +++--
>  4 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 4eea7f618dce..d5cc8b58f942 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -104,6 +104,8 @@ do {								\
>  	VAL;							\
>  })
> 
> +#define smp_mb__after_spinlock()	smp_mb()
> +
>  #include <asm-generic/barrier.h>
> 
>  #endif	/* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index c0deafc212b8..23d64d7196b7 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -74,7 +74,7 @@ do {								\
>  	___p1;							\
>  })
> 
> -#define smp_mb__before_spinlock()	smp_mb()
> +#define smp_mb__after_spinlock()	smp_mb()
> 
>  #include <asm-generic/barrier.h>
> 
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 47dd0cebd204..284616dad607 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -118,16 +118,39 @@ do {						\
>  #endif
> 
>  /*
> - * Despite its name it doesn't necessarily has to be a full barrier.
> - * It should only guarantee that a STORE before the critical section
> - * can not be reordered with LOADs and STOREs inside this section.
> - * spin_lock() is the one-way barrier, this LOAD can not escape out
> - * of the region. So the default implementation simply ensures that
> - * a STORE can not move into the critical section, smp_wmb() should
> - * serialize it with another STORE done by spin_lock().
> + * This barrier must provide two things:
> + *
> + *   - it must guarantee a STORE before the spin_lock() is ordered against a
> + *     LOAD after it, see the comments at its two usage sites.
> + *
> + *   - it must ensure the critical section is RCsc.
> + *
> + * The latter is important for cases where we observe values written by other
> + * CPUs in spin-loops, without barriers, while being subject to scheduling.
> + *
> + *   CPU0			CPU1			CPU2
> + *
> + *			for (;;) {
> + *			  if (READ_ONCE(X))
> + *			    break;
> + *			}
> + * X=1
> + *			<sched-out>
> + *						<sched-in>
> + *						r = X;
> + *
> + * without transitivity it could be that CPU1 observes X!=0, breaks the loop,
> + * we get migrated and CPU2 sees X==0.
> + *
> + * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> + * the LL/SC loop, they need no further barriers. Similarly all our TSO
> + * architectures imply an smp_mb() for each atomic instruction and equally don't
> + * need more.
> + *
> + * Architectures that can implement ACQUIRE better need to take care.
>   */
> -#ifndef smp_mb__before_spinlock
> -#define smp_mb__before_spinlock()	smp_wmb()
> +#ifndef smp_mb__after_spinlock
> +#define smp_mb__after_spinlock()	do { } while (0)
>  #endif
> 
>  /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 556cb07ab1cf..b151a33d393b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2006,8 +2006,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	 * reordered with p->state check below. This pairs with mb() in
>  	 * set_current_state() the waiting thread does.
>  	 */
> -	smp_mb__before_spinlock();
>  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	smp_mb__after_spinlock();
>  	if (!(p->state & state))
>  		goto out;
> 
> @@ -3332,8 +3332,9 @@ static void __sched notrace __schedule(bool preempt)
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>  	 * done by the caller to avoid the race with signal_wake_up().
>  	 */
> -	smp_mb__before_spinlock();
>  	raw_spin_lock(&rq->lock);
> +	smp_mb__after_spinlock();
> +
>  	cookie = lockdep_pin_lock(&rq->lock);
> 
>  	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
> 
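
To make the pairing at the two usage sites concrete, here is a hedged sketch
of the classic sleep/wakeup pattern (sleeper(), waker() and the cond flag are
made up for illustration): the waker's STORE to the condition must not be
reordered with its LOAD of p->state, which is what the full barrier right
after taking ->pi_lock provides, and it pairs with the smp_mb() implied by
set_current_state() on the sleeping side.

static int cond;

static void sleeper(void)
{
	set_current_state(TASK_INTERRUPTIBLE);	/* STORE p->state, then smp_mb() */
	if (!READ_ONCE(cond))			/* LOAD cond */
		schedule();
	__set_current_state(TASK_RUNNING);
}

static void waker(struct task_struct *p)
{
	WRITE_ONCE(cond, 1);	/* STORE cond ... */
	wake_up_process(p);	/* ... must be ordered before the LOAD of
				 * p->state done under ->pi_lock, hence the
				 * smp_mb__after_spinlock() in
				 * try_to_wake_up() above. */
}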
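
And, if I read the intent right, a sketch of the transitivity case from the
spinlock.h comment (X and the two functions are made up): a task busy-waits
for X, sees it become non-zero, schedules away and may resume on another CPU;
upgrading the rq->lock section to RCsc is what guarantees the new CPU also
observes the store to X.

static int X;

static void writer(void)		/* CPU0 */
{
	WRITE_ONCE(X, 1);
}

static void spinner(void)		/* starts on CPU1 */
{
	while (!READ_ONCE(X))
		cpu_relax();

	schedule();			/* may migrate us to CPU2 */

	/*
	 * With RCsc scheduler locks (smp_mb__after_spinlock() after
	 * raw_spin_lock(&rq->lock) in __schedule()), this is guaranteed
	 * to see X == 1; without transitivity it could still observe
	 * X == 0 on the new CPU.
	 */
	WARN_ON(READ_ONCE(X) != 1);
}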