From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964898AbdKBUV7 (ORCPT ); Thu, 2 Nov 2017 16:21:59 -0400
Received: from iolanthe.rowland.org ([192.131.102.54]:33300 "HELO
	iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with SMTP id S964885AbdKBUV5 (ORCPT );
	Thu, 2 Nov 2017 16:21:57 -0400
Date: Thu, 2 Nov 2017 16:21:56 -0400 (EDT)
From: Alan Stern
X-X-Sender: stern@iolanthe.rowland.org
To: Will Deacon
cc: Peter Zijlstra , "Reshetova, Elena" ,
	"linux-kernel@vger.kernel.org" ,
	"gregkh@linuxfoundation.org" ,
	"keescook@chromium.org" ,
	"tglx@linutronix.de" ,
	"mingo@redhat.com" ,
	"ishkamiel@gmail.com" ,
	Paul McKenney , , , ,
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
In-Reply-To: <20171102171644.GD595@arm.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2 Nov 2017, Will Deacon wrote:

> > Right.  To address your point: release + acquire isn't the same as a
> > full barrier either.  The SB pattern illustrates the difference:
> >
> >     P0              P1
> >     Write x=1       Write y=1
> >     Release a       smp_mb
> >     Acquire b       Read x=0
> >     Read y=0
> >
> > This would not be allowed if the release + acquire sequence was
> > replaced by smp_mb.  But as it stands, this is allowed because nothing
> > prevents the CPU from interchanging the order of the release and the
> > acquire -- and then you're back to the acquire + release case.
> >
> > However, there is one circumstance where this interchange isn't
> > allowed: when the release and acquire access the same memory
> > location.
> > Thus:
> >
> >     P0(int *x, int *y, int *a)
> >     {
> >             int r0;
> >
> >             WRITE_ONCE(*x, 1);
> >             smp_store_release(a, 1);
> >             smp_load_acquire(a);
> >             r0 = READ_ONCE(*y);
> >     }
> >
> >     P1(int *x, int *y)
> >     {
> >             int r1;
> >
> >             WRITE_ONCE(*y, 1);
> >             smp_mb();
> >             r1 = READ_ONCE(*x);
> >     }
> >
> >     exists (0:r0=0 /\ 1:r1=0)
> >
> > This is forbidden.  It would remain forbidden even if the smp_mb in P1
> > were replaced by a similar release/acquire pair for the same memory
> > location.

I have to apologize; this was totally wrong.  This test is not
forbidden under the LKMM, and it certainly isn't forbidden if the
smp_mb is replaced by a release/acquire pair.

I was trying to think of something completely different.  If you have a
release/acquire to the same address, it creates a happens-before
ordering:

    Access x
    Release a
    Acquire a
    Access y

Here the access to x happens-before the access to y.  This is true
even on x86, even in the presence of forwarding -- the CPU still has to
execute the instructions in order.  But if the release and acquire are
to different addresses:

    Access x
    Release a
    Acquire b
    Access y

then there is no happens-before ordering for x and y -- the CPU can
execute the last two instructions before the first two.  x86 and
PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
it can't.)

But happens-before is much weaker than a strong fence.  So in short,
release + acquire, even to the same address, is no replacement for
smp_mb().

> Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
> store and load-acquire to plain load? All we're saying is that you can forward
> from a release to an acquire, which is fine for RCpc semantics.
>
> e.g.
>
> X86 SB+mfence+po-rfi-po
> "MFencedWR Fre PodWW Rfi PodRR Fre"
> Generator=diyone7 (version 7.46+3)
> Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
> Com=Fr Fr
> Orig=MFencedWR Fre PodWW Rfi PodRR Fre
> {
> }
>  P0          | P1          ;
>  MOV [x],$1  | MOV [y],$1  ;
>  MFENCE      | MOV [z],$1  ;
>  MOV EAX,[y] | MOV EAX,[z] ;
>              | MOV EBX,[x] ;
> exists
> (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
>
> which herd says is allowed:
>
> Test SB+mfence+po-rfi-po Allowed
> States 4
> 0:EAX=0; 1:EAX=1; 1:EBX=0;
> 0:EAX=0; 1:EAX=1; 1:EBX=1;
> 0:EAX=1; 1:EAX=1; 1:EBX=0;
> 0:EAX=1; 1:EAX=1; 1:EBX=1;
> Ok
> Witnesses
> Positive: 1 Negative: 3
> Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
> Observation SB+mfence+po-rfi-po Sometimes 1 3
> Time SB+mfence+po-rfi-po 0.00
> Hash=0f983e2d7579e5c04c332f9ac620c31f
>
> and I can reproduce using litmus to actually run it on my x86 box:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % Results for SB+mfence+po-rfi-po.litmus %
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> X86 SB+mfence+po-rfi-po
> "MFencedWR Fre PodWW Rfi PodRR Fre"
>
> {}
>
>  P0          | P1          ;
>  MOV [x],$1  | MOV [y],$1  ;
>  MFENCE      | MOV [z],$1  ;
>  MOV EAX,[y] | MOV EAX,[z] ;
>              | MOV EBX,[x] ;
>
> exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
> Generated assembler
> #START _litmus_P1
>     movl $1,(%r8,%rcx)
>     movl $1,(%r9,%rcx)
>     movl (%r9,%rcx),%eax
>     movl (%rdi,%rcx),%edx
> #START _litmus_P0
>     movl $1,(%rdx,%rcx)
>     mfence
>     movl (%rdi,%rcx),%eax
>
> Test SB+mfence+po-rfi-po Allowed
> Histogram (4 states)
> 8      *>0:EAX=0; 1:EAX=1; 1:EBX=0;
> 1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
> 1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
> 592    :>0:EAX=1; 1:EAX=1; 1:EBX=1;
> Ok
>
> Witnesses
> Positive: 8, Negative: 3999992
> Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
> Hash=0f983e2d7579e5c04c332f9ac620c31f
> Generator=diyone7 (version 7.46+3)
> Com=Fr Fr
> Orig=MFencedWR Fre PodWW Rfi PodRR Fre
> Observation SB+mfence+po-rfi-po Sometimes 8 3999992
> Time SB+mfence+po-rfi-po 0.17

Yes, you are quite correct.
Thanks for pointing out my mistake.

Alan Stern
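
For anyone who wants to replay the x86 example above without herd or
litmus, here is a minimal sketch in C of a brute-force search over a
simplified store-buffer machine.  It is only an approximation of x86-TSO
(one FIFO store buffer per CPU, loads forwarded from the CPU's own
buffer, MFENCE waiting for the buffer to drain); it is neither herd's
model nor the LKMM.  Every name in it (explore, reachable, struct insn,
MAXBUF, p1_fwd, p1_mfence, sb_fwd, sb_plain) is made up for this
illustration, and p1_mfence is a hypothetical variant of P1 that is not
part of the test quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding for this sketch only; not herd's or litmus's format. */
enum op { ST, LD, MF, END };
enum { X, Y, Z };		/* memory locations */
enum { EAX, EBX };		/* registers */
struct insn { enum op op; int addr; int arg; };	/* arg: stored value or dest reg */

#define NTHREADS 2
#define MAXBUF 4		/* assumed bound on pending stores per CPU */

struct state {
	int mem[3], reg[NTHREADS][2], pc[NTHREADS];
	int buf_addr[NTHREADS][MAXBUF], buf_val[NTHREADS][MAXBUF];
	int buf_len[NTHREADS];
};

typedef bool (*pred)(const struct state *);

/* Depth-first search over every interleaving of instruction and flush steps. */
static bool explore(struct state s, const struct insn *prog[], pred want)
{
	bool done = true;

	for (int t = 0; t < NTHREADS; t++) {
		if (s.buf_len[t] > 0) {		/* drain the oldest buffered store */
			struct state n = s;

			n.mem[n.buf_addr[t][0]] = n.buf_val[t][0];
			n.buf_len[t]--;
			memmove(&n.buf_addr[t][0], &n.buf_addr[t][1],
				n.buf_len[t] * sizeof(int));
			memmove(&n.buf_val[t][0], &n.buf_val[t][1],
				n.buf_len[t] * sizeof(int));
			if (explore(n, prog, want))
				return true;
		}
		const struct insn *i = &prog[t][s.pc[t]];
		if (i->op == END)
			continue;
		done = false;
		if (i->op == MF && s.buf_len[t] > 0)
			continue;		/* MFENCE waits for an empty buffer */
		struct state n = s;

		n.pc[t]++;
		if (i->op == ST) {		/* stores go into the buffer first */
			assert(n.buf_len[t] < MAXBUF);
			n.buf_addr[t][n.buf_len[t]] = i->addr;
			n.buf_val[t][n.buf_len[t]] = i->arg;
			n.buf_len[t]++;
		} else if (i->op == LD) {	/* newest own buffered store wins */
			int v = s.mem[i->addr];

			for (int k = 0; k < s.buf_len[t]; k++)
				if (s.buf_addr[t][k] == i->addr)
					v = s.buf_val[t][k];
			n.reg[t][i->arg] = v;
		}
		if (explore(n, prog, want))
			return true;
	}
	return done && want(&s);	/* both programs finished: check registers */
}

/* True if some execution of the two programs satisfies the predicate. */
bool reachable(const struct insn *a, const struct insn *b, pred want)
{
	const struct insn *prog[NTHREADS] = { a, b };
	struct state s;

	memset(&s, 0, sizeof(s));
	return explore(s, prog, want);
}

/* P0 and P1 as in the quoted test, plus a fenced P1 for comparison. */
const struct insn p0[] = { {ST, X, 1}, {MF, 0, 0}, {LD, Y, EAX}, {END, 0, 0} };
const struct insn p1_fwd[] = { {ST, Y, 1}, {ST, Z, 1}, {LD, Z, EAX},
			       {LD, X, EBX}, {END, 0, 0} };
const struct insn p1_mfence[] = { {ST, Y, 1}, {MF, 0, 0}, {LD, X, EBX},
				  {END, 0, 0} };

bool sb_fwd(const struct state *s)	/* 0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0 */
{
	return s->reg[0][EAX] == 0 && s->reg[1][EAX] == 1 && s->reg[1][EBX] == 0;
}

bool sb_plain(const struct state *s)	/* 0:EAX=0 /\ 1:EBX=0 */
{
	return s->reg[0][EAX] == 0 && s->reg[1][EBX] == 0;
}
```

Under these assumptions, reachable(p0, p1_fwd, sb_fwd) reports that the
outcome is reachable: P1's load of z is satisfied from its own store
buffer, so its load of x can complete while the store to y is still
buffered.  reachable(p0, p1_mfence, sb_plain) finds no such execution,
which is the sense in which a same-address store + load pair is no
replacement for a full fence.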