From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755818AbdKBRQq (ORCPT ); Thu, 2 Nov 2017 13:16:46 -0400
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:34540 "EHLO
	foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755215AbdKBRQl (ORCPT ); Thu, 2 Nov 2017 13:16:41 -0400
Date: Thu, 2 Nov 2017 17:16:44 +0000
From: Will Deacon
To: Alan Stern
Cc: Peter Zijlstra, "Reshetova, Elena",
	"linux-kernel@vger.kernel.org", "gregkh@linuxfoundation.org",
	"keescook@chromium.org", "tglx@linutronix.de", "mingo@redhat.com",
	"ishkamiel@gmail.com", Paul McKenney, parri.andrea@gmail.com,
	boqun.feng@gmail.com, dhowells@redhat.com, david@fromorbit.com
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
Message-ID: <20171102171644.GD595@arm.com>
References: <20171102160237.t2xkryg6joskf77y@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
>
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > >
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() Provide exactly the same guarantees as
> > > > > they atomic counterparts.
> > > >
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > > later load/store's.
> > >
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()
> > > returns false;
> >
> > It should provide a release:
> >
> >  - if !=1, dec_not_one will provide release
> >  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >    dec_and_test will provide the release, even if the test fails and we
> >    unlock again it should still dec.
> >
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
>
> Also if the counter is 0, but that will never happen if the
> refcounting is correct.
>
> > > it provides ordering only if the return value is true.
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > >
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > >
> > > The combination of acquire + release is not the same as smp_mb, because
> >
> > acquire+release is nothing, its release+acquire that I meant which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> >
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> >
> > But combined they're acquire+release, which is unfortunate.. it means
> > the lock section and the refcount stuff overlaps, but I don't suppose
> > that's actually a problem. Need to consider more.
>
> Right. To address your point: release + acquire isn't the same as a
> full barrier either.
> The SB pattern illustrates the difference:
>
>     P0              P1
>     Write x=1       Write y=1
>     Release a       smp_mb
>     Acquire b       Read x=0
>     Read y=0
>
> This would not be allowed if the release + acquire sequence was
> replaced by smp_mb. But as it stands, this is allowed because nothing
> prevents the CPU from interchanging the order of the release and the
> acquire -- and then you're back to the acquire + release case.
>
> However, there is one circumstance where this interchange isn't
> allowed: when the release and acquire access the same memory
> location. Thus:
>
>     P0(int *x, int *y, int *a)
>     {
>         int r0;
>
>         WRITE_ONCE(*x, 1);
>         smp_store_release(a, 1);
>         smp_load_acquire(a);
>         r0 = READ_ONCE(*y);
>     }
>
>     P1(int *x, int *y)
>     {
>         int r1;
>
>         WRITE_ONCE(*y, 1);
>         smp_mb();
>         r1 = READ_ONCE(*x);
>     }
>
>     exists (0:r0=0 /\ 1:r1=0)
>
> This is forbidden. It would remain forbidden even if the smp_mb in P1
> were replaced by a similar release/acquire pair for the same memory
> location.

Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
store and load-acquire to plain load? All we're saying is that you can forward
from a release to an acquire, which is fine for RCpc semantics.

e.g.

X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"
Generator=diyone7 (version 7.46+3)
Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
{
}
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;
exists
(0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)

which herd says is allowed:

Test SB+mfence+po-rfi-po Allowed
States 4
0:EAX=0; 1:EAX=1; 1:EBX=0;
0:EAX=0; 1:EAX=1; 1:EBX=1;
0:EAX=1; 1:EAX=1; 1:EBX=0;
0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Observation SB+mfence+po-rfi-po Sometimes 1 3
Time SB+mfence+po-rfi-po 0.00
Hash=0f983e2d7579e5c04c332f9ac620c31f

and I can reproduce using litmus to actually run it on my x86 box:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for SB+mfence+po-rfi-po.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"

{}

 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;

exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Generated assembler
#START _litmus_P1
	movl $1,(%r8,%rcx)
	movl $1,(%r9,%rcx)
	movl (%r9,%rcx),%eax
	movl (%rdi,%rcx),%edx
#START _litmus_P0
	movl $1,(%rdx,%rcx)
	mfence
	movl (%rdi,%rcx),%eax

Test SB+mfence+po-rfi-po Allowed
Histogram (4 states)
8      *>0:EAX=0; 1:EAX=1; 1:EBX=0;
1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
592    :>0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok

Witnesses
Positive: 8, Negative: 3999992
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
Hash=0f983e2d7579e5c04c332f9ac620c31f
Generator=diyone7 (version 7.46+3)
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
Observation SB+mfence+po-rfi-po Sometimes 8 3999992
Time SB+mfence+po-rfi-po 0.17

Will