* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
@ 2018-02-22 12:19 Andrea Parri
  2018-02-22 12:44 ` Andrea Parri
  2018-02-22 13:40 ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Andrea Parri @ 2018-02-22 12:19 UTC (permalink / raw)
To: linux-riscv

The LKMM defines certain memory ordering constraints for spin_lock(),
spin_unlock() and other primitives that the kernel developer can rely
on; unfortunately, some of these constraints are not currently met by
the RISC-V implementation of spin_lock() and spin_unlock().

The following MP-like program exemplifies the issue: according to our
LKMM, program "unlock-lock-read-ordering" below can never reach state
(1:r0=1 /\ 1:r1=0).  However, when we map this C program to the RISCV
program "RISCV-unlock-lock-read-ordering" below following the current
implementation, the corresponding state is reachable according to the
RISCV specification and its formalizations [2].

C unlock-lock-read-ordering

{}
/* s initially owned by P1 */

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	spin_unlock(s);
	spin_lock(s);
	r1 = READ_ONCE(*y);
}

exists (1:r0=1 /\ 1:r1=0)

RISCV RISCV-unlock-lock-read-ordering
{
0:x2=x; 0:x4=y;
1:x2=y; 1:x4=x; 1:x6=s;
s=1;
}
 P0          | P1                       ;
 ori x1,x0,1 | lw x1,0(x2)              ;
 sw x1,0(x2) | amoswap.w.rl x0,x0,(x6)  ;
 fence w,w   | ori x5,x0,1              ;
 ori x3,x0,1 | amoswap.w.aq x0,x5,(x6)  ;
 sw x3,0(x4) | lw x3,0(x4)              ;
exists
(1:x1=1 /\ 1:x3=0)

The issue can in fact be exacerbated if, as envisaged/discussed in [3],
the LKMM is modified to become even more "demanding" on the ordering
constraints associated with the locking primitives.  For example, the
state (1:r0=1 /\ 1:r1=0) in program "unlock-lock-write-ordering" below
is currently allowed by the LKMM (as is the corresponding state in
"RISCV-unlock-lock-write-ordering" below).
However, proposals modifying the LKMM to _forbid_ that state have
already appeared on LKML [4].

C unlock-lock-write-ordering

{}
/* s initially owned by P0 */

P0(int *x, int *y, spinlock_t *s)
{
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(s);
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	smp_rmb();
	r1 = READ_ONCE(*y);
}

exists (1:r0=1 /\ 1:r1=0)

RISCV RISCV-unlock-lock-write-ordering
{
0:x2=x; 0:x4=y; 0:x6=s;
1:x2=y; 1:x4=x;
s=1;
}
 P0                       | P1           ;
 ori x1,x0,1              | lw x1,0(x2)  ;
 sw x1,0(x2)              | fence r,r    ;
 amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
 ori x5,x0,1              |              ;
 amoswap.w.aq x0,x5,(x6)  |              ;
 ori x3,x0,1              |              ;
 sw x3,0(x4)              |              ;
exists
(1:x1=1 /\ 1:x3=0)

[Curiously, RISC-V's current implementations of smp_load_acquire() and
smp_store_release() provide far stronger ordering than is currently
required by the LKMM, since they rely on the generic implementation
(cf. also [5]).]

This RFC fixes the issue by strengthening RISC-V's implementations of
spin_lock() and spin_unlock(), based on "A spinlock with fences" from
Section 2.3.5 ("Acquire/Release Ordering") of the RISC-V draft spec.
It does _not_ attempt to fix read-locks and atomics (for which, AFAICT,
similar considerations would hold).

IMPORTANT.  This patch is _NOT_ intended to be applied as is.  Rather,
it is intended to test the waters, the implicit questions being "Should
we take this direction?" and "Are changes to LKMM needed?" (and to
develop a technical discussion on the above issues).
[1] https://marc.info/?l=linux-kernel&m=151633436614259&w=2
[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
[3] https://marc.info/?l=linux-kernel&m=151181741400461&w=2
[4] https://marc.info/?l=linux-kernel&m=151871035014425&w=2
[5] https://marc.info/?l=linux-kernel&m=151912186913692&w=2

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-riscv at lists.infradead.org
Cc: linux-kernel at vger.kernel.org
---
 arch/riscv/include/asm/spinlock.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 2fd27e8ef1fd6..2f89fc62c9196 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -28,8 +28,9 @@
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
+	RISCV_FENCE(rw,w);
 	__asm__ __volatile__ (
-		"amoswap.w.rl x0, x0, %0"
+		"amoswap.w x0, x0, %0"
 		: "=A" (lock->lock)
 		:: "memory");
 }
@@ -39,10 +40,11 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
 	int tmp = 1, busy;
 
 	__asm__ __volatile__ (
-		"amoswap.w.aq %0, %2, %1"
+		"amoswap.w %0, %2, %1"
 		: "=r" (busy), "+A" (lock->lock)
 		: "r" (tmp)
 		: "memory");
+	RISCV_FENCE(r,rw);
 
 	return !busy;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread
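[Editorial note: for reference, the net effect of the patch can be sketched as below. The RISCV_FENCE() expansion shown is an assumption about the helper the diff relies on (along the lines of the kernel's asm/fence.h), not part of the diff itself.]

```
/*
 * Assumed definition of the RISCV_FENCE() helper used above:
 * it expands to a bare "fence p,s" instruction.
 */
#define RISCV_FENCE(p, s) \
	__asm__ __volatile__ ("fence " #p "," #s : : : "memory")

/*
 * Net instruction sequences after the patch ("A spinlock with
 * fences" from the draft spec), trylock loop elided:
 *
 *   unlock:  fence rw,w               release ordering
 *            amoswap.w x0,x0,(lock)   plain swap, no .rl
 *
 *   lock:    amoswap.w t0,t1,(lock)   plain swap, no .aq
 *            fence r,rw               acquire ordering
 */
```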
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 12:19 [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() Andrea Parri @ 2018-02-22 12:44 ` Andrea Parri 2018-02-22 13:40 ` Peter Zijlstra 1 sibling, 0 replies; 23+ messages in thread From: Andrea Parri @ 2018-02-22 12:44 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote: > The LKMM defines certain memory ordering constraints for spin_lock(), > spin_unlock() and other primitives that the kernel developer can rely > on; unfortunately, some of these constraints are not currently met by > the RISC-V implementation of spin_lock(), spin_unlock(). > > The following MP-like program exemplifies the issue: according to our > LKMM, program "unlock-lock-read-ordering" below can never reach state > (1:r0=1 /\ 1:r1=0). However, when we map this C program to the RISCV > program "RISCV-unlock-lock-read-ordering" below following the current > implementation, the corresponding state is reachable according to the > RISCV specification and its formalizations [2]. > > C unlock-lock-read-ordering > > {} > /* s initially owned by P1 */ > > P0(int *x, int *y) > { > WRITE_ONCE(*x, 1); > smp_wmb(); > WRITE_ONCE(*y, 1); > } > > P1(int *x, int *y, spinlock_t *s) > { > int r0; > int r1; > > r0 = READ_ONCE(*y); > spin_unlock(s); > spin_lock(s); > r1 = READ_ONCE(*y); This last read should have been from 'x' of course (and similarly in unlock-lock-write-ordering). 
Sorry, Andrea > } > > exists (1:r0=1 /\ 1:r1=0) > > RISCV RISCV-unlock-lock-read-ordering > { > 0:x2=x; 0:x4=y; > 1:x2=y; 1:x4=x; 1:x6=s; > s=1; > } > P0 | P1 ; > ori x1,x0,1 | lw x1,0(x2) ; > sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ; > fence w,w | ori x5,x0,1 ; > ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ; > sw x3,0(x4) | lw x3,0(x4) ; > exists > (1:x1=1 /\ 1:x3=0) > > The issue can in fact be exarcebated if, as envisaged/discussed in [3], > the LKMM will be modified to become even more "demanding" on the order- > ing constraints associated to the locking primitives. For example the > state (1:r0=1 /\ 1:r1=0) in program "unlock-lock-write-ordering" below > is currently allowed by LKMM (as is the corresponding state in "RISCV- > unlock-lock-write-ordering" below). However, proposals modifying LKMM > to _forbid_ that state have already appeared on LKML [4]. > > C unlock-lock-write-ordering > > {} > /* s initially owned by P0 */ > > P0(int *x, int *y, spinlock_t *s) > { > WRITE_ONCE(*x, 1); > spin_unlock(s); > spin_lock(s); > WRITE_ONCE(*y, 1); > } > > P1(int *x, int *y) > { > int r0; > int r1; > > r0 = READ_ONCE(*y); > smp_rmb(); > r1 = READ_ONCE(*y); > } > > exists (1:r0=1 /\ 1:r1=0) > > RISCV RISCV-unlock-lock-write-ordering > { > 0:x2=x; 0:x4=y; 0:x6=s; > 1:x2=y; 1:x4=x; > s=1; > } > P0 | P1 ; > ori x1,x0,1 | lw x1,0(x2) ; > sw x1,0(x2) | fence r,r ; > amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ; > ori x5,x0,1 | ; > amoswap.w.aq x0,x5,(x6) | ; > ori x3,x0,1 | ; > sw x3,0(x4) | ; > exists > (1:x1=1 /\ 1:x3=0) > > [Curiously, RISC-V's current implementations of smp_load_acquire() and > smp_store_release() provide way stronger ordering than what currently > required by LKMM since those're relying on the generic implementation > (c.f, also, [5]). ] > > This RFC fixes the issue by strengthening RISC-V's implementations of > spin_lock() and spin_unlock(), based on "A spinlock with fences" from > Section 2.3.5 ("Acquire/Release Ordering") of the RISC-V draft spec. 
> It does _not_ attempt to fix read-lock and atomics (for which, AFAICT, > similar considerations would hold). > > IMPORTANT. This patch is _NOT_ intended to be applied as is. Rather, > this is intended to test the waters, implicit questions being "Should > we take this direction?" "Are changes to LKMM needed?" (and develop a > technical discussion on the above issues.) > > [1] https://marc.info/?l=linux-kernel&m=151633436614259&w=2 > [2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM > [3] https://marc.info/?l=linux-kernel&m=151181741400461&w=2 > [4] https://marc.info/?l=linux-kernel&m=151871035014425&w=2 > [5] https://marc.info/?l=linux-kernel&m=151912186913692&w=2 > > Signed-off-by: Andrea Parri <parri.andrea@gmail.com> > Cc: Palmer Dabbelt <palmer@sifive.com> > Cc: Albert Ou <albert@sifive.com> > Cc: Daniel Lustig <dlustig@nvidia.com> > Cc: Alan Stern <stern@rowland.harvard.edu> > Cc: Will Deacon <will.deacon@arm.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: David Howells <dhowells@redhat.com> > Cc: Jade Alglave <j.alglave@ucl.ac.uk> > Cc: Luc Maranget <luc.maranget@inria.fr> > Cc: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> > Cc: Akira Yokosawa <akiyks@gmail.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: linux-riscv at lists.infradead.org > Cc: linux-kernel at vger.kernel.org > --- > arch/riscv/include/asm/spinlock.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h > index 2fd27e8ef1fd6..2f89fc62c9196 100644 > --- a/arch/riscv/include/asm/spinlock.h > +++ b/arch/riscv/include/asm/spinlock.h > @@ -28,8 +28,9 @@ > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > { > + RISCV_FENCE(rw,w); > __asm__ __volatile__ ( > - "amoswap.w.rl x0, x0, %0" > + "amoswap.w x0, x0, %0" > : "=A" (lock->lock) > :: "memory"); > } > @@ -39,10 +40,11 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) > int tmp = 1, busy; > > __asm__ __volatile__ ( > - "amoswap.w.aq %0, %2, %1" > + "amoswap.w %0, %2, %1" > : "=r" (busy), "+A" (lock->lock) > : "r" (tmp) > : "memory"); > + RISCV_FENCE(r,rw); > > return !busy; > } > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 23+ messages in thread
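[Editorial note: applying the correction described above — the final read taken from 'x' rather than 'y' — the first test would read as follows.]

```
C unlock-lock-read-ordering

{}
/* s initially owned by P1 */

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	spin_unlock(s);
	spin_lock(s);
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)
```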
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 12:19 [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() Andrea Parri 2018-02-22 12:44 ` Andrea Parri @ 2018-02-22 13:40 ` Peter Zijlstra 2018-02-22 14:12 ` Andrea Parri 1 sibling, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2018-02-22 13:40 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote: > C unlock-lock-read-ordering > > {} > /* s initially owned by P1 */ > > P0(int *x, int *y) > { > WRITE_ONCE(*x, 1); > smp_wmb(); > WRITE_ONCE(*y, 1); > } > > P1(int *x, int *y, spinlock_t *s) > { > int r0; > int r1; > > r0 = READ_ONCE(*y); > spin_unlock(s); > spin_lock(s); > r1 = READ_ONCE(*y); > } > > exists (1:r0=1 /\ 1:r1=0) > > RISCV RISCV-unlock-lock-read-ordering > { > 0:x2=x; 0:x4=y; > 1:x2=y; 1:x4=x; 1:x6=s; > s=1; > } > P0 | P1 ; > ori x1,x0,1 | lw x1,0(x2) ; > sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ; > fence w,w | ori x5,x0,1 ; > ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ; > sw x3,0(x4) | lw x3,0(x4) ; > exists > (1:x1=1 /\ 1:x3=0) So I would indeed expect this to be forbidden. Could someone please explain how this could be allowed? > C unlock-lock-write-ordering > > {} > /* s initially owned by P0 */ > > P0(int *x, int *y, spinlock_t *s) > { > WRITE_ONCE(*x, 1); > spin_unlock(s); > spin_lock(s); > WRITE_ONCE(*y, 1); > } > > P1(int *x, int *y) > { > int r0; > int r1; > > r0 = READ_ONCE(*y); > smp_rmb(); > r1 = READ_ONCE(*y); > } > > exists (1:r0=1 /\ 1:r1=0) > > RISCV RISCV-unlock-lock-write-ordering > { > 0:x2=x; 0:x4=y; 0:x6=s; > 1:x2=y; 1:x4=x; > s=1; > } > P0 | P1 ; > ori x1,x0,1 | lw x1,0(x2) ; > sw x1,0(x2) | fence r,r ; > amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ; > ori x5,x0,1 | ; > amoswap.w.aq x0,x5,(x6) | ; > ori x3,x0,1 | ; > sw x3,0(x4) | ; > exists > (1:x1=1 /\ 1:x3=0) And here I think the RISCV conversion is flawed, there should be a ctrl dependency. 
The second store-word in P0 should depend on the result of amoswap.w.aq being 0. (strictly speaking there should be a ctrl-dep in the read example too, except it'd be pointless for ordering reads, so I accept it being left out) Again, I cannot see how this could be allowed. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 13:40 ` Peter Zijlstra @ 2018-02-22 14:12 ` Andrea Parri 2018-02-22 17:27 ` Daniel Lustig 0 siblings, 1 reply; 23+ messages in thread From: Andrea Parri @ 2018-02-22 14:12 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote: > On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote: > > > C unlock-lock-read-ordering > > > > {} > > /* s initially owned by P1 */ > > > > P0(int *x, int *y) > > { > > WRITE_ONCE(*x, 1); > > smp_wmb(); > > WRITE_ONCE(*y, 1); > > } > > > > P1(int *x, int *y, spinlock_t *s) > > { > > int r0; > > int r1; > > > > r0 = READ_ONCE(*y); > > spin_unlock(s); > > spin_lock(s); > > r1 = READ_ONCE(*y); > > } > > > > exists (1:r0=1 /\ 1:r1=0) > > > > RISCV RISCV-unlock-lock-read-ordering > > { > > 0:x2=x; 0:x4=y; > > 1:x2=y; 1:x4=x; 1:x6=s; > > s=1; > > } > > P0 | P1 ; > > ori x1,x0,1 | lw x1,0(x2) ; > > sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ; > > fence w,w | ori x5,x0,1 ; > > ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ; > > sw x3,0(x4) | lw x3,0(x4) ; > > exists > > (1:x1=1 /\ 1:x3=0) > > So I would indeed expect this to be forbidden. Could someone please > explain how this could be allowed? As mentioned in IRC, my understanding here is only based on the spec. referred below and on its (available) formalizations. I expect that RISC-V people will be able to provide more information. 
> > > C unlock-lock-write-ordering > > > > {} > > /* s initially owned by P0 */ > > > > P0(int *x, int *y, spinlock_t *s) > > { > > WRITE_ONCE(*x, 1); > > spin_unlock(s); > > spin_lock(s); > > WRITE_ONCE(*y, 1); > > } > > > > P1(int *x, int *y) > > { > > int r0; > > int r1; > > > > r0 = READ_ONCE(*y); > > smp_rmb(); > > r1 = READ_ONCE(*y); > > } > > > > exists (1:r0=1 /\ 1:r1=0) > > > > RISCV RISCV-unlock-lock-write-ordering > > { > > 0:x2=x; 0:x4=y; 0:x6=s; > > 1:x2=y; 1:x4=x; > > s=1; > > } > > P0 | P1 ; > > ori x1,x0,1 | lw x1,0(x2) ; > > sw x1,0(x2) | fence r,r ; > > amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ; > > ori x5,x0,1 | ; > > amoswap.w.aq x0,x5,(x6) | ; > > ori x3,x0,1 | ; > > sw x3,0(x4) | ; > > exists > > (1:x1=1 /\ 1:x3=0) > > And here I think the RISCV conversion is flawed, there should be a ctrl > dependency. The second store-word in P0 should depend on the result of > amoswap.w.aq being 0. You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00" right after amoswap.w.aq (and this label at the end of P0); this does not change the verdict of the available formalizations reported above however. (So, AFAICT, the above question remains valid/open.) Andrea > > (strictly speaking there should be a ctrl-dep in the read example too, > except it'd be pointless for ordering reads, so I accept it being left > out) > > Again, I cannot see how this could be allowed. > ^ permalink raw reply [flat|nested] 23+ messages in thread
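[Editorial note: spelled out, P0 with the control dependency materialized might look as below. This is an illustrative reconstruction: the load result of the amoswap.w.aq is assumed to land in x5 (rather than being discarded in x0) so the branch can actually depend on it, and the branch sense is chosen to match Peter's "depends on the result being 0" (Andrea's beq spelling above differs slightly).]

```
# P0, control dependency made explicit (sketch):
ori          x1,x0,1
sw           x1,0(x2)       # WRITE_ONCE(*x, 1)
amoswap.w.rl x0,x0,(x6)     # spin_unlock(s): store 0
ori          x5,x0,1
amoswap.w.aq x5,x5,(x6)     # spin_lock(s): swap in 1, old value -> x5
bne          x5,x0,FAIL00   # proceed only if the lock was observed free
ori          x3,x0,1
sw           x3,0(x4)       # WRITE_ONCE(*y, 1)
FAIL00:
```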
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 14:12 ` Andrea Parri @ 2018-02-22 17:27 ` Daniel Lustig 2018-02-22 18:13 ` Paul E. McKenney 2018-02-22 18:21 ` Peter Zijlstra 0 siblings, 2 replies; 23+ messages in thread From: Daniel Lustig @ 2018-02-22 17:27 UTC (permalink / raw) To: linux-riscv On 2/22/2018 6:12 AM, Andrea Parri wrote: > On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote: >> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote: >> >>> C unlock-lock-read-ordering >>> >>> {} >>> /* s initially owned by P1 */ >>> >>> P0(int *x, int *y) >>> { >>> WRITE_ONCE(*x, 1); >>> smp_wmb(); >>> WRITE_ONCE(*y, 1); >>> } >>> >>> P1(int *x, int *y, spinlock_t *s) >>> { >>> int r0; >>> int r1; >>> >>> r0 = READ_ONCE(*y); >>> spin_unlock(s); >>> spin_lock(s); >>> r1 = READ_ONCE(*y); >>> } >>> >>> exists (1:r0=1 /\ 1:r1=0) >>> >>> RISCV RISCV-unlock-lock-read-ordering >>> { >>> 0:x2=x; 0:x4=y; >>> 1:x2=y; 1:x4=x; 1:x6=s; >>> s=1; >>> } >>> P0 | P1 ; >>> ori x1,x0,1 | lw x1,0(x2) ; >>> sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ; >>> fence w,w | ori x5,x0,1 ; >>> ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ; >>> sw x3,0(x4) | lw x3,0(x4) ; >>> exists >>> (1:x1=1 /\ 1:x3=0) >> >> So I would indeed expect this to be forbidden. Could someone please >> explain how this could be allowed? > > As mentioned in IRC, my understanding here is only based on the spec. > referred below and on its (available) formalizations. I expect that > RISC-V people will be able to provide more information. I think an RCpc interpretation of .aq and .rl would in fact allow the two normal loads in P1 to be reordered. Wouldn't it? Does that go against what the LKMM requires? The intuition would be that the amoswap.w.aq can forward from the amoswap.w.rl while that's still in the store buffer, and then the lw x3,0(x4) can also perform while the amoswap.w.rl is still in the store buffer, all before the lw x1,0(x2) executes. 
That's not forbidden unless the amoswaps are RCsc, unless I'm missing something. Likewise even if the unlock()/lock() is between two stores. A control dependency might originate from the load part of the amoswap.w.aq, but there still would have to be something to ensure that this load part in fact performs after the store part of the amoswap.w.rl performs globally, and that's not automatic under RCpc. In any case, our concern that amoswap.w.rl and amoswap.w.aq might not actually be sufficient for spin_unlock() and spin_lock() was what prompted us to start our own internal discussion on this. FWIW we have made a few clarifications to our spec that haven't propagated to the public upstream yet, but that should be happening soon. In any case as it relates to the questions here, it's more a matter of us deciding what we should put into the spec in the end than it is a matter of checking the current text. In other words, we're still finalizing things, and there's still some time to tweak as necessary. >> >>> C unlock-lock-write-ordering >>> >>> {} >>> /* s initially owned by P0 */ >>> >>> P0(int *x, int *y, spinlock_t *s) >>> { >>> WRITE_ONCE(*x, 1); >>> spin_unlock(s); >>> spin_lock(s); >>> WRITE_ONCE(*y, 1); >>> } >>> >>> P1(int *x, int *y) >>> { >>> int r0; >>> int r1; >>> >>> r0 = READ_ONCE(*y); >>> smp_rmb(); >>> r1 = READ_ONCE(*y); >>> } >>> >>> exists (1:r0=1 /\ 1:r1=0) >>> >>> RISCV RISCV-unlock-lock-write-ordering >>> { >>> 0:x2=x; 0:x4=y; 0:x6=s; >>> 1:x2=y; 1:x4=x; >>> s=1; >>> } >>> P0 | P1 ; >>> ori x1,x0,1 | lw x1,0(x2) ; >>> sw x1,0(x2) | fence r,r ; >>> amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ; >>> ori x5,x0,1 | ; >>> amoswap.w.aq x0,x5,(x6) | ; >>> ori x3,x0,1 | ; >>> sw x3,0(x4) | ; >>> exists >>> (1:x1=1 /\ 1:x3=0) >> >> And here I think the RISCV conversion is flawed, there should be a ctrl >> dependency. The second store-word in P0 should depend on the result of >> amoswap.w.aq being 0. 
> > You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00" > right after amoswap.w.aq (and this label at the end of P0); this does not > change the verdict of the available formalizations reported above however. > > (So, AFAICT, the above question remains valid/open.) This makes sense, but one other question: Paul mentioned earlier in the thread that > The PowerPC lock implementation's unlock-lock pair does not order writes > from the previous critical section against reads from the later critical > section, but it does order other combinations of reads and writes. My understanding is that the same logic about control dependencies applies here too, right? In other words, in spite of Peter's claim, the control dependency doesn't automatically fix this for RISC-V or for PowerPC? > > Andrea > > >> >> (strictly speaking there should be a ctrl-dep in the read example too, >> except it'd be pointless for ordering reads, so I accept it being left >> out) >> >> Again, I cannot see how this could be allowed. >> Peter, in this test you mentioned earlier: P0() { WRITE_ONCE(x, 1); smp_store_release(&y, 1); r0 = smp_load_acquire(&y); WRITE_ONCE(z, 1); } P1() { r1 = READ_ONCE(z); smp_rmb(); r2 = READ_ONCE(x); } exists: r0 == 1 /\ r1==1 /\ r2==0 You expect this to be forbidden too, even if the release and acquire are RCpc? This part I don't follow. There's no conditional branch here to enforce a control dependency. I get and I agree that smp_store_release/smp_load_acquire are different than spin_unlock/ spin_lock, but isn't that even more reason to allow this one if smp_store_release/smp_load_acquire are RCpc without question? Thanks, Dan ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 17:27 ` Daniel Lustig @ 2018-02-22 18:13 ` Paul E. McKenney 2018-02-22 18:27 ` Peter Zijlstra 2018-02-22 18:21 ` Peter Zijlstra 1 sibling, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2018-02-22 18:13 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 09:27:09AM -0800, Daniel Lustig wrote: > On 2/22/2018 6:12 AM, Andrea Parri wrote: > > On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote: > >> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote: > >> > >>> C unlock-lock-read-ordering > >>> > >>> {} > >>> /* s initially owned by P1 */ > >>> > >>> P0(int *x, int *y) > >>> { > >>> WRITE_ONCE(*x, 1); > >>> smp_wmb(); > >>> WRITE_ONCE(*y, 1); > >>> } > >>> > >>> P1(int *x, int *y, spinlock_t *s) > >>> { > >>> int r0; > >>> int r1; > >>> > >>> r0 = READ_ONCE(*y); > >>> spin_unlock(s); > >>> spin_lock(s); > >>> r1 = READ_ONCE(*y); > >>> } > >>> > >>> exists (1:r0=1 /\ 1:r1=0) > >>> > >>> RISCV RISCV-unlock-lock-read-ordering > >>> { > >>> 0:x2=x; 0:x4=y; > >>> 1:x2=y; 1:x4=x; 1:x6=s; > >>> s=1; > >>> } > >>> P0 | P1 ; > >>> ori x1,x0,1 | lw x1,0(x2) ; > >>> sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ; > >>> fence w,w | ori x5,x0,1 ; > >>> ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ; > >>> sw x3,0(x4) | lw x3,0(x4) ; > >>> exists > >>> (1:x1=1 /\ 1:x3=0) > >> > >> So I would indeed expect this to be forbidden. Could someone please > >> explain how this could be allowed? > > > > As mentioned in IRC, my understanding here is only based on the spec. > > referred below and on its (available) formalizations. I expect that > > RISC-V people will be able to provide more information. > > I think an RCpc interpretation of .aq and .rl would in fact allow > the two normal loads in P1 to be reordered. Wouldn't it? Does that > go against what the LKMM requires? 
> > The intuition would be that the amoswap.w.aq can forward from the > amoswap.w.rl while that's still in the store buffer, and then the > lw x3,0(x4) can also perform while the amoswap.w.rl is still in > the store buffer, all before the l1 x1,0(x2) executes. That's > not forbidden unless the amoswaps are RCsc, unless I'm missing > something. > > Likewise even if the unlock()/lock() is between two stores. A > control dependency might originate from the load part of the > amoswap.w.aq, but there still would have to be something to > ensure that this load part in fact performs after the store > part of the amoswap.w.rl performs globally, and that's not > automatic under RCpc. > > In any case, our concern that amoswap.w.rl and amoswap.w.aq might > not actually be sufficient for spin_unlock() and spin_lock() was > what prompted us to start our own internal discussion on this. > > FWIW we have made a few clarifications to our spec that haven't > propagated to the public upstream yet, but that should be happening > soon. In any case as it relates to the questions here, it's > more a matter of us deciding what we should put into the spec in > the end than it is a matter of checking the current text. In > other words, we're still finalizing things, and there's still > some time to tweak as necessary. 
> > >> > >>> C unlock-lock-write-ordering > >>> > >>> {} > >>> /* s initially owned by P0 */ > >>> > >>> P0(int *x, int *y, spinlock_t *s) > >>> { > >>> WRITE_ONCE(*x, 1); > >>> spin_unlock(s); > >>> spin_lock(s); > >>> WRITE_ONCE(*y, 1); > >>> } > >>> > >>> P1(int *x, int *y) > >>> { > >>> int r0; > >>> int r1; > >>> > >>> r0 = READ_ONCE(*y); > >>> smp_rmb(); > >>> r1 = READ_ONCE(*y); > >>> } > >>> > >>> exists (1:r0=1 /\ 1:r1=0) > >>> > >>> RISCV RISCV-unlock-lock-write-ordering > >>> { > >>> 0:x2=x; 0:x4=y; 0:x6=s; > >>> 1:x2=y; 1:x4=x; > >>> s=1; > >>> } > >>> P0 | P1 ; > >>> ori x1,x0,1 | lw x1,0(x2) ; > >>> sw x1,0(x2) | fence r,r ; > >>> amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ; > >>> ori x5,x0,1 | ; > >>> amoswap.w.aq x0,x5,(x6) | ; > >>> ori x3,x0,1 | ; > >>> sw x3,0(x4) | ; > >>> exists > >>> (1:x1=1 /\ 1:x3=0) > >> > >> And here I think the RISCV conversion is flawed, there should be a ctrl > >> dependency. The second store-word in P0 should depend on the result of > >> amoswap.w.aq being 0. > > > > You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00" > > right after amoswap.w.aq (and this label at the end of P0); this does not > > change the verdict of the available formalizations reported above however. > > > > (So, AFAICT, the above question remains valid/open.) > > This makes sense, but one other question: Paul mentioned earlier in the > thread that > > > The PowerPC lock implementation's unlock-lock pair does not order writes > > from the previous critical section against reads from the later critical > > section, but it does order other combinations of reads and writes. > > My understanding is that the same logic about control dependencies applies > here too, right? In other words, in spite of Peter's claim, the control > dependency doesn't automatically fix this for RISC-V or for PowerPC? 
For PowerPC, an lwsync instruction is used, which means that an external observer can see a write from an earlier critical section being reordered with a read from a later critical section, but no other combination of reads and writes can be so reordered. I will dig up PowerPC litmus tests demonstrating this and send them along. > >> (strictly speaking there should be a ctrl-dep in the read example too, > >> except it'd be pointless for ordering reads, so I accept it being left > >> out) > >> > >> Again, I cannot see how this could be allowed. > >> > > Peter, in this test you mentioned earlier: > > P0() > { > WRITE_ONCE(x, 1); > smp_store_release(&y, 1); > r0 = smp_load_acquire(&y); > WRITE_ONCE(z, 1); > } > > P1() > { > r1 = READ_ONCE(z); > smp_rmb(); > r2 = READ_ONCE(x); > } > > exists: r0 == 1 /\ r1==1 /\ r2==0 > > You expect this to be forbidden too, even if the release and acquire > are RCpc? This part I don't follow. There's no conditional branch > here to enforce a control dependency. I get and I agree that > smp_store_release/smp_load_acquire are different than spin_unlock/ > spin_lock, but isn't that even more reason to allow this one if > smp_store_release/smp_load_acquire are RCpc without question? Here is the above test, reworked a bit to allow herd to accept it: ------------------------------------------------------------------------ C C-peterz { } P0(int *x, int *y, int *z) { int r0; WRITE_ONCE(*x, 1); smp_store_release(y, 1); r0 = smp_load_acquire(y); WRITE_ONCE(*z, 1); } P1(int *x, int *y, int *z) { int r1; int r2; r1 = READ_ONCE(*z); smp_rmb(); r2 = READ_ONCE(*x); } exists (0:r0 == 1 /\ 1:r1==1 /\ 1:r2==0) ------------------------------------------------------------------------ Please let me know if I messed up the conversion. 
On the perhaps unlikely chance that I got it right, here is what the current version of the Linux-kernel memory model says: ------------------------------------------------------------------------ $ herd7 -conf linux-kernel.cfg /tmp/C-peterz.litmus Test C-peterz Allowed States 4 0:r0=1; 1:r1=0; 1:r2=0; 0:r0=1; 1:r1=0; 1:r2=1; 0:r0=1; 1:r1=1; 1:r2=0; 0:r0=1; 1:r1=1; 1:r2=1; Ok Witnesses Positive: 1 Negative: 3 Condition exists (0:r0=1 /\ 1:r1=1 /\ 1:r2=0) Observation C-peterz Sometimes 1 3 Time C-peterz 0.01 Hash=6c4f7f8a8dc750fd5d706d6f3c104360 ------------------------------------------------------------------------ So the current Linux-kernel memory model does in fact allow this, as you say. But again, there is a proposal to tighten the Linux-kernel memory model based on the pre-RISC-V set of architectures running the Linux kernel, all of which would forbid this test. So we have something that is not all that rare in the Linux kernel community, namely two conflicting more-or-less concurrent changes. This clearly needs to be resolved, either by us not strengthening the Linux-kernel memory model in the way we were planning to or by you strengthening RISC-V to be no weaker than PowerPC for these sorts of externally viewed release-acquire situations. Other thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 18:13 ` Paul E. McKenney @ 2018-02-22 18:27 ` Peter Zijlstra 2018-02-22 19:47 ` Daniel Lustig 2018-02-22 20:02 ` Paul E. McKenney 0 siblings, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2018-02-22 18:27 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > So we have something that is not all that rare in the Linux kernel > community, namely two conflicting more-or-less concurrent changes. > This clearly needs to be resolved, either by us not strengthening the > Linux-kernel memory model in the way we were planning to or by you > strengthening RISC-V to be no weaker than PowerPC for these sorts of > externally viewed release-acquire situations. > > Other thoughts? Like said in the other email, I would _much_ prefer to not go weaker than PPC, I find that PPC is already painfully weak at times. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 18:27 ` Peter Zijlstra @ 2018-02-22 19:47 ` Daniel Lustig 2018-02-23 11:16 ` Andrea Parri ` (3 more replies) 2018-02-22 20:02 ` Paul E. McKenney 1 sibling, 4 replies; 23+ messages in thread From: Daniel Lustig @ 2018-02-22 19:47 UTC (permalink / raw) To: linux-riscv On 2/22/2018 10:27 AM, Peter Zijlstra wrote: > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: >> So we have something that is not all that rare in the Linux kernel >> community, namely two conflicting more-or-less concurrent changes. >> This clearly needs to be resolved, either by us not strengthening the >> Linux-kernel memory model in the way we were planning to or by you >> strengthening RISC-V to be no weaker than PowerPC for these sorts of >> externally viewed release-acquire situations. >> >> Other thoughts? > > Like said in the other email, I would _much_ prefer to not go weaker > than PPC, I find that PPC is already painfully weak at times. Sure, and RISC-V could make this work too by using RCsc instructions and/or by using lightweight fences instead. It just wasn't clear at first whether smp_load_acquire() and smp_store_release() were RCpc, RCsc, or something else, and hence whether RISC-V would actually need to use something stronger than pure RCpc there. Likewise for spin_unlock()/spin_lock() and everywhere else this comes up. As Paul's email in the other thread observed, RCpc seems to be OK for smp_load_acquire()/smp_store_release() at least according to the current LKMM herd spec. Unlock/lock are stronger already I guess. But if there's an active proposal to strengthen them all to something stricter than pure RCpc, then that's good to know. My understanding from earlier discussions is that ARM has no plans to use their own RCpc instruction for smp_load_acquire() instead of their RCsc instructions. Is that still true? 
If they were to use the RCpc load there, that would cause them to have the same problem we're discussing here, right? Just checking. Dan ^ permalink raw reply [flat|nested] 23+ messages in thread
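[Editorial note: the RCpc/RCsc distinction Dan draws can be sketched with C11 atomics. This is an illustration only, not the kernel's actual implementations; the helper names are made up. An RCpc-style mapping gives each acquire/release its own one-way ordering, while a seq_cst ("RCsc-style") mapping additionally orders a release against a later acquire, even to a different location, which is what the strengthened unlock+lock requirement needs.]

```c
#include <stdatomic.h>

/* Illustration of the two mapping choices, in C11 terms rather than
 * the kernel's asm.  Names are hypothetical. */

static atomic_int var;

/* RCpc-flavoured: a plain release store / acquire load.  Each access
 * is a one-way barrier on its own, but a release followed by a later
 * acquire to a DIFFERENT location is not necessarily kept in order. */
void store_release_rcpc(atomic_int *p, int v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

int load_acquire_rcpc(atomic_int *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}

/* RCsc-flavoured: sequentially consistent accesses.  These also keep
 * a release ordered before a subsequent acquire (even to another
 * location), at some extra cost on weakly ordered hardware. */
void store_release_rcsc(atomic_int *p, int v)
{
	atomic_store_explicit(p, v, memory_order_seq_cst);
}

int load_acquire_rcsc(atomic_int *p)
{
	return atomic_load_explicit(p, memory_order_seq_cst);
}
```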
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 19:47 ` Daniel Lustig @ 2018-02-23 11:16 ` Andrea Parri 2018-02-26 10:39 ` Will Deacon ` (2 subsequent siblings) 3 siblings, 0 replies; 23+ messages in thread From: Andrea Parri @ 2018-02-23 11:16 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote: > On 2/22/2018 10:27 AM, Peter Zijlstra wrote: > > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > >> So we have something that is not all that rare in the Linux kernel > >> community, namely two conflicting more-or-less concurrent changes. > >> This clearly needs to be resolved, either by us not strengthening the > >> Linux-kernel memory model in the way we were planning to or by you > >> strengthening RISC-V to be no weaker than PowerPC for these sorts of > >> externally viewed release-acquire situations. > >> > >> Other thoughts? > > > > Like said in the other email, I would _much_ prefer to not go weaker > > than PPC, I find that PPC is already painfully weak at times. > > Sure, and RISC-V could make this work too by using RCsc instructions > and/or by using lightweight fences instead. It just wasn't clear at > first whether smp_load_acquire() and smp_store_release() were RCpc, > RCsc, or something else, and hence whether RISC-V would actually need > to use something stronger than pure RCpc there. Likewise for > spin_unlock()/spin_lock() and everywhere else this comes up. Thank you for the confirmations. The hope was indeed that this thread helped clarify what the (current) LKMM "expectations" are, and how these are likely to evolve in a near future. And Yes, I'm definitely with you and Paul on "let us (try to) keep LKMM and RISC-V synchronized"! ;-) Andrea > > > Dan ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 19:47 ` Daniel Lustig 2018-02-23 11:16 ` Andrea Parri @ 2018-02-26 10:39 ` Will Deacon 2018-02-26 14:21 ` Luc Maranget 2018-03-01 15:11 ` Andrea Parri 3 siblings, 0 replies; 23+ messages in thread From: Will Deacon @ 2018-02-26 10:39 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote: > On 2/22/2018 10:27 AM, Peter Zijlstra wrote: > > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > >> So we have something that is not all that rare in the Linux kernel > >> community, namely two conflicting more-or-less concurrent changes. > >> This clearly needs to be resolved, either by us not strengthening the > >> Linux-kernel memory model in the way we were planning to or by you > >> strengthening RISC-V to be no weaker than PowerPC for these sorts of > >> externally viewed release-acquire situations. > >> > >> Other thoughts? > > > > Like said in the other email, I would _much_ prefer to not go weaker > > than PPC, I find that PPC is already painfully weak at times. > > Sure, and RISC-V could make this work too by using RCsc instructions > and/or by using lightweight fences instead. It just wasn't clear at > first whether smp_load_acquire() and smp_store_release() were RCpc, > RCsc, or something else, and hence whether RISC-V would actually need > to use something stronger than pure RCpc there. Likewise for > spin_unlock()/spin_lock() and everywhere else this comes up. > > As Paul's email in the other thread observed, RCpc seems to be > OK for smp_load_acquire()/smp_store_release() at least according > to the current LKMM herd spec. Unlock/lock are stronger already > I guess. But if there's an active proposal to strengthen them all > to something stricter than pure RCpc, then that's good to know. 
> > My understanding from earlier discussions is that ARM has no plans > to use their own RCpc instruction for smp_load_acquire() instead > of their RCsc instructions. Is that still true? If they were to > use the RCpc load there, that would cause them to have the same > problem we're discussing here, right? Just checking. Agreed. No plans to use the LDAPR instruction in Linux. Will ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 19:47 ` Daniel Lustig 2018-02-23 11:16 ` Andrea Parri 2018-02-26 10:39 ` Will Deacon @ 2018-02-26 14:21 ` Luc Maranget 2018-02-26 16:06 ` Linus Torvalds 2018-03-01 15:11 ` Andrea Parri 3 siblings, 1 reply; 23+ messages in thread From: Luc Maranget @ 2018-02-26 14:21 UTC (permalink / raw) To: linux-riscv > On 2/22/2018 10:27 AM, Peter Zijlstra wrote: > > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > >> So we have something that is not all that rare in the Linux kernel > >> community, namely two conflicting more-or-less concurrent changes. > >> This clearly needs to be resolved, either by us not strengthening the > >> Linux-kernel memory model in the way we were planning to or by you > >> strengthening RISC-V to be no weaker than PowerPC for these sorts of > >> externally viewed release-acquire situations. > >> > >> Other thoughts? > > > > Like said in the other email, I would _much_ prefer to not go weaker > > than PPC, I find that PPC is already painfully weak at times. > > Sure, and RISC-V could make this work too by using RCsc instructions > and/or by using lightweight fences instead. It just wasn't clear at > first whether smp_load_acquire() and smp_store_release() were RCpc, > RCsc, or something else, As far as I now understand, they are something else. Something in-between. Basically, considering a multi-copy atomic model, and following RISCV axiomatic herd notations RCpc [Acq];po in ppo (ie Acquire "half barrier") po;[Rel] in ppo (ie Release "half barrier") RCsc adds the rule [Rel];po;[Acq] in ppo. 
(ie Store release followed by load Acquire is kept in order)

While LKMM has a less constrained additional rule

	[Rel];po-loc;[Acq]

(ie Store release followed by load Acquire from the SAME location is kept in order)

(For specialists: LKMM is formulated differently; the overall effect is the same.)

> and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.

One may observe here that the LKMM rule is there precisely to forbid Andrea's test. In that respect, the suggested RISC-V implementation of lock and unlock on spinlock_t is too weak, as it allows the test. However, I cannot conclude on smp_load_acquire/smp_store_release (see below).

> > As Paul's email in the other thread observed, RCpc seems to be
> > OK for smp_load_acquire()/smp_store_release() at least according
> > to the current LKMM herd spec.  Unlock/lock are stronger already
> > I guess.  But if there's an active proposal to strengthen them all
> > to something stricter than pure RCpc, then that's good to know.

As illustrated above, RCpc is not enough to implement LKMM smp_load_acquire()/smp_store_release(). However, I do not know for sure what the current implementation of smp_load_acquire() and smp_store_release() (ie the LKMM primitives) is in RISC-V. As far as I could find, the current implementation is the generic default, which uses strong fences and is safe.

On a related note, it may not be clear yet to everyone that LKMM has "platonic spinlocks". That is, locks are not implemented from more basic primitives but are specified. The specification can be described as behaving this way:

- A lock behaves as a read-modify-write, the read behaving as a read-acquire.
- An unlock behaves as a store release.
- Lock and unlock operations (to a given lock variable) alternate in some total order. This order induces the usual relations between accesses to a given location.
This of course strongly suggests implementing lock with an RMW (the R being acquire, ie amoswap.aq in RISC-V) and unlock as a store release (ie amoswap.rl in RISC-V). And again, this is too weak given LKMM's additional rule. However, nothing prevents a different implementation, provided it is safe.

As regards LKMM spinlocks, I am not sure that having strong implementations of lock/unlock in Power (lwsync) and ARMv8 (RCsc?) commands a specification stronger than the current one.

> > My understanding from earlier discussions is that ARM has no plans
> > to use their own RCpc instruction for smp_load_acquire() instead
> > of their RCsc instructions.  Is that still true?  If they were to
> > use the RCpc load there, that would cause them to have the same
> > problem we're discussing here, right?  Just checking.
> >
> > Dan

^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 14:21 ` Luc Maranget @ 2018-02-26 16:06 ` Linus Torvalds 2018-02-26 16:24 ` Will Deacon 0 siblings, 1 reply; 23+ messages in thread From: Linus Torvalds @ 2018-02-26 16:06 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote: > > That is, locks are not implemented from more basic primitive but are specified. > The specification can be described as behaving that way: > - A lock behaves as a read-modify-write. the read behaving as a read-acquire This is wrong, or perhaps just misleading. The *whole* r-m-w acts as an acquire. Not just the read part. The write is very much part of it. Maybe that's what you meant, but it read to me as "just the read part of the rmw behaves as a read-acquire". Because it is very important that the _write_ part of the rmw is also ordered wrt everything that is inside the spinlock. So doing a spinlock as (a) read-locked-acquire modify (c) write-conditional would be wrong, because the accesses inside the spinlock are ordered not just wrt the read-acquire, they have to be ordered wrt the write too. So it is closer to say that it's the _write_ of the r-m-w sequence that has the acquire semantics, not the read. > - A unlock behaves as a store release. Yes. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
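[Editorial note: Linus's point, that the ordering must attach to the whole read-modify-write rather than only its read half, can be sketched with a toy C11 test-and-set lock. A sketch only: the sketch_* names are hypothetical and real kernel spinlocks are more involved (fairness, qspinlock, etc.).]

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } sketch_spinlock_t;  /* hypothetical */

static sketch_spinlock_t demo_lock;

void sketch_spin_lock(sketch_spinlock_t *l)
{
	/* The successful exchange -- its read AND write together -- is
	 * the acquire operation: accesses in the critical section may
	 * not be reordered above the lock-word write. */
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;  /* spin until the lock is observed free */
}

void sketch_spin_unlock(sketch_spinlock_t *l)
{
	/* Store-release: critical-section accesses may not sink below. */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

bool sketch_spin_is_locked(sketch_spinlock_t *l)
{
	return atomic_load_explicit(&l->locked, memory_order_relaxed) != 0;
}
```

Splitting the lock into a separate acquire-load plus a later conditional store, as in Linus's (a)/(c) example, would lose exactly this property.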
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 16:06 ` Linus Torvalds @ 2018-02-26 16:24 ` Will Deacon 2018-02-26 17:00 ` Linus Torvalds 2018-02-27 5:06 ` Boqun Feng 0 siblings, 2 replies; 23+ messages in thread From: Will Deacon @ 2018-02-26 16:24 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote: > On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote: > > > > That is, locks are not implemented from more basic primitive but are specified. > > The specification can be described as behaving that way: > > - A lock behaves as a read-modify-write. the read behaving as a read-acquire > > This is wrong, or perhaps just misleading. > > The *whole* r-m-w acts as an acquire. Not just the read part. The > write is very much part of it. > > Maybe that's what you meant, but it read to me as "just the read part > of the rmw behaves as a read-acquire". > > Because it is very important that the _write_ part of the rmw is also > ordered wrt everything that is inside the spinlock. > > So doing a spinlock as > > (a) read-locked-acquire > modify > (c) write-conditional > > would be wrong, because the accesses inside the spinlock are ordered > not just wrt the read-acquire, they have to be ordered wrt the write > too. > > So it is closer to say that it's the _write_ of the r-m-w sequence > that has the acquire semantics, not the read. Strictly speaking, that's not what we've got implemented on arm64: only the read part of the RmW has Acquire semantics, but there is a total order on the lock/unlock operations for the lock. For example, if one CPU does: spin_lock(&lock); WRITE_ONCE(foo, 42); then another CPU could do: if (smp_load_acquire(&foo) == 42) BUG_ON(!spin_is_locked(&lock)); and that could fire. Is that relied on somewhere? Will ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 16:24 ` Will Deacon @ 2018-02-26 17:00 ` Linus Torvalds 2018-02-26 17:10 ` Will Deacon 2018-03-06 13:00 ` Peter Zijlstra 2018-02-27 5:06 ` Boqun Feng 1 sibling, 2 replies; 23+ messages in thread From: Linus Torvalds @ 2018-02-26 17:00 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote: > > Strictly speaking, that's not what we've got implemented on arm64: only > the read part of the RmW has Acquire semantics, but there is a total > order on the lock/unlock operations for the lock. Hmm. I thought we had exactly that bug on some architecture with the queued spinlocks, and people decided it was wrong. But it's possible that I mis-remember, and that we decided it was ok after all. > spin_lock(&lock); > WRITE_ONCE(foo, 42); > > then another CPU could do: > > if (smp_load_acquire(&foo) == 42) > BUG_ON(!spin_is_locked(&lock)); > > and that could fire. Is that relied on somewhere? I have a distinct memory that we said the spinlock write is seen in order, wrt the writes inside the spinlock, and the reason was something very similar to the above, except that "spin_is_locked()" was about our spin_unlock_wait(). Because we had something very much like the above in the exit path, where we would look at some state and do "spin_unlock_wait()" and expect to be guaranteed to be the last user after that. But a few months ago we obviously got rid of spin_unlock_wait exactly because people were worried about the semantics. So maybe I just remember an older issue that simply became a non-issue with that. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 17:00 ` Linus Torvalds @ 2018-02-26 17:10 ` Will Deacon 2018-03-06 13:00 ` Peter Zijlstra 1 sibling, 0 replies; 23+ messages in thread From: Will Deacon @ 2018-02-26 17:10 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 09:00:43AM -0800, Linus Torvalds wrote: > On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote: > > > > Strictly speaking, that's not what we've got implemented on arm64: only > > the read part of the RmW has Acquire semantics, but there is a total > > order on the lock/unlock operations for the lock. > > Hmm. > > I thought we had exactly that bug on some architecture with the queued > spinlocks, and people decided it was wrong. > > But it's possible that I mis-remember, and that we decided it was ok after all. > > > spin_lock(&lock); > > WRITE_ONCE(foo, 42); > > > > then another CPU could do: > > > > if (smp_load_acquire(&foo) == 42) > > BUG_ON(!spin_is_locked(&lock)); > > > > and that could fire. Is that relied on somewhere? > > I have a distinct memory that we said the spinlock write is seen in > order, wrt the writes inside the spinlock, and the reason was > something very similar to the above, except that "spin_is_locked()" > was about our spin_unlock_wait(). Yes, we did run into problems with spin_unlock_wait and we ended up strengthening the arm64 implementation to do an RmW, which puts it into the total order of lock/unlock operations. However, we then went and killed the thing because it was seldom used correctly and we struggled to define what "correctly" even meant! > Because we had something very much like the above in the exit path, > where we would look at some state and do "spin_unlock_wait()" and > expect to be guaranteed to be the last user after that. > > But a few months ago we obviously got rid of spin_unlock_wait exactly > because people were worried about the semantics. Similarly for spin_can_lock. 
> So maybe I just remember an older issue that simply became a non-issue > with that. I think so. If we need to, I could make spin_is_locked do an RmW on arm64 so we can say that all successful spin_* operations are totally ordered for a given lock, but spin_is_locked is normally only used as a coarse debug check anyway where it's assumed that if it's held, it's held by the current CPU. We should probably move most users over to lockdep and see what we're left with. Will ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 17:00 ` Linus Torvalds 2018-02-26 17:10 ` Will Deacon @ 2018-03-06 13:00 ` Peter Zijlstra 1 sibling, 0 replies; 23+ messages in thread From: Peter Zijlstra @ 2018-03-06 13:00 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 09:00:43AM -0800, Linus Torvalds wrote:

> On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Strictly speaking, that's not what we've got implemented on arm64: only
> > the read part of the RmW has Acquire semantics, but there is a total
> > order on the lock/unlock operations for the lock.
>
> Hmm.
>
> I thought we had exactly that bug on some architecture with the queued
> spinlocks, and people decided it was wrong.

So ARM64 and Power have the acquire-on-load-only thing, but qspinlock has it per construction on anything that allows reordering stores. Given that unlock/lock are ordered, which covers about 99% of the users out there, and that fixing the issue would make things significantly slower on the weak architectures, we let it be.

But yes, it's a pesky detail.

^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-26 16:24 ` Will Deacon 2018-02-26 17:00 ` Linus Torvalds @ 2018-02-27 5:06 ` Boqun Feng 2018-02-27 10:16 ` Boqun Feng 1 sibling, 1 reply; 23+ messages in thread From: Boqun Feng @ 2018-02-27 5:06 UTC (permalink / raw) To: linux-riscv On Mon, Feb 26, 2018 at 04:24:27PM +0000, Will Deacon wrote: > On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote: > > On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote: > > > > > > That is, locks are not implemented from more basic primitive but are specified. > > > The specification can be described as behaving that way: > > > - A lock behaves as a read-modify-write. the read behaving as a read-acquire > > > > This is wrong, or perhaps just misleading. > > > > The *whole* r-m-w acts as an acquire. Not just the read part. The > > write is very much part of it. > > > > Maybe that's what you meant, but it read to me as "just the read part > > of the rmw behaves as a read-acquire". > > > > Because it is very important that the _write_ part of the rmw is also > > ordered wrt everything that is inside the spinlock. > > > > So doing a spinlock as > > > > (a) read-locked-acquire > > modify > > (c) write-conditional > > > > would be wrong, because the accesses inside the spinlock are ordered > > not just wrt the read-acquire, they have to be ordered wrt the write > > too. > > > > So it is closer to say that it's the _write_ of the r-m-w sequence > > that has the acquire semantics, not the read. > > Strictly speaking, that's not what we've got implemented on arm64: only > the read part of the RmW has Acquire semantics, but there is a total > order on the lock/unlock operations for the lock. For example, if one > CPU does: > > spin_lock(&lock); > WRITE_ONCE(foo, 42); > > then another CPU could do: > > if (smp_load_acquire(&foo) == 42) > BUG_ON(!spin_is_locked(&lock)); > Hmm.. this is new to me. 
So the write part of spin_lock() and the WRITE_ONCE() will not get reordered? Could you explain more about this, or point to where I should look in the document? I understand that the write part of spin_lock() must be committed earlier than the WRITE_ONCE() due to the ll/sc, but I thought the order in which they arrive at the memory system was undefined/arbitrary.

Regards,
Boqun

> and that could fire. Is that relied on somewhere?
>
> Will

^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-27 5:06 ` Boqun Feng @ 2018-02-27 10:16 ` Boqun Feng 0 siblings, 0 replies; 23+ messages in thread From: Boqun Feng @ 2018-02-27 10:16 UTC (permalink / raw) To: linux-riscv On Tue, Feb 27, 2018 at 01:06:35PM +0800, Boqun Feng wrote: > On Mon, Feb 26, 2018 at 04:24:27PM +0000, Will Deacon wrote: > > On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote: > > > On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote: > > > > > > > > That is, locks are not implemented from more basic primitive but are specified. > > > > The specification can be described as behaving that way: > > > > - A lock behaves as a read-modify-write. the read behaving as a read-acquire > > > > > > This is wrong, or perhaps just misleading. > > > > > > The *whole* r-m-w acts as an acquire. Not just the read part. The > > > write is very much part of it. > > > > > > Maybe that's what you meant, but it read to me as "just the read part > > > of the rmw behaves as a read-acquire". > > > > > > Because it is very important that the _write_ part of the rmw is also > > > ordered wrt everything that is inside the spinlock. > > > > > > So doing a spinlock as > > > > > > (a) read-locked-acquire > > > modify > > > (c) write-conditional > > > > > > would be wrong, because the accesses inside the spinlock are ordered > > > not just wrt the read-acquire, they have to be ordered wrt the write > > > too. > > > > > > So it is closer to say that it's the _write_ of the r-m-w sequence > > > that has the acquire semantics, not the read. > > > > Strictly speaking, that's not what we've got implemented on arm64: only > > the read part of the RmW has Acquire semantics, but there is a total > > order on the lock/unlock operations for the lock. 
For example, if one
> > CPU does:
> >
> > 	spin_lock(&lock);
> > 	WRITE_ONCE(foo, 42);
> >
> > then another CPU could do:
> >
> > 	if (smp_load_acquire(&foo) == 42)
> > 		BUG_ON(!spin_is_locked(&lock));
> >
> Hmm.. this is new to me. So the write part of spin_lock() and the
> WRITE_ONCE() will not get reordered? Could you explain more about this
> or point where I should look in the document? I understand the write
> part of spin_lock() must be committed earlier than the WRITE_ONCE()
> committed due to the ll/sc, but I thought the ordering of their arrivals
> in memory system is undefined/arbitary.
>
> Regards,
> Boqun
>
> > and that could fire. Is that relied on somewhere?

My bad, I misunderstood your mail. So you were saying the BUG_ON() can be triggered in current implementations. Never mind my reply above.

Regards,
Boqun

> > Will

^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 19:47 ` Daniel Lustig ` (2 preceding siblings ...) 2018-02-26 14:21 ` Luc Maranget @ 2018-03-01 15:11 ` Andrea Parri 2018-03-01 21:54 ` Palmer Dabbelt 3 siblings, 1 reply; 23+ messages in thread From: Andrea Parri @ 2018-03-01 15:11 UTC (permalink / raw) To: linux-riscv Hi Daniel, On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote: > On 2/22/2018 10:27 AM, Peter Zijlstra wrote: > > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > >> So we have something that is not all that rare in the Linux kernel > >> community, namely two conflicting more-or-less concurrent changes. > >> This clearly needs to be resolved, either by us not strengthening the > >> Linux-kernel memory model in the way we were planning to or by you > >> strengthening RISC-V to be no weaker than PowerPC for these sorts of > >> externally viewed release-acquire situations. > >> > >> Other thoughts? > > > > Like said in the other email, I would _much_ prefer to not go weaker > > than PPC, I find that PPC is already painfully weak at times. > > Sure, and RISC-V could make this work too by using RCsc instructions > and/or by using lightweight fences instead. It just wasn't clear at > first whether smp_load_acquire() and smp_store_release() were RCpc, > RCsc, or something else, and hence whether RISC-V would actually need > to use something stronger than pure RCpc there. Likewise for > spin_unlock()/spin_lock() and everywhere else this comes up. 
while digging into riscv's locks and atomics to fix the issues discussed earlier in this thread, I became aware of another issue:

Considering here the CMPXCHG primitives, for example, I noticed that the implementation currently provides the four variants

	ATOMIC_OPS(        , .aqrl)
	ATOMIC_OPS(_acquire,   .aq)
	ATOMIC_OPS(_release,   .rl)
	ATOMIC_OPS(_relaxed,      )

(corresponding, resp., to

	atomic_cmpxchg()
	atomic_cmpxchg_acquire()
	atomic_cmpxchg_release()
	atomic_cmpxchg_relaxed()  )

so that the first variant above ends up doing

	0:	lr.w.aqrl  %0, %addr
		bne        %0, %old, 1f
		sc.w.aqrl  %1, %new, %addr
		bnez       %1, 0b
	1:

From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,

  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
   the load part and the store part as independent RCsc operations is not
   in and of itself sufficient to enforce full fencing behavior, but this
   subtle weak behavior is counterintuitive and not much of an advantage
   architecturally, especially with lr and sc also available [...]."

I understand that

	{ x = y = u = v = 0 }

	P0()

	WRITE_ONCE(x, 1);		("relaxed" store, sw)
	atomic_cmpxchg(&u, 0, 1);
	r0 = READ_ONCE(y);		("relaxed" load, lw)

	P1()

	WRITE_ONCE(y, 1);
	atomic_cmpxchg(&v, 0, 1);
	r1 = READ_ONCE(x);

could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?

Thanks,
  Andrea

^ permalink raw reply [flat|nested] 23+ messages in thread
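[Editorial note: Andrea's test can be rendered as plain C11 for concreteness. An illustration only, not kernel code: cmpxchg_full() here models a fully ordered atomic_cmpxchg() with sequentially consistent fences on both sides; under that modelling the C11 memory model forbids the outcome (u = v = 1 and r0 = r1 = 0), which is precisely the guarantee Andrea is asking whether the lr.w.aqrl/sc.w.aqrl sequence actually provides.]

```c
#include <stdatomic.h>

static atomic_int x, y, u, v;
static int r0, r1;

/* One way to model a full-barrier compare-and-exchange: seq_cst fences
 * on both sides of a relaxed CAS.  Returns the value read, as LKMM's
 * atomic_cmpxchg() does. */
static int cmpxchg_full(atomic_int *p, int old, int newv)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_compare_exchange_strong_explicit(p, &old, newv,
						memory_order_relaxed,
						memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return old;
}

void P0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);	/* sw */
	cmpxchg_full(&u, 0, 1);
	r0 = atomic_load_explicit(&y, memory_order_relaxed);	/* lw */
}

void P1(void)
{
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	cmpxchg_full(&v, 0, 1);
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
}
```

With the two fences, this is the classic SB-with-fences shape, so at least one of r0, r1 must read 1 in any concurrent execution.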
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-03-01 15:11 ` Andrea Parri @ 2018-03-01 21:54 ` Palmer Dabbelt 2018-03-01 22:21 ` Daniel Lustig 0 siblings, 1 reply; 23+ messages in thread From: Palmer Dabbelt @ 2018-03-01 21:54 UTC (permalink / raw) To: linux-riscv On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea at gmail.com wrote: > Hi Daniel, > > On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote: >> On 2/22/2018 10:27 AM, Peter Zijlstra wrote: >> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: >> >> So we have something that is not all that rare in the Linux kernel >> >> community, namely two conflicting more-or-less concurrent changes. >> >> This clearly needs to be resolved, either by us not strengthening the >> >> Linux-kernel memory model in the way we were planning to or by you >> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of >> >> externally viewed release-acquire situations. >> >> >> >> Other thoughts? >> > >> > Like said in the other email, I would _much_ prefer to not go weaker >> > than PPC, I find that PPC is already painfully weak at times. >> >> Sure, and RISC-V could make this work too by using RCsc instructions >> and/or by using lightweight fences instead. It just wasn't clear at >> first whether smp_load_acquire() and smp_store_release() were RCpc, >> RCsc, or something else, and hence whether RISC-V would actually need >> to use something stronger than pure RCpc there. Likewise for >> spin_unlock()/spin_lock() and everywhere else this comes up. 
>
> while digging into riscv's locks and atomics to fix the issues discussed
> earlier in this thread, I became aware of another issue:
>
> Considering here the CMPXCHG primitives, for example, I noticed that the
> implementation currently provides the four variants
>
>	ATOMIC_OPS(        , .aqrl)
>	ATOMIC_OPS(_acquire,   .aq)
>	ATOMIC_OPS(_release,   .rl)
>	ATOMIC_OPS(_relaxed,      )
>
> (corresponding, resp., to
>
>	atomic_cmpxchg()
>	atomic_cmpxchg_acquire()
>	atomic_cmpxchg_release()
>	atomic_cmpxchg_relaxed()  )
>
> so that the first variant above ends up doing
>
>	0:	lr.w.aqrl  %0, %addr
>		bne        %0, %old, 1f
>		sc.w.aqrl  %1, %new, %addr
>		bnez       %1, 0b
>	1:
>
> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>
>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>   the load part and the store part as independent RCsc operations is not
>   in and of itself sufficient to enforce full fencing behavior, but this
>   subtle weak behavior is counterintuitive and not much of an advantage
>   architecturally, especially with lr and sc also available [...]."
>
> I understand that
>
>	{ x = y = u = v = 0 }
>
>	P0()
>
>	WRITE_ONCE(x, 1);		("relaxed" store, sw)
>	atomic_cmpxchg(&u, 0, 1);
>	r0 = READ_ONCE(y);		("relaxed" load, lw)
>
>	P1()
>
>	WRITE_ONCE(y, 1);
>	atomic_cmpxchg(&v, 0, 1);
>	r1 = READ_ONCE(x);
>
> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?

cmpxchg isn't an AMO, it's an LR SC sequence, so that blurb doesn't apply.  I think "lr.w.aqrl" and "sc.w.aqrl" is not sufficient to perform a fully ordered operation (ie, it's an incorrect implementation of atomic_cmpxchg()), but I was hoping to get some time to actually internalize this part of the RISC-V memory model at some point to be sure.

^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-03-01 21:54 ` Palmer Dabbelt @ 2018-03-01 22:21 ` Daniel Lustig 0 siblings, 0 replies; 23+ messages in thread From: Daniel Lustig @ 2018-03-01 22:21 UTC (permalink / raw) To: linux-riscv On 3/1/2018 1:54 PM, Palmer Dabbelt wrote:

> On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea at gmail.com wrote:
>> Hi Daniel,
>>
>> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>>> >> So we have something that is not all that rare in the Linux kernel
>>> >> community, namely two conflicting more-or-less concurrent changes.
>>> >> This clearly needs to be resolved, either by us not strengthening the
>>> >> Linux-kernel memory model in the way we were planning to or by you
>>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>>> >> externally viewed release-acquire situations.
>>> >>
>>> >> Other thoughts?
>>> >
>>> > Like said in the other email, I would _much_ prefer to not go weaker
>>> > than PPC, I find that PPC is already painfully weak at times.
>>>
>>> Sure, and RISC-V could make this work too by using RCsc instructions
>>> and/or by using lightweight fences instead.  It just wasn't clear at
>>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>>> RCsc, or something else, and hence whether RISC-V would actually need
>>> to use something stronger than pure RCpc there.  Likewise for
>>> spin_unlock()/spin_lock() and everywhere else this comes up.
>>
>> while digging into riscv's locks and atomics to fix the issues discussed
>> earlier in this thread, I became aware of another issue:
>>
>> Considering here the CMPXCHG primitives, for example, I noticed that the
>> implementation currently provides the four variants
>>
>>	ATOMIC_OPS(        , .aqrl)
>>	ATOMIC_OPS(_acquire,   .aq)
>>	ATOMIC_OPS(_release,   .rl)
>>	ATOMIC_OPS(_relaxed,      )
>>
>> (corresponding, resp., to
>>
>>	atomic_cmpxchg()
>>	atomic_cmpxchg_acquire()
>>	atomic_cmpxchg_release()
>>	atomic_cmpxchg_relaxed()  )
>>
>> so that the first variant above ends up doing
>>
>>	0:	lr.w.aqrl  %0, %addr
>>		bne        %0, %old, 1f
>>		sc.w.aqrl  %1, %new, %addr
>>		bnez       %1, 0b
>>	1:
>>
>> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>>
>>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>>   the load part and the store part as independent RCsc operations is not
>>   in and of itself sufficient to enforce full fencing behavior, but this
>>   subtle weak behavior is counterintuitive and not much of an advantage
>>   architecturally, especially with lr and sc also available [...]."
>>
>> I understand that
>>
>>	{ x = y = u = v = 0 }
>>
>>	P0()
>>
>>	WRITE_ONCE(x, 1);		("relaxed" store, sw)
>>	atomic_cmpxchg(&u, 0, 1);
>>	r0 = READ_ONCE(y);		("relaxed" load, lw)
>>
>>	P1()
>>
>>	WRITE_ONCE(y, 1);
>>	atomic_cmpxchg(&v, 0, 1);
>>	r1 = READ_ONCE(x);
>>
>> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?
>
> cmpxchg isn't an AMO, it's an LR SC sequence, so that blurb doesn't
> apply.  I think "lr.w.aqrl" and "sc.w.aqrl" is not sufficient to
> perform a fully ordered operation (ie, it's an incorrect
> implementation of atomic_cmpxchg()), but I was hoping to get some
> time to actually internalize this part of the RISC-V memory model at
> some point to be sure.
Agreed, the current intention is that we'll add a fence rw,rw in there when we use lr/sc as the implementation, likely similar to what ARM does: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67 It's also probably not the only piece of synchronization-related code we'll have to go back and audit as we finalize the RISC-V memory consistency model over the next couple months or so. Dan ^ permalink raw reply [flat|nested] 23+ messages in thread
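To make the fix Dan describes concrete, the fully-ordered LR/SC cmpxchg sequence could be strengthened with an explicit trailing fence, by analogy with the linked ARM commit. This is only a sketch against the draft spec, not a quote of any merged RISC-V patch; the exact annotations chosen here (plain lr.w, sc.w.rl, fence rw,rw) are an assumption:

```
0:  lr.w     %0, %addr        # load-reserved, no .aqrl annotation
    bne      %0, %old, 1f     # comparison failed, skip the store
    sc.w.rl  %1, %new, %addr  # store-conditional, release ordering
    bnez     %1, 0b           # reservation lost, retry
    fence    rw, rw           # promote the whole sequence to fully ordered
1:
```

The trailing fence plays the same role as ARM's dmb after the store-exclusive: it orders the entire read-modify-write against later accesses rather than relying on the load and store halves being independently RCsc.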
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 18:27 ` Peter Zijlstra 2018-02-22 19:47 ` Daniel Lustig @ 2018-02-22 20:02 ` Paul E. McKenney 1 sibling, 0 replies; 23+ messages in thread From: Paul E. McKenney @ 2018-02-22 20:02 UTC (permalink / raw) To: linux-riscv On Thu, Feb 22, 2018 at 07:27:17PM +0100, Peter Zijlstra wrote: > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote: > > So we have something that is not all that rare in the Linux kernel > > community, namely two conflicting more-or-less concurrent changes. > > This clearly needs to be resolved, either by us not strengthening the > > Linux-kernel memory model in the way we were planning to or by you > > strengthening RISC-V to be no weaker than PowerPC for these sorts of > > externally viewed release-acquire situations. > > > > Other thoughts? > > Like said in the other email, I would _much_ prefer to not go weaker > than PPC, I find that PPC is already painfully weak at times. And here are the four PowerPC litmus tests. As expected, a release-acquire pair within a given process orders everything except for prior stores against later loads, from the viewpoint of some other process. And yes, a few of the filenames are unfortunate. 
Thanx, Paul

------------------------------------------------------------------------

PPC MP+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 stw r1,0(r4) | lwz r7,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | lwz r8,0(r4) ;
 lwz r7,0(r5) |              ;
 lwsync       |              ;
 stw r1,0(r6) |              ;
exists
(1:r7=1 /\ 1:r8=0)

------------------------------------------------------------------------

PPC SB+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Sometimes" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 stw r1,0(r4) | stw r1,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | lwz r7,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 lwz r7,0(r6) |              ;
exists
(0:r7=0 /\ 1:r7=0)

------------------------------------------------------------------------

PPC LB+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 lwz r7,0(r4) | lwz r7,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | stw r1,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 stw r1,0(r6) |              ;
exists
(0:r7=1 /\ 1:r7=1)

------------------------------------------------------------------------

PPC MP+o-rmb-o+o-r-a-o.litmus
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 lwz r7,0(r4) | stw r1,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | stw r1,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 lwz r9,0(r6) |              ;
exists
(0:r7=1 /\ 0:r9=0)
* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() 2018-02-22 17:27 ` Daniel Lustig 2018-02-22 18:13 ` Paul E. McKenney @ 2018-02-22 18:21 ` Peter Zijlstra 1 sibling, 0 replies; 23+ messages in thread From: Peter Zijlstra @ 2018-02-22 18:21 UTC (permalink / raw) To: linux-riscv

On Thu, Feb 22, 2018 at 09:27:09AM -0800, Daniel Lustig wrote:
> On 2/22/2018 6:12 AM, Andrea Parri wrote:
> > On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote:
> >> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
> >>
> >>> C unlock-lock-read-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P1 */
> >>>
> >>> P0(int *x, int *y)
> >>> {
> >>>     WRITE_ONCE(*x, 1);
> >>>     smp_wmb();
> >>>     WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y, spinlock_t *s)
> >>> {
> >>>     int r0;
> >>>     int r1;
> >>>
> >>>     r0 = READ_ONCE(*y);
> >>>     spin_unlock(s);
> >>>     spin_lock(s);
> >>>     r1 = READ_ONCE(*x);
> >>> }
> >>
> >> So I would indeed expect this to be forbidden. Could someone please
> >> explain how this could be allowed?
>
> I think an RCpc interpretation of .aq and .rl would in fact allow
> the two normal loads in P1 to be reordered. Wouldn't it? Does that
> go against what the LKMM requires?
>
> The intuition would be that the amoswap.w.aq can forward from the
> amoswap.w.rl while that's still in the store buffer, and then the
> lw x3,0(x4) can also perform while the amoswap.w.rl is still in
> the store buffer, all before the lw x1,0(x2) executes. That's
> not forbidden unless the amoswaps are RCsc, unless I'm missing
> something.

So here you're equating flushing the store buffer to RCsc. Is that
entirely fair? Can't a local store-buffer flush still fall short of
global visibility? Similarly, couldn't a store buffer have internal
ordering, such that it guarantees to flush writes in order, employing
for example multiple stages / generations?

That is, I think there's quite a bit of gray area here.
But even with the store buffer, the STORE-RELEASE could 'wait' for all
prior LOADs to complete before issuing the STORE. It could also
increment a store-buffer generation before issuing, such that it would,
once it gets around to flushing things, issue the STOREs in the
'promised' order.

> Likewise even if the unlock()/lock() is between two stores. A
> control dependency might originate from the load part of the
> amoswap.w.aq, but there still would have to be something to
> ensure that this load part in fact performs after the store
> part of the amoswap.w.rl performs globally, and that's not
> automatic under RCpc.

Right, so I think we have different definitions of what RCpc means,
and I suspect I might be the odd one out.

And it's not just spinlocks.

> >>
> >>> C unlock-lock-write-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P0 */
> >>>
> >>> P0(int *x, int *y, spinlock_t *s)
> >>> {
> >>>     WRITE_ONCE(*x, 1);
> >>>     spin_unlock(s);
> >>>     spin_lock(s);
> >>>     WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y)
> >>> {
> >>>     int r0;
> >>>     int r1;
> >>>
> >>>     r0 = READ_ONCE(*y);
> >>>     smp_rmb();
> >>>     r1 = READ_ONCE(*x);
> >>> }
> >>>
> >>> exists (1:r0=1 /\ 1:r1=0)
> >>>
> >>> RISCV RISCV-unlock-lock-write-ordering
> >>> {
> >>> 0:x2=x; 0:x4=y; 0:x6=s;
> >>> 1:x2=y; 1:x4=x;
> >>> s=1;
> >>> }
> >>>  P0                      | P1          ;
> >>>  ori x1,x0,1             | lw x1,0(x2) ;
> >>>  sw x1,0(x2)             | fence r,r   ;
> >>>  amoswap.w.rl x0,x0,(x6) | lw x3,0(x4) ;
> >>>  ori x5,x0,1             |             ;
> >>>  amoswap.w.aq x0,x5,(x6) |             ;
> >>>  ori x3,x0,1             |             ;
> >>>  sw x3,0(x4)             |             ;
> >>> exists
> >>> (1:x1=1 /\ 1:x3=0)
> >>
> >> And here I think the RISCV conversion is flawed: there should be a ctrl
> >> dependency. The second store-word in P0 should depend on the result of
> >> amoswap.w.aq being 0.
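To spell out the control dependency Peter is asking for, P0 of the write-ordering test might be rewritten as below. The destination register x7 and the exact branch shape are my own illustration, not code from any posted patch, since the snippet above discards the amoswap's old value into x0:

```
P0:
    ori          x1,x0,1
    sw           x1,0(x2)      # WRITE_ONCE(*x, 1)
    amoswap.w.rl x0,x0,(x6)    # spin_unlock(s): release store of 0
    ori          x5,x0,1
    amoswap.w.aq x7,x5,(x6)    # spin_lock(s): x7 receives the old value of s
    bne          x7,x0,FAIL00  # ctrl dependency: continue only if s was free
    ori          x3,x0,1
    sw           x3,0(x4)      # WRITE_ONCE(*y, 1), now behind the branch
FAIL00:
```

Note that, per Daniel's point quoted above, a control dependency alone still does not make the load part of the amoswap.w.aq perform after the store part of the amoswap.w.rl globally, so it does not by itself forbid the test under a pure RCpc reading.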
> >
> > You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00"
> > right after amoswap.w.aq (and this label at the end of P0); this does not
> > change the verdict of the available formalizations reported above however.
> >
> > (So, AFAICT, the above question remains valid/open.)
>
> This makes sense, but one other question: Paul mentioned earlier in the
> thread that
>
> > The PowerPC lock implementation's unlock-lock pair does not order writes
> > from the previous critical section against reads from the later critical
> > section, but it does order other combinations of reads and writes.
>
> My understanding is that the same logic about control dependencies applies
> here too, right? In other words, in spite of Peter's claim, the control
> dependency doesn't automatically fix this for RISC-V or for PowerPC?

So what Paul says is that earlier STOREs and later LOADs can reorder
(the TSO reordering) over the RELEASE+ACQUIRE. And that is right; LWSYNC
allows just that one reorder.

The ctrl-dep cannot fix that: a ctrl-dep is a LOAD->STORE order, so later
STOREs will happen after the earlier LOAD. But the above example is a
STORE->STORE, which LWSYNC _will_ order.

But yes, I remember being confused by that example.

> Peter, in this test you mentioned earlier:
>
> P0()
> {
>     WRITE_ONCE(x, 1);
>     smp_store_release(&y, 1);
>     r0 = smp_load_acquire(&y);
>     WRITE_ONCE(z, 1);
> }
>
> P1()
> {
>     r1 = READ_ONCE(z);
>     smp_rmb();
>     r2 = READ_ONCE(x);
> }
>
> exists: r0 == 1 /\ r1 == 1 /\ r2 == 0
>
> You expect this to be forbidden too, even if the release and acquire
> are RCpc? This part I don't follow. There's no conditional branch
> here to enforce a control dependency. I get and I agree that
> smp_store_release/smp_load_acquire are different than spin_unlock/
> spin_lock, but isn't that even more reason to allow this one if
> smp_store_release/smp_load_acquire are RCpc without question?

Yes, I was expecting that to be forbidden.
My definition of RCpc is such that ACQUIRE and RELEASE are 'locally'
respected. The TSO archs forbid the above by ordering STOREs; beyond
those, the only architectures that natively implement ACQUIRE/RELEASE
are PPC and ARM64 (the rest use smp_mb()). ARM64 has globally
consistent RELEASE/ACQUIRE, and PPC has its LWSYNC, which imposes a
local TSO order.

In particular I'm not a great fan of weakening the LKMM further. PPC is
already painfully weak (more so than ARM64), but the proposed RISC-V
mapping would be weaker still.

Of course, I come from x86 so I could be biased, but history has shown
that the weaker this stuff becomes the more painful this all becomes.
Not to mention that we'd have to go audit all existing code for
constructs no longer allowed, which is incredibly painful. Linus too
has expressed his preference for stronger models.
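As an aside for readers following the RCpc/RCsc debate without an ISA manual at hand, the acquire/release lock shape being discussed can be sketched with portable C11 atomics. This is an illustrative toy (the toy_* names are mine, and C11's acquire/release is its own memory model, so this says nothing about what RISC-V's .aq/.rl or the LKMM ultimately guarantee); it only shows where the acquire and release orderings sit:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinlock mirroring the amoswap-based pattern under discussion:
 * lock is an atomic exchange with acquire ordering (cf. amoswap.w.aq),
 * unlock is a plain store with release ordering (cf. amoswap.w.rl).
 * Illustrative only; not the kernel's implementation. */
typedef struct { atomic_int v; } toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *s)
{
    /* spin until the exchange observes the lock free (old value 0) */
    while (atomic_exchange_explicit(&s->v, 1, memory_order_acquire))
        ;
}

static void toy_spin_unlock(toy_spinlock_t *s)
{
    atomic_store_explicit(&s->v, 0, memory_order_release);
}
```

The whole thread is, in effect, about whether this acquire/release pairing on the lock word is strong enough for what the LKMM promises about unlock+lock sequences, or whether the implementation needs RCsc operations or explicit fences on top.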
end of thread, other threads:[~2018-03-06 13:00 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-22 12:19 [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() Andrea Parri 2018-02-22 12:44 ` Andrea Parri 2018-02-22 13:40 ` Peter Zijlstra 2018-02-22 14:12 ` Andrea Parri 2018-02-22 17:27 ` Daniel Lustig 2018-02-22 18:13 ` Paul E. McKenney 2018-02-22 18:27 ` Peter Zijlstra 2018-02-22 19:47 ` Daniel Lustig 2018-02-23 11:16 ` Andrea Parri 2018-02-26 10:39 ` Will Deacon 2018-02-26 14:21 ` Luc Maranget 2018-02-26 16:06 ` Linus Torvalds 2018-02-26 16:24 ` Will Deacon 2018-02-26 17:00 ` Linus Torvalds 2018-02-26 17:10 ` Will Deacon 2018-03-06 13:00 ` Peter Zijlstra 2018-02-27 5:06 ` Boqun Feng 2018-02-27 10:16 ` Boqun Feng 2018-03-01 15:11 ` Andrea Parri 2018-03-01 21:54 ` Palmer Dabbelt 2018-03-01 22:21 ` Daniel Lustig 2018-02-22 20:02 ` Paul E. McKenney 2018-02-22 18:21 ` Peter Zijlstra