* [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
@ 2018-02-22 12:19 ` Andrea Parri
  0 siblings, 0 replies; 46+ messages in thread
From: Andrea Parri @ 2018-02-22 12:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrea Parri, Palmer Dabbelt, Albert Ou, Daniel Lustig,
	Alan Stern, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Ingo Molnar, Linus Torvalds,
	linux-riscv

The LKMM defines certain memory ordering constraints for spin_lock(),
spin_unlock() and other primitives that the kernel developer can rely
on; unfortunately, some of these constraints are not currently met by
the RISC-V implementation of spin_lock(), spin_unlock().

The following MP-like program exemplifies the issue: according to our
LKMM, program "unlock-lock-read-ordering" below can never reach state
(1:r0=1 /\ 1:r1=0).  However, when we map this C program to the RISCV
program "RISCV-unlock-lock-read-ordering" below following the current
implementation, the corresponding state is reachable according to the
RISCV specification and its formalizations [2].

C unlock-lock-read-ordering

{}
/* s initially owned by P1 */

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	spin_unlock(s);
	spin_lock(s);
	r1 = READ_ONCE(*y);
}

exists (1:r0=1 /\ 1:r1=0)

RISCV RISCV-unlock-lock-read-ordering
{
0:x2=x; 0:x4=y;
1:x2=y; 1:x4=x; 1:x6=s;
s=1;
}
 P0           |  P1                      ;
 ori x1,x0,1  | lw x1,0(x2)              ;
 sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
 fence w,w    | ori x5,x0,1              ;
 ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
 sw x3,0(x4)  | lw x3,0(x4)              ;
exists
(1:x1=1 /\ 1:x3=0)

The issue can in fact be exacerbated if, as envisaged/discussed in [3],
the LKMM is modified to become even more "demanding" on the ordering
constraints associated with the locking primitives.  For example, the
state (1:r0=1 /\ 1:r1=0) in program "unlock-lock-write-ordering" below
is currently allowed by the LKMM (as is the corresponding state in
"RISCV-unlock-lock-write-ordering" below).  However, proposals modifying
the LKMM to _forbid_ that state have already appeared on LKML [4].

C unlock-lock-write-ordering

{}
/* s initially owned by P0 */

P0(int *x, int *y, spinlock_t *s)
{
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(s);
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	smp_rmb();
	r1 = READ_ONCE(*y);
}

exists (1:r0=1 /\ 1:r1=0)

RISCV RISCV-unlock-lock-write-ordering
{
0:x2=x; 0:x4=y; 0:x6=s;
1:x2=y; 1:x4=x;
s=1;
}
 P0                       | P1           ;
 ori x1,x0,1              | lw x1,0(x2)  ;
 sw x1,0(x2)              | fence r,r    ;
 amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
 ori x5,x0,1              |              ;
 amoswap.w.aq x0,x5,(x6)  |              ;
 ori x3,x0,1              |              ;
 sw x3,0(x4)              |              ;
exists
(1:x1=1 /\ 1:x3=0)

[Curiously, RISC-V's current implementations of smp_load_acquire() and
 smp_store_release() provide much stronger ordering than is currently
 required by the LKMM, since they rely on the generic implementation
 (cf. also [5]).]
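
For reference, the generic fallbacks (include/asm-generic/barrier.h)
pair each access with a full barrier -- "fence rw,rw" on RISC-V --
roughly as follows (a sketch from memory, omitting details such as the
compile-time type assertions):

#define __smp_store_release(p, v)					\
do {									\
	__smp_mb();		/* fence rw,rw on RISC-V */		\
	WRITE_ONCE(*p, v);						\
} while (0)

#define __smp_load_acquire(p)						\
({									\
	typeof(*p) ___p1 = READ_ONCE(*p);				\
	__smp_mb();		/* fence rw,rw on RISC-V */		\
	___p1;								\
})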

This RFC fixes the issue by strengthening RISC-V's implementations of
spin_lock() and spin_unlock(), based on "A spinlock with fences" from
Section 2.3.5 ("Acquire/Release Ordering") of the RISC-V draft spec.
It does _not_ attempt to fix the read locks or the atomics (for which,
AFAICT, similar considerations would hold).

IMPORTANT.  This patch is _NOT_ intended to be applied as is.  Rather,
it is intended to test the waters, the implicit questions being "Should
we take this direction?" and "Are changes to the LKMM needed?", and to
develop a technical discussion on the above issues.

[1] https://marc.info/?l=linux-kernel&m=151633436614259&w=2
[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
[3] https://marc.info/?l=linux-kernel&m=151181741400461&w=2
[4] https://marc.info/?l=linux-kernel&m=151871035014425&w=2
[5] https://marc.info/?l=linux-kernel&m=151912186913692&w=2

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/riscv/include/asm/spinlock.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 2fd27e8ef1fd6..2f89fc62c9196 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -28,8 +28,9 @@
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
+	RISCV_FENCE(rw,w);
 	__asm__ __volatile__ (
-		"amoswap.w.rl x0, x0, %0"
+		"amoswap.w x0, x0, %0"
 		: "=A" (lock->lock)
 		:: "memory");
 }
@@ -39,10 +40,11 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
 	int tmp = 1, busy;
 
 	__asm__ __volatile__ (
-		"amoswap.w.aq %0, %2, %1"
+		"amoswap.w %0, %2, %1"
 		: "=r" (busy), "+A" (lock->lock)
 		: "r" (tmp)
 		: "memory");
+	RISCV_FENCE(r,rw);
 
 	return !busy;
 }
-- 
2.7.4
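
For reference, RISCV_FENCE() used above presumably expands to a plain
FENCE instruction, along the lines of:

#define RISCV_FENCE(p, s) \
	__asm__ __volatile__ ("fence " #p "," #s : : : "memory")

so that, with this patch applied, the two primitives would read roughly
as follows (a sketch reconstructed from the diff above, not the
authoritative sources):

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	RISCV_FENCE(rw,w);		/* release: order prior accesses before the store */
	__asm__ __volatile__ (
		"amoswap.w x0, x0, %0"	/* plain (unannotated) AMO clears the lock */
		: "=A" (lock->lock)
		:: "memory");
}

static inline int arch_spin_trylock(arch_spinlock_t *lock)
{
	int tmp = 1, busy;

	__asm__ __volatile__ (
		"amoswap.w %0, %2, %1"	/* plain AMO attempts to take the lock */
		: "=r" (busy), "+A" (lock->lock)
		: "r" (tmp)
		: "memory");
	RISCV_FENCE(r,rw);		/* acquire: order the lock access before later accesses */

	return !busy;
}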

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 12:19 ` Andrea Parri
@ 2018-02-22 12:44   ` Andrea Parri
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrea Parri @ 2018-02-22 12:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Palmer Dabbelt, Albert Ou, Daniel Lustig, Alan Stern,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
> The LKMM defines certain memory ordering constraints for spin_lock(),
> spin_unlock() and other primitives that the kernel developer can rely
> on; unfortunately, some of these constraints are not currently met by
> the RISC-V implementation of spin_lock(), spin_unlock().
> 
> The following MP-like program exemplifies the issue: according to our
> LKMM, program "unlock-lock-read-ordering" below can never reach state
> (1:r0=1 /\ 1:r1=0).  However, when we map this C program to the RISCV
> program "RISCV-unlock-lock-read-ordering" below following the current
> implementation, the corresponding state is reachable according to the
> RISCV specification and its formalizations [2].
> 
> C unlock-lock-read-ordering
> 
> {}
> /* s initially owned by P1 */
> 
> P0(int *x, int *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y, spinlock_t *s)
> {
> 	int r0;
> 	int r1;
> 
> 	r0 = READ_ONCE(*y);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	r1 = READ_ONCE(*y);

This last read should have been from 'x' of course (and similarly in
unlock-lock-write-ordering).  Sorry,

  Andrea
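
For clarity, the intended P1 (with that fix applied) reads:

P1(int *x, int *y, spinlock_t *s)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	spin_unlock(s);
	spin_lock(s);
	r1 = READ_ONCE(*x);
}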


> }
> 
> exists (1:r0=1 /\ 1:r1=0)
> 
> RISCV RISCV-unlock-lock-read-ordering
> {
> 0:x2=x; 0:x4=y;
> 1:x2=y; 1:x4=x; 1:x6=s;
> s=1;
> }
>  P0           |  P1                      ;
>  ori x1,x0,1  | lw x1,0(x2)              ;
>  sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
>  fence w,w    | ori x5,x0,1              ;
>  ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
>  sw x3,0(x4)  | lw x3,0(x4)              ;
> exists
> (1:x1=1 /\ 1:x3=0)
> 
> The issue can in fact be exacerbated if, as envisaged/discussed in [3],
> the LKMM is modified to become even more "demanding" on the ordering
> constraints associated with the locking primitives.  For example, the
> state (1:r0=1 /\ 1:r1=0) in program "unlock-lock-write-ordering" below
> is currently allowed by the LKMM (as is the corresponding state in
> "RISCV-unlock-lock-write-ordering" below).  However, proposals modifying
> the LKMM to _forbid_ that state have already appeared on LKML [4].
> 
> C unlock-lock-write-ordering
> 
> {}
> /* s initially owned by P0 */
> 
> P0(int *x, int *y, spinlock_t *s)
> {
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	int r0;
> 	int r1;
> 
> 	r0 = READ_ONCE(*y);
> 	smp_rmb();
> 	r1 = READ_ONCE(*y);
> }
> 
> exists (1:r0=1 /\ 1:r1=0)
> 
> RISCV RISCV-unlock-lock-write-ordering
> {
> 0:x2=x; 0:x4=y; 0:x6=s;
> 1:x2=y; 1:x4=x;
> s=1;
> }
>  P0                       | P1           ;
>  ori x1,x0,1              | lw x1,0(x2)  ;
>  sw x1,0(x2)              | fence r,r    ;
>  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
>  ori x5,x0,1              |              ;
>  amoswap.w.aq x0,x5,(x6)  |              ;
>  ori x3,x0,1              |              ;
>  sw x3,0(x4)              |              ;
> exists
> (1:x1=1 /\ 1:x3=0)
> 
> [Curiously, RISC-V's current implementations of smp_load_acquire() and
>  smp_store_release() provide much stronger ordering than is currently
>  required by the LKMM, since they rely on the generic implementation
>  (cf. also [5]).]
> 
> This RFC fixes the issue by strengthening RISC-V's implementations of
> spin_lock() and spin_unlock(), based on "A spinlock with fences" from
> Section 2.3.5 ("Acquire/Release Ordering") of the RISC-V draft spec.
> It does _not_ attempt to fix the read locks or the atomics (for which,
> AFAICT, similar considerations would hold).
> 
> IMPORTANT.  This patch is _NOT_ intended to be applied as is.  Rather,
> it is intended to test the waters, the implicit questions being "Should
> we take this direction?" and "Are changes to the LKMM needed?", and to
> develop a technical discussion on the above issues.
> 
> [1] https://marc.info/?l=linux-kernel&m=151633436614259&w=2
> [2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
> [3] https://marc.info/?l=linux-kernel&m=151181741400461&w=2
> [4] https://marc.info/?l=linux-kernel&m=151871035014425&w=2
> [5] https://marc.info/?l=linux-kernel&m=151912186913692&w=2
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <albert@sifive.com>
> Cc: Daniel Lustig <dlustig@nvidia.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Jade Alglave <j.alglave@ucl.ac.uk>
> Cc: Luc Maranget <luc.maranget@inria.fr>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Akira Yokosawa <akiyks@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: linux-riscv@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/riscv/include/asm/spinlock.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index 2fd27e8ef1fd6..2f89fc62c9196 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -28,8 +28,9 @@
>  
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
> +	RISCV_FENCE(rw,w);
>  	__asm__ __volatile__ (
> -		"amoswap.w.rl x0, x0, %0"
> +		"amoswap.w x0, x0, %0"
>  		: "=A" (lock->lock)
>  		:: "memory");
>  }
> @@ -39,10 +40,11 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  	int tmp = 1, busy;
>  
>  	__asm__ __volatile__ (
> -		"amoswap.w.aq %0, %2, %1"
> +		"amoswap.w %0, %2, %1"
>  		: "=r" (busy), "+A" (lock->lock)
>  		: "r" (tmp)
>  		: "memory");
> +	RISCV_FENCE(r,rw);
>  
>  	return !busy;
>  }
> -- 
> 2.7.4
> 

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 12:19 ` Andrea Parri
@ 2018-02-22 13:40   ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2018-02-22 13:40 UTC (permalink / raw)
  To: Andrea Parri
  Cc: linux-kernel, Palmer Dabbelt, Albert Ou, Daniel Lustig,
	Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:

> C unlock-lock-read-ordering
> 
> {}
> /* s initially owned by P1 */
> 
> P0(int *x, int *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y, spinlock_t *s)
> {
> 	int r0;
> 	int r1;
> 
> 	r0 = READ_ONCE(*y);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	r1 = READ_ONCE(*y);
> }
> 
> exists (1:r0=1 /\ 1:r1=0)
> 
> RISCV RISCV-unlock-lock-read-ordering
> {
> 0:x2=x; 0:x4=y;
> 1:x2=y; 1:x4=x; 1:x6=s;
> s=1;
> }
>  P0           |  P1                      ;
>  ori x1,x0,1  | lw x1,0(x2)              ;
>  sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
>  fence w,w    | ori x5,x0,1              ;
>  ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
>  sw x3,0(x4)  | lw x3,0(x4)              ;
> exists
> (1:x1=1 /\ 1:x3=0)

So I would indeed expect this to be forbidden. Could someone please
explain how this could be allowed?

> C unlock-lock-write-ordering
> 
> {}
> /* s initially owned by P0 */
> 
> P0(int *x, int *y, spinlock_t *s)
> {
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(s);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	int r0;
> 	int r1;
> 
> 	r0 = READ_ONCE(*y);
> 	smp_rmb();
> 	r1 = READ_ONCE(*y);
> }
> 
> exists (1:r0=1 /\ 1:r1=0)
> 
> RISCV RISCV-unlock-lock-write-ordering
> {
> 0:x2=x; 0:x4=y; 0:x6=s;
> 1:x2=y; 1:x4=x;
> s=1;
> }
>  P0                       | P1           ;
>  ori x1,x0,1              | lw x1,0(x2)  ;
>  sw x1,0(x2)              | fence r,r    ;
>  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
>  ori x5,x0,1              |              ;
>  amoswap.w.aq x0,x5,(x6)  |              ;
>  ori x3,x0,1              |              ;
>  sw x3,0(x4)              |              ;
> exists
> (1:x1=1 /\ 1:x3=0)

And here I think the RISCV conversion is flawed, there should be a ctrl
dependency. The second store-word in P0 should depend on the result of
amoswap.w.aq being 0.

(strictly speaking there should be a ctrl-dep in the read example too,
except it'd be pointless for ordering reads, so I accept it being left
out)

Again, I cannot see how this could be allowed.

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 13:40   ` Peter Zijlstra
@ 2018-02-22 14:12     ` Andrea Parri
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrea Parri @ 2018-02-22 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Palmer Dabbelt, Albert Ou, Daniel Lustig,
	Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
> 
> > C unlock-lock-read-ordering
> > 
> > {}
> > /* s initially owned by P1 */
> > 
> > P0(int *x, int *y)
> > {
> > 	WRITE_ONCE(*x, 1);
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int *x, int *y, spinlock_t *s)
> > {
> > 	int r0;
> > 	int r1;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	spin_unlock(s);
> > 	spin_lock(s);
> > 	r1 = READ_ONCE(*y);
> > }
> > 
> > exists (1:r0=1 /\ 1:r1=0)
> > 
> > RISCV RISCV-unlock-lock-read-ordering
> > {
> > 0:x2=x; 0:x4=y;
> > 1:x2=y; 1:x4=x; 1:x6=s;
> > s=1;
> > }
> >  P0           |  P1                      ;
> >  ori x1,x0,1  | lw x1,0(x2)              ;
> >  sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
> >  fence w,w    | ori x5,x0,1              ;
> >  ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
> >  sw x3,0(x4)  | lw x3,0(x4)              ;
> > exists
> > (1:x1=1 /\ 1:x3=0)
> 
> So I would indeed expect this to be forbidden. Could someone please
> explain how this could be allowed?

As mentioned in IRC, my understanding here is only based on the spec.
referred below and on its (available) formalizations.  I expect that
RISC-V people will be able to provide more information.


> 
> > C unlock-lock-write-ordering
> > 
> > {}
> > /* s initially owned by P0 */
> > 
> > P0(int *x, int *y, spinlock_t *s)
> > {
> > 	WRITE_ONCE(*x, 1);
> > 	spin_unlock(s);
> > 	spin_lock(s);
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int *x, int *y)
> > {
> > 	int r0;
> > 	int r1;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	smp_rmb();
> > 	r1 = READ_ONCE(*y);
> > }
> > 
> > exists (1:r0=1 /\ 1:r1=0)
> > 
> > RISCV RISCV-unlock-lock-write-ordering
> > {
> > 0:x2=x; 0:x4=y; 0:x6=s;
> > 1:x2=y; 1:x4=x;
> > s=1;
> > }
> >  P0                       | P1           ;
> >  ori x1,x0,1              | lw x1,0(x2)  ;
> >  sw x1,0(x2)              | fence r,r    ;
> >  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
> >  ori x5,x0,1              |              ;
> >  amoswap.w.aq x0,x5,(x6)  |              ;
> >  ori x3,x0,1              |              ;
> >  sw x3,0(x4)              |              ;
> > exists
> > (1:x1=1 /\ 1:x3=0)
> 
> And here I think the RISCV conversion is flawed, there should be a ctrl
> dependency. The second store-word in P0 should depend on the result of
> amoswap.w.aq being 0.

You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00"
right after amoswap.w.aq (and this label at the end of P0); this does not
change the verdict of the available formalizations reported above however.

(So, AFAICT, the above question remains valid/open.)

  Andrea


> 
> (strictly speaking there should be a ctrl-dep in the read example too,
> except it'd be pointless for ordering reads, so I accept it being left
> out)
> 
> Again, I cannot see how this could be allowed.
> 

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 14:12     ` Andrea Parri
@ 2018-02-22 17:27       ` Daniel Lustig
  -1 siblings, 0 replies; 46+ messages in thread
From: Daniel Lustig @ 2018-02-22 17:27 UTC (permalink / raw)
  To: Andrea Parri, Peter Zijlstra
  Cc: linux-kernel, Palmer Dabbelt, Albert Ou, Alan Stern, Will Deacon,
	Boqun Feng, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, Ingo Molnar,
	Linus Torvalds, linux-riscv

On 2/22/2018 6:12 AM, Andrea Parri wrote:
> On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote:
>> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
>>
>>> C unlock-lock-read-ordering
>>>
>>> {}
>>> /* s initially owned by P1 */
>>>
>>> P0(int *x, int *y)
>>> {
>>> 	WRITE_ONCE(*x, 1);
>>> 	smp_wmb();
>>> 	WRITE_ONCE(*y, 1);
>>> }
>>>
>>> P1(int *x, int *y, spinlock_t *s)
>>> {
>>> 	int r0;
>>> 	int r1;
>>>
>>> 	r0 = READ_ONCE(*y);
>>> 	spin_unlock(s);
>>> 	spin_lock(s);
>>> 	r1 = READ_ONCE(*y);
>>> }
>>>
>>> exists (1:r0=1 /\ 1:r1=0)
>>>
>>> RISCV RISCV-unlock-lock-read-ordering
>>> {
>>> 0:x2=x; 0:x4=y;
>>> 1:x2=y; 1:x4=x; 1:x6=s;
>>> s=1;
>>> }
>>>  P0           |  P1                      ;
>>>  ori x1,x0,1  | lw x1,0(x2)              ;
>>>  sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
>>>  fence w,w    | ori x5,x0,1              ;
>>>  ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
>>>  sw x3,0(x4)  | lw x3,0(x4)              ;
>>> exists
>>> (1:x1=1 /\ 1:x3=0)
>>
>> So I would indeed expect this to be forbidden. Could someone please
>> explain how this could be allowed?
> 
> As mentioned in IRC, my understanding here is only based on the spec.
> referred below and on its (available) formalizations.  I expect that
> RISC-V people will be able to provide more information.

I think an RCpc interpretation of .aq and .rl would in fact allow
the two normal loads in P1 to be reordered.  Wouldn't it?  Does that
go against what the LKMM requires?

The intuition would be that the amoswap.w.aq can forward from the
amoswap.w.rl while that's still in the store buffer, and then the
lw x3,0(x4) can also perform while the amoswap.w.rl is still in
the store buffer, all before the lw x1,0(x2) executes.  That's
not forbidden unless the amoswaps are RCsc, unless I'm missing
something.

Likewise even if the unlock()/lock() is between two stores.  A
control dependency might originate from the load part of the
amoswap.w.aq, but there still would have to be something to
ensure that this load part in fact performs after the store
part of the amoswap.w.rl performs globally, and that's not
automatic under RCpc.

In any case, our concern that amoswap.w.rl and amoswap.w.aq might
not actually be sufficient for spin_unlock() and spin_lock() was
what prompted us to start our own internal discussion on this.

FWIW we have made a few clarifications to our spec that haven't
propagated to the public upstream yet, but that should be happening
soon.  In any case as it relates to the questions here, it's
more a matter of us deciding what we should put into the spec in
the end than it is a matter of checking the current text.  In
other words, we're still finalizing things, and there's still
some time to tweak as necessary.

>>
>>> C unlock-lock-write-ordering
>>>
>>> {}
>>> /* s initially owned by P0 */
>>>
>>> P0(int *x, int *y, spinlock_t *s)
>>> {
>>> 	WRITE_ONCE(*x, 1);
>>> 	spin_unlock(s);
>>> 	spin_lock(s);
>>> 	WRITE_ONCE(*y, 1);
>>> }
>>>
>>> P1(int *x, int *y)
>>> {
>>> 	int r0;
>>> 	int r1;
>>>
>>> 	r0 = READ_ONCE(*y);
>>> 	smp_rmb();
>>> 	r1 = READ_ONCE(*y);
>>> }
>>>
>>> exists (1:r0=1 /\ 1:r1=0)
>>>
>>> RISCV RISCV-unlock-lock-write-ordering
>>> {
>>> 0:x2=x; 0:x4=y; 0:x6=s;
>>> 1:x2=y; 1:x4=x;
>>> s=1;
>>> }
>>>  P0                       | P1           ;
>>>  ori x1,x0,1              | lw x1,0(x2)  ;
>>>  sw x1,0(x2)              | fence r,r    ;
>>>  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
>>>  ori x5,x0,1              |              ;
>>>  amoswap.w.aq x0,x5,(x6)  |              ;
>>>  ori x3,x0,1              |              ;
>>>  sw x3,0(x4)              |              ;
>>> exists
>>> (1:x1=1 /\ 1:x3=0)
>>
>> And here I think the RISCV conversion is flawed, there should be a ctrl
>> dependency. The second store-word in P0 should depend on the result of
>> amoswap.w.aq being 0.
> 
> You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00"
> right after amoswap.w.aq (and this label at the end of P0); this does not
> change the verdict of the available formalizations reported above however.
> 
> (So, AFAICT, the above question remains valid/open.)

This makes sense, but one other question: Paul mentioned earlier in the
thread that

> The PowerPC lock implementation's unlock-lock pair does not order writes
> from the previous critical section against reads from the later critical
> section, but it does order other combinations of reads and writes.

My understanding is that the same logic about control dependencies applies
here too, right?  In other words, in spite of Peter's claim, the control
dependency doesn't automatically fix this for RISC-V or for PowerPC?

> 
>   Andrea
> 
> 
>>
>> (strictly speaking there should be a ctrl-dep in the read example too,
>> except it'd be pointless for ordering reads, so I accept it being left
>> out)
>>
>> Again, I cannot see how this could be allowed.
>>

Peter, in this test you mentioned earlier:

P0()
{
	WRITE_ONCE(x, 1);
	smp_store_release(&y, 1);
	r0 = smp_load_acquire(&y);
	WRITE_ONCE(z, 1);
}

P1()
{
	r1 = READ_ONCE(z);
	smp_rmb();
	r2 = READ_ONCE(x);
}

exists: r0 == 1 /\ r1==1 /\ r2==0

You expect this to be forbidden too, even if the release and acquire
are RCpc?  This part I don't follow.  There's no conditional branch
here to enforce a control dependency.  I get and I agree that
smp_store_release/smp_load_acquire are different than spin_unlock/
spin_lock, but isn't that even more reason to allow this one if
smp_store_release/smp_load_acquire are RCpc without question?

Thanks,
Dan

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 17:27       ` Daniel Lustig
@ 2018-02-22 18:13         ` Paul E. McKenney
  -1 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2018-02-22 18:13 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Andrea Parri, Peter Zijlstra, linux-kernel, Palmer Dabbelt,
	Albert Ou, Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 09:27:09AM -0800, Daniel Lustig wrote:
> On 2/22/2018 6:12 AM, Andrea Parri wrote:
> > On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote:
> >> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
> >>
> >>> C unlock-lock-read-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P1 */
> >>>
> >>> P0(int *x, int *y)
> >>> {
> >>> 	WRITE_ONCE(*x, 1);
> >>> 	smp_wmb();
> >>> 	WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y, spinlock_t *s)
> >>> {
> >>> 	int r0;
> >>> 	int r1;
> >>>
> >>> 	r0 = READ_ONCE(*y);
> >>> 	spin_unlock(s);
> >>> 	spin_lock(s);
> >>> 	r1 = READ_ONCE(*y);
> >>> }
> >>>
> >>> exists (1:r0=1 /\ 1:r1=0)
> >>>
> >>> RISCV RISCV-unlock-lock-read-ordering
> >>> {
> >>> 0:x2=x; 0:x4=y;
> >>> 1:x2=y; 1:x4=x; 1:x6=s;
> >>> s=1;
> >>> }
> >>>  P0           |  P1                      ;
> >>>  ori x1,x0,1  | lw x1,0(x2)              ;
> >>>  sw x1,0(x2)  | amoswap.w.rl x0,x0,(x6)  ;
> >>>  fence w,w    | ori x5,x0,1              ;
> >>>  ori x3,x0,1  | amoswap.w.aq x0,x5,(x6)  ;
> >>>  sw x3,0(x4)  | lw x3,0(x4)              ;
> >>> exists
> >>> (1:x1=1 /\ 1:x3=0)
> >>
> >> So I would indeed expect this to be forbidden. Could someone please
> >> explain how this could be allowed?
> > 
> > As mentioned in IRC, my understanding here is only based on the spec.
> > referred below and on its (available) formalizations.  I expect that
> > RISC-V people will be able to provide more information.
> 
> I think an RCpc interpretation of .aq and .rl would in fact allow
> the two normal loads in P1 to be reordered.  Wouldn't it?  Does that
> go against what the LKMM requires?
> 
> The intuition would be that the amoswap.w.aq can forward from the
> amoswap.w.rl while that's still in the store buffer, and then the
> lw x3,0(x4) can also perform while the amoswap.w.rl is still in
> the store buffer, all before the lw x1,0(x2) executes.  That's
> not forbidden unless the amoswaps are RCsc, unless I'm missing
> something.
> 
> Likewise even if the unlock()/lock() is between two stores.  A
> control dependency might originate from the load part of the
> amoswap.w.aq, but there still would have to be something to
> ensure that this load part in fact performs after the store
> part of the amoswap.w.rl performs globally, and that's not
> automatic under RCpc.
> 
> In any case, our concern that amoswap.w.rl and amoswap.w.aq might
> not actually be sufficient for spin_unlock() and spin_lock() was
> what prompted us to start our own internal discussion on this.
> 
> FWIW we have made a few clarifications to our spec that haven't
> propagated to the public upstream yet, but that should be happening
> soon.  In any case as it relates to the questions here, it's
> more a matter of us deciding what we should put into the spec in
> the end than it is a matter of checking the current text.  In
> other words, we're still finalizing things, and there's still
> some time to tweak as necessary.
> 
> >>
> >>> C unlock-lock-write-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P0 */
> >>>
> >>> P0(int *x, int *y, spinlock_t *s)
> >>> {
> >>> 	WRITE_ONCE(*x, 1);
> >>> 	spin_unlock(s);
> >>> 	spin_lock(s);
> >>> 	WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y)
> >>> {
> >>> 	int r0;
> >>> 	int r1;
> >>>
> >>> 	r0 = READ_ONCE(*y);
> >>> 	smp_rmb();
> >>> 	r1 = READ_ONCE(*y);
> >>> }
> >>>
> >>> exists (1:r0=1 /\ 1:r1=0)
> >>>
> >>> RISCV RISCV-unlock-lock-write-ordering
> >>> {
> >>> 0:x2=x; 0:x4=y; 0:x6=s;
> >>> 1:x2=y; 1:x4=x;
> >>> s=1;
> >>> }
> >>>  P0                       | P1           ;
> >>>  ori x1,x0,1              | lw x1,0(x2)  ;
> >>>  sw x1,0(x2)              | fence r,r    ;
> >>>  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
> >>>  ori x5,x0,1              |              ;
> >>>  amoswap.w.aq x0,x5,(x6)  |              ;
> >>>  ori x3,x0,1              |              ;
> >>>  sw x3,0(x4)              |              ;
> >>> exists
> >>> (1:x1=1 /\ 1:x3=0)
> >>
> >> And here I think the RISCV conversion is flawed, there should be a ctrl
> >> dependency. The second store-word in P0 should depend on the result of
> >> amoswap.w.aq being 0.
> > 
> > You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00"
> > right after amoswap.w.aq (and this label at the end of P0); this does not
> > change the verdict of the available formalizations reported above however.
> > 
> > (So, AFAICT, the above question remains valid/open.)
> 
> This makes sense, but one other question: Paul mentioned earlier in the
> thread that
> 
> > The PowerPC lock implementation's unlock-lock pair does not order writes
> > from the previous critical section against reads from the later critical
> > section, but it does order other combinations of reads and writes.
> 
> My understanding is that the same logic about control dependencies applies
> here too, right?  In other words, in spite of Peter's claim, the control
> dependency doesn't automatically fix this for RISC-V or for PowerPC?

For PowerPC, an lwsync instruction is used, which means that an external
observer can see a write from an earlier critical section being
reordered with a read from a later critical section, but no other
combination of reads and writes can be so reordered.

I will dig up PowerPC litmus tests demonstrating this and send them
along.
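
For illustration, the write-then-read case corresponds to a test of
roughly the following shape (a sketch of the pattern just described,
not necessarily the actual tests referred to above):

C unlock-lock-write-read-ordering

{}
/* s initially owned by P0 */

P0(int *x, int *y, spinlock_t *s)
{
	int r0;

	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(s);
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)

with the "exists" state observable on PowerPC (lwsync does not order the
earlier write against the later read across the unlock-lock pair), while
the other read/write combinations remain ordered.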

> >> (strictly speaking there should be a ctrl-dep in the read example too,
> >> except it'd be pointless for ordering reads, so I accept it being left
> >> out)
> >>
> >> Again, I cannot see how this could be allowed.
> >>
> 
> Peter, in this test you mentioned earlier:
> 
> P0()
> {
> 	WRITE_ONCE(x, 1);
> 	smp_store_release(&y, 1);
> 	r0 = smp_load_acquire(&y);
> 	WRITE_ONCE(z, 1);
> }
> 
> P1()
> {
> 	r1 = READ_ONCE(z);
> 	smp_rmb();
> 	r2 = READ_ONCE(x);
> }
> 
> exists: r0 == 1 /\ r1==1 /\ r2==0
> 
> You expect this to be forbidden too, even if the release and acquire
> are RCpc?  This part I don't follow.  There's no conditional branch
> here to enforce a control dependency.  I get and I agree that
> smp_store_release/smp_load_acquire are different than spin_unlock/
> spin_lock, but isn't that even more reason to allow this one if
> smp_store_release/smp_load_acquire are RCpc without question?

Here is the above test, reworked a bit to allow herd to accept it:

------------------------------------------------------------------------

C C-peterz

{
}

P0(int *x, int *y, int *z)
{
	int r0;

	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
	r0 = smp_load_acquire(y);
	WRITE_ONCE(*z, 1);
}

P1(int *x, int *y, int *z)
{
	int r1;
	int r2;

	r1 = READ_ONCE(*z);
	smp_rmb();
	r2 = READ_ONCE(*x);
}

exists (0:r0 == 1 /\ 1:r1==1 /\ 1:r2==0)

------------------------------------------------------------------------

Please let me know if I messed up the conversion.  On the perhaps unlikely
chance that I got it right, here is what the current version of the
Linux-kernel memory model says:

------------------------------------------------------------------------

$ herd7 -conf linux-kernel.cfg /tmp/C-peterz.litmus 
Test C-peterz Allowed
States 4
0:r0=1; 1:r1=0; 1:r2=0;
0:r0=1; 1:r1=0; 1:r2=1;
0:r0=1; 1:r1=1; 1:r2=0;
0:r0=1; 1:r1=1; 1:r2=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:r0=1 /\ 1:r1=1 /\ 1:r2=0)
Observation C-peterz Sometimes 1 3
Time C-peterz 0.01
Hash=6c4f7f8a8dc750fd5d706d6f3c104360

------------------------------------------------------------------------

So the current Linux-kernel memory model does in fact allow this, as
you say.  But again, there is a proposal to tighten the Linux-kernel
memory model based on the pre-RISC-V set of architectures running the
Linux kernel, all of which would forbid this test.

So we have something that is not all that rare in the Linux kernel
community, namely two conflicting more-or-less concurrent changes.
This clearly needs to be resolved, either by us not strengthening the
Linux-kernel memory model in the way we were planning to or by you
strengthening RISC-V to be no weaker than PowerPC for these sorts of
externally viewed release-acquire situations.

Other thoughts?

							Thanx, Paul

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 17:27       ` Daniel Lustig
@ 2018-02-22 18:21         ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2018-02-22 18:21 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Andrea Parri, linux-kernel, Palmer Dabbelt, Albert Ou,
	Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 09:27:09AM -0800, Daniel Lustig wrote:
> On 2/22/2018 6:12 AM, Andrea Parri wrote:
> > On Thu, Feb 22, 2018 at 02:40:04PM +0100, Peter Zijlstra wrote:
> >> On Thu, Feb 22, 2018 at 01:19:50PM +0100, Andrea Parri wrote:
> >>
> >>> C unlock-lock-read-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P1 */
> >>>
> >>> P0(int *x, int *y)
> >>> {
> >>> 	WRITE_ONCE(*x, 1);
> >>> 	smp_wmb();
> >>> 	WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y, spinlock_t *s)
> >>> {
> >>> 	int r0;
> >>> 	int r1;
> >>>
> >>> 	r0 = READ_ONCE(*y);
> >>> 	spin_unlock(s);
> >>> 	spin_lock(s);
> >>> 	r1 = READ_ONCE(*x);
> >>> }

> >> So I would indeed expect this to be forbidden. Could someone please
> >> explain how this could be allowed?

> I think an RCpc interpretation of .aq and .rl would in fact allow
> the two normal loads in P1 to be reordered.  Wouldn't it?  Does that
> go against what the LKMM requires?
> 
> The intuition would be that the amoswap.w.aq can forward from the
> amoswap.w.rl while that's still in the store buffer, and then the
> lw x3,0(x4) can also perform while the amoswap.w.rl is still in
> the store buffer, all before the lw x1,0(x2) executes.  That's
> not forbidden unless the amoswaps are RCsc, unless I'm missing
> something.

So here you're equating flushing the store buffer with RCsc. Is that
entirely fair? Can't a local store-buffer flush still fall short of global
visibility? And similarly, couldn't a store buffer have internal ordering,
such that it guarantees to flush writes in order, employing for example
multiple stages / generations?

That is, I think there's quite a bit of gray area here.

But even with the store buffer, the STORE-RELEASE could 'wait' for all
prior LOADs to complete before issuing the STORE. It could also
increment a store-buffer generation before issuing, such that it would,
once it gets around to flushing things, issue the STOREs in the 'promised'
order.
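
Purely as a toy illustration of that last idea (my own sketch, not any
real micro-architecture or kernel code), a 'generation' tagged store
buffer could look something like this in C; sb_store(), sb_drain_one()
and the simulated mem[] array are all made up for the example:

#include <stdio.h>

#define SB_SIZE  16
#define MEM_SIZE 64

static unsigned long mem[MEM_SIZE];	/* simulated memory */

struct sb_entry {
	unsigned int  addr;
	unsigned long val;
	unsigned int  gen;	/* generation this store belongs to */
};

static struct sb_entry sb[SB_SIZE];
static unsigned int head, tail, cur_gen;

/* Buffer a store; a store-release first bumps the generation, so every
 * later store carries a newer tag than anything before the release. */
static void sb_store(unsigned int addr, unsigned long val, int release)
{
	if (release)
		cur_gen++;
	sb[tail % SB_SIZE] = (struct sb_entry){ addr, val, cur_gen };
	tail++;
}

/* Drain one entry.  This toy drains in FIFO order, which trivially
 * respects generations; the point is that a smarter, non-FIFO drain
 * would only need to respect the gen tags to keep the 'promised'
 * release order. */
static void sb_drain_one(void)
{
	if (head != tail) {
		struct sb_entry *e = &sb[head % SB_SIZE];

		mem[e->addr] = e->val;
		head++;
	}
}

int main(void)
{
	sb_store(0, 1, 0);	/* plain store */
	sb_store(1, 1, 1);	/* store-release: bumps the generation */
	while (head != tail)
		sb_drain_one();
	printf("mem[0]=%lu mem[1]=%lu\n", mem[0], mem[1]);
	return 0;
}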

> Likewise even if the unlock()/lock() is between two stores.  A
> control dependency might originate from the load part of the
> amoswap.w.aq, but there still would have to be something to
> ensure that this load part in fact performs after the store
> part of the amoswap.w.rl performs globally, and that's not
> automatic under RCpc.

Right, so I think we have different definitions of what RCpc means, and
I suspect I might be the odd one out. And it's not just spinlocks.

> >>
> >>> C unlock-lock-write-ordering
> >>>
> >>> {}
> >>> /* s initially owned by P0 */
> >>>
> >>> P0(int *x, int *y, spinlock_t *s)
> >>> {
> >>> 	WRITE_ONCE(*x, 1);
> >>> 	spin_unlock(s);
> >>> 	spin_lock(s);
> >>> 	WRITE_ONCE(*y, 1);
> >>> }
> >>>
> >>> P1(int *x, int *y)
> >>> {
> >>> 	int r0;
> >>> 	int r1;
> >>>
> >>> 	r0 = READ_ONCE(*y);
> >>> 	smp_rmb();
> >>> 	r1 = READ_ONCE(*x);
> >>> }
> >>>
> >>> exists (1:r0=1 /\ 1:r1=0)
> >>>
> >>> RISCV RISCV-unlock-lock-write-ordering
> >>> {
> >>> 0:x2=x; 0:x4=y; 0:x6=s;
> >>> 1:x2=y; 1:x4=x;
> >>> s=1;
> >>> }
> >>>  P0                       | P1           ;
> >>>  ori x1,x0,1              | lw x1,0(x2)  ;
> >>>  sw x1,0(x2)              | fence r,r    ;
> >>>  amoswap.w.rl x0,x0,(x6)  | lw x3,0(x4)  ;
> >>>  ori x5,x0,1              |              ;
> >>>  amoswap.w.aq x0,x5,(x6)  |              ;
> >>>  ori x3,x0,1              |              ;
> >>>  sw x3,0(x4)              |              ;
> >>> exists
> >>> (1:x1=1 /\ 1:x3=0)
> >>
> >> And here I think the RISCV conversion is flawed, there should be a ctrl
> >> dependency. The second store-word in P0 should depend on the result of
> >> amoswap.w.aq being 0.
> > 
> > You're right: AFAICT, this can be remedied by inserting "beq x0,x5,FAIL00"
> > right after amoswap.w.aq (and this label at the end of P0); this does not
> > change the verdict of the available formalizations reported above however.
> > 
> > (So, AFAICT, the above question remains valid/open.)
> 
> This makes sense, but one other question: Paul mentioned earlier in the
> thread that
> 
> > The PowerPC lock implementation's unlock-lock pair does not order writes
> > from the previous critical section against reads from the later critical
> > section, but it does order other combinations of reads and writes.
> 
> My understanding is that the same logic about control dependencies applies
> here too, right?  In other words, in spite of Peter's claim, the control
> dependency doesn't automatically fix this for RISC-V or for PowerPC?

So what Paul says is that earlier STOREs and later LOADs can reorder
(the TSO reordering) over the RELEASE+ACQUIRE. And that is right, LWSYNC
allows just that one reorder.

The ctrl-dep cannot fix that: a ctrl-dep gives LOAD->STORE order, so later
STOREs will happen after the earlier LOAD.

But the above example is a STORE->STORE, which LWSYNC _will_ order.

But yes, I remember being confused by that example.
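
To make that one remaining reorder concrete, here is a quick SB-flavoured
litmus sketch (mine, written for this mail and not actually run through
herd or ppcmem): with an lwsync-based unlock()/lock(), P0's store in the
first critical section and its load in the second one are free to
reorder, so the exists clause below should be reachable on PPC:

C unlock-lock-store-load

{}

P0(int *x, int *y, spinlock_t *s)
{
	int r0;

	spin_lock(s);
	WRITE_ONCE(*x, 1);	/* store in the earlier critical section */
	spin_unlock(s);
	spin_lock(s);
	r0 = READ_ONCE(*y);	/* load in the later critical section */
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)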

> Peter, in this test you mentioned earlier:
> 
> P0()
> {
> 	WRITE_ONCE(x, 1);
> 	smp_store_release(&y, 1);
> 	r0 = smp_load_acquire(&y);
> 	WRITE_ONCE(z, 1);
> }
> 
> P1()
> {
> 	r1 = READ_ONCE(z);
> 	smp_rmb();
> 	r2 = READ_ONCE(x);
> }
> 
> exists: r0 == 1 /\ r1==1 /\ r2==0
> 
> You expect this to be forbidden too, even if the release and acquire
> are RCpc?  This part I don't follow.  There's no conditional branch
> here to enforce a control dependency.  I get and I agree that
> smp_store_release/smp_load_acquire are different than spin_unlock/
> spin_lock, but isn't that even more reason to allow this one if
> smp_store_release/smp_load_acquire are RCpc without question?

Yes, I was expecting that to be forbidden. My definition of RCpc is such
that ACQUIRE and RELEASE are 'locally' respected.

The TSO archs forbid the above by ordering STOREs. Beyond those, the only
architectures that implement ACQUIRE/RELEASE are PPC and ARM64 (the rest
use smp_mb()); ARM64 has globally consistent RELEASE/ACQUIRE, and PPC
has its LWSYNC, which imposes a local TSO order.

In particular I'm not a great fan of weakening the LKMM further. PPC
is already painfully weak (more so than ARM64), but the proposed RISC-V
would be weaker still.

Of course, I come from x86 so I could be biased, but history has shown
that the weaker this stuff becomes the more painful this all becomes.
Not to mention that we'd have to go audit all existing code for
constructs that would no longer be allowed, which is incredibly painful.

Linus too has expressed his preference for stronger models.

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 18:13         ` Paul E. McKenney
@ 2018-02-22 18:27           ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2018-02-22 18:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Daniel Lustig, Andrea Parri, linux-kernel, Palmer Dabbelt,
	Albert Ou, Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> So we have something that is not all that rare in the Linux kernel
> community, namely two conflicting more-or-less concurrent changes.
> This clearly needs to be resolved, either by us not strengthening the
> Linux-kernel memory model in the way we were planning to or by you
> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> externally viewed release-acquire situations.
> 
> Other thoughts?

Like said in the other email, I would _much_ prefer to not go weaker
than PPC, I find that PPC is already painfully weak at times.

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 18:27           ` Peter Zijlstra
@ 2018-02-22 19:47             ` Daniel Lustig
  -1 siblings, 0 replies; 46+ messages in thread
From: Daniel Lustig @ 2018-02-22 19:47 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Andrea Parri, linux-kernel, Palmer Dabbelt, Albert Ou,
	Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>> So we have something that is not all that rare in the Linux kernel
>> community, namely two conflicting more-or-less concurrent changes.
>> This clearly needs to be resolved, either by us not strengthening the
>> Linux-kernel memory model in the way we were planning to or by you
>> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>> externally viewed release-acquire situations.
>>
>> Other thoughts?
> 
> Like said in the other email, I would _much_ prefer to not go weaker
> than PPC, I find that PPC is already painfully weak at times.

Sure, and RISC-V could make this work too by using RCsc instructions
and/or by using lightweight fences instead.  It just wasn't clear at
first whether smp_load_acquire() and smp_store_release() were RCpc,
RCsc, or something else, and hence whether RISC-V would actually need
to use something stronger than pure RCpc there.  Likewise for
spin_unlock()/spin_lock() and everywhere else this comes up.

As Paul's email in the other thread observed, RCpc seems to be
OK for smp_load_acquire()/smp_store_release() at least according
to the current LKMM herd spec.  Unlock/lock are stronger already
I guess.  But if there's an active proposal to strengthen them all
to something stricter than pure RCpc, then that's good to know.

My understanding from earlier discussions is that ARM has no plans
to use their own RCpc instruction for smp_load_acquire() instead
of their RCsc instructions.  Is that still true?  If they were to
use the RCpc load there, that would cause them to have the same
problem we're discussing here, right?  Just checking.

Dan

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 18:27           ` Peter Zijlstra
@ 2018-02-22 20:02             ` Paul E. McKenney
  -1 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2018-02-22 20:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Lustig, Andrea Parri, linux-kernel, Palmer Dabbelt,
	Albert Ou, Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 07:27:17PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> > So we have something that is not all that rare in the Linux kernel
> > community, namely two conflicting more-or-less concurrent changes.
> > This clearly needs to be resolved, either by us not strengthening the
> > Linux-kernel memory model in the way we were planning to or by you
> > strengthening RISC-V to be no weaker than PowerPC for these sorts of
> > externally viewed release-acquire situations.
> > 
> > Other thoughts?
> 
> Like said in the other email, I would _much_ prefer to not go weaker
> than PPC, I find that PPC is already painfully weak at times.

And here are the four PowerPC litmus tests.  As expected, a
release-acquire pair within a given process orders everything except
for prior stores against later loads, from the viewpoint of some other
process.

And yes, a few of the filenames are unfortunate.

							Thanx, Paul

------------------------------------------------------------------------

PPC MP+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 stw r1,0(r4) | lwz r7,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | lwz r8,0(r4) ;
 lwz r7,0(r5) |              ;
 lwsync       |              ;
 stw r1,0(r6) |              ;


exists
(1:r7=1 /\ 1:r8=0)

------------------------------------------------------------------------

PPC SB+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Sometimes" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 stw r1,0(r4) | stw r1,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | lwz r7,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 lwz r7,0(r6) |              ;


exists
(0:r7=0 /\ 1:r7=0)

------------------------------------------------------------------------

PPC LB+o-r-a-o+o-rmb-o
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 lwz r7,0(r4) | lwz r7,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | stw r1,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 stw r1,0(r6) |              ;


exists
(0:r7=1 /\ 1:r7=1)

------------------------------------------------------------------------

PPC MP+o-rmb-o+o-r-a-o.litmus
""
(* 22-Feb-2018: ppcmem says "Never" *)
{
0:r1=1; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r4=x; 1:r5=y; 1:r6=z;
}
 P0           | P1           ;
 lwz r7,0(r4) | stw r1,0(r6) ;
 lwsync       | lwsync       ;
 stw r1,0(r5) | stw r1,0(r4) ;
 lwz r8,0(r5) |              ;
 lwsync       |              ;
 lwz r9,0(r6) |              ;


exists
(0:r7=1 /\ 0:r9=0)

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 19:47             ` Daniel Lustig
@ 2018-02-23 11:16               ` Andrea Parri
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrea Parri @ 2018-02-23 11:16 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Palmer Dabbelt,
	Albert Ou, Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.

Thank you for the confirmations.

The hope was indeed that this thread helped clarify what the (current)
LKMM "expectations" are, and how these are likely to evolve in the near
future.  And yes, I'm definitely with you and Paul on "let us (try to)
keep LKMM and RISC-V synchronized"! ;-)

  Andrea


> 
> 
> Dan

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 19:47             ` Daniel Lustig
@ 2018-02-26 10:39               ` Will Deacon
  -1 siblings, 0 replies; 46+ messages in thread
From: Will Deacon @ 2018-02-26 10:39 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Peter Zijlstra, Paul E. McKenney, Andrea Parri, linux-kernel,
	Palmer Dabbelt, Albert Ou, Alan Stern, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.
> 
> As Paul's email in the other thread observed, RCpc seems to be
> OK for smp_load_acquire()/smp_store_release() at least according
> to the current LKMM herd spec.  Unlock/lock are stronger already
> I guess.  But if there's an active proposal to strengthen them all
> to something stricter than pure RCpc, then that's good to know.
> 
> My understanding from earlier discussions is that ARM has no plans
> to use their own RCpc instruction for smp_load_acquire() instead
> of their RCsc instructions.  Is that still true?  If they were to
> use the RCpc load there, that would cause them to have the same
> problem we're discussing here, right?  Just checking.

Agreed. No plans to use the LDAPR instruction in Linux.

Will

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 19:47             ` Daniel Lustig
@ 2018-02-26 14:21               ` Luc Maranget
  -1 siblings, 0 replies; 46+ messages in thread
From: Luc Maranget @ 2018-02-26 14:21 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Peter Zijlstra, Paul E. McKenney, Andrea Parri, linux-kernel,
	Palmer Dabbelt, Albert Ou, Alan Stern, Will Deacon, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Linus Torvalds, linux-riscv

> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, 

As far as I now understand, they are something else. Something in-between.

Basically, considering a multi-copy atomic model, and following the RISCV
axiomatic herd notation:

RCpc
   [Acq];po in ppo (ie Acquire "half barrier")
   po;[Rel] in ppo (ie Release "half barrier")

RCsc adds the rule
   [Rel];po;[Acq] in ppo. (ie Store release followed by load Acquire is
                          kept in order)

While LKMM has a less constrained additional rule
   [Rel];po-loc;[Acq]     (ie Store release followed by load Acquire from
                          the SAME location is  kept in order)
(For specialists LKMM is formulated differently, the overall
effect is the same)

> and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.
One may observe here that the LKMM rule is there precisely to
forbid Andrea's test.  In that respect, the suggested RISC-V implementation
of lock and unlock on spinlock_t is too weak, as it allows the test.

However, I cannot conclude on smp_load_acquire/smp_store_release (see below).

> 
> As Paul's email in the other thread observed, RCpc seems to be
> OK for smp_load_acquire()/smp_store_release() at least according
> to the current LKMM herd spec.  Unlock/lock are stronger already
> I guess.  But if there's an active proposal to strengthen them all
> to something stricter than pure RCpc, then that's good to know.

As illustrated above, RCpc is not enough to implement the LKMM
smp_load_acquire()/smp_store_release().  However, I do not know
for sure what the current implementation of smp_load_acquire()
and smp_store_release() (ie the LKMM primitives) is on RISC-V.
As far as I could find, the current implementation is the generic default,
which uses strong fences and is safe.

On a related note, it may not be clear yet to everyone
that the LKMM has "platonic spinlocks".

That is, locks are not implemented from more basic primitives but are specified.
The specification can be described as behaving this way:
  - A lock behaves as a read-modify-write, the read behaving as a read-acquire.
  - An unlock behaves as a store release.
  - Lock and unlock operations (to a given lock variable)
    alternate in some total order. This order induces the usual relations
    between accesses to a given location.

This of course strongly suggests implementing lock with a RMW (the R being
acquire, ie amoswap.aq in RISCV) and unlock as a store release
(ie amoswap.rl in RISCV).  And again, this is too weak given the LKMM's
additional rule above.  However, nothing prevents a different
implementation, provided it is safe.
As regards LKMM spinlocks, I am not sure that having strong
implementations of lock/unlock on Power (lwsync) and
ARMv8 (RCsc?) commands a specification stronger than the current one.


> 
> My understanding from earlier discussions is that ARM has no plans
> to use their own RCpc instruction for smp_load_acquire() instead
> of their RCsc instructions.  Is that still true?  If they were to
> use the RCpc load there, that would cause them to have the same
> problem we're discussing here, right?  Just checking.
> 
> Dan

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 14:21               ` Luc Maranget
@ 2018-02-26 16:06                 ` Linus Torvalds
  -1 siblings, 0 replies; 46+ messages in thread
From: Linus Torvalds @ 2018-02-26 16:06 UTC (permalink / raw)
  To: Luc Maranget
  Cc: Daniel Lustig, Peter Zijlstra, Paul E. McKenney, Andrea Parri,
	Linux Kernel Mailing List, Palmer Dabbelt, Albert Ou, Alan Stern,
	Will Deacon, Boqun Feng, Nicholas Piggin, David Howells,
	Jade Alglave, Akira Yokosawa, Ingo Molnar, linux-riscv

On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote:
>
> That is, locks are not implemented from more basic primitive but are specified.
> The specification can be described as behaving that way:
>   - A lock behaves as a read-modify-write. the read behaving as a read-acquire

This is wrong, or perhaps just misleading.

The *whole* r-m-w acts as an acquire. Not just the read part. The
write is very much part of it.

Maybe that's what you meant, but it read to me as "just the read part
of the rmw behaves as a read-acquire".

Because it is very important that the _write_ part of the rmw is also
ordered wrt everything that is inside the spinlock.

So doing a spinlock as

 (a) read-locked-acquire
   modify
 (c) write-conditional

would be wrong, because the accesses inside the spinlock are ordered
not just wrt the read-acquire, they have to be ordered wrt the write
too.
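
Spelled out with C11 atomics (a sketch of that broken shape only -- not
anything the kernel or any particular architecture actually does; the
function name and the use of <stdatomic.h> instead of kernel primitives
are just to keep the illustration self-contained):

#include <stdatomic.h>

/*
 * The construction described above: the acquire is on the polling
 * load, while the store that actually takes the lock is relaxed.
 * Nothing orders the critical section against that store, which is
 * exactly the problem being pointed out.
 */
static void broken_lock(atomic_int *lock)
{
	for (;;) {
		/* (a) read-locked-acquire: spin until the lock looks free */
		while (atomic_load_explicit(lock, memory_order_acquire))
			;

		int expected = 0;
		/* (c) write-conditional, but relaxed: the write carries
		 * no ordering of its own */
		if (atomic_compare_exchange_weak_explicit(lock, &expected, 1,
				memory_order_relaxed, memory_order_relaxed))
			return;
	}
}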

So it is closer to say that it's the _write_ of the r-m-w sequence
that has the acquire semantics, not the read.

>   - A unlock behaves as a store release.

Yes.

                  Linus

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 16:06                 ` Linus Torvalds
@ 2018-02-26 16:24                   ` Will Deacon
  -1 siblings, 0 replies; 46+ messages in thread
From: Will Deacon @ 2018-02-26 16:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luc Maranget, Daniel Lustig, Peter Zijlstra, Paul E. McKenney,
	Andrea Parri, Linux Kernel Mailing List, Palmer Dabbelt,
	Albert Ou, Alan Stern, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote:
> On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote:
> >
> > That is, locks are not implemented from more basic primitive but are specified.
> > The specification can be described as behaving that way:
> >   - A lock behaves as a read-modify-write. the read behaving as a read-acquire
> 
> This is wrong, or perhaps just misleading.
> 
> The *whole* r-m-w acts as an acquire. Not just the read part. The
> write is very much part of it.
> 
> Maybe that's what you meant, but it read to me as "just the read part
> of the rmw behaves as a read-acquire".
> 
> Because it is very important that the _write_ part of the rmw is also
> ordered wrt everything that is inside the spinlock.
> 
> So doing a spinlock as
> 
>  (a) read-locked-acquire
>    modify
>  (c) write-conditional
> 
> would be wrong, because the accesses inside the spinlock are ordered
> not just wrt the read-acquire, they have to be ordered wrt the write
> too.
> 
> So it is closer to say that it's the _write_ of the r-m-w sequence
> that has the acquire semantics, not the read.

Strictly speaking, that's not what we've got implemented on arm64: only
the read part of the RmW has Acquire semantics, but there is a total
order on the lock/unlock operations for the lock. For example, if one
CPU does:

spin_lock(&lock);
WRITE_ONCE(foo, 42);

then another CPU could do:

if (smp_load_acquire(&foo) == 42)
	BUG_ON(!spin_is_locked(&lock));

and that could fire. Is that relied on somewhere?
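
For reference, the shape in question is roughly this (a deliberately
simplified LL/SC sketch with symbolic operands, not the actual
arch/arm64 code):

loop:	ldaxr	w0, [lock]	// load-acquire exclusive: the read part of the RmW
	cbnz	w0, loop	// spin while the lock is held
	stxr	w1, w2, [lock]	// the store that takes the lock; no acquire/release
	cbnz	w1, loop	// retry if the exclusive access failed
	str	w3, [foo]	// WRITE_ONCE(foo, 42) inside the critical section

The ldaxr is ordered before both of the later stores, but the stxr and
the str are not ordered against each other, so the CPU doing the
smp_load_acquire() can see foo == 42 while the lock word still reads as
unlocked, which is how the BUG_ON() above fires.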

Will

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 16:24                   ` Will Deacon
@ 2018-02-26 17:00                     ` Linus Torvalds
  -1 siblings, 0 replies; 46+ messages in thread
From: Linus Torvalds @ 2018-02-26 17:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Luc Maranget, Daniel Lustig, Peter Zijlstra, Paul E. McKenney,
	Andrea Parri, Linux Kernel Mailing List, Palmer Dabbelt,
	Albert Ou, Alan Stern, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> Strictly speaking, that's not what we've got implemented on arm64: only
> the read part of the RmW has Acquire semantics, but there is a total
> order on the lock/unlock operations for the lock.

Hmm.

I thought we had exactly that bug on some architecture with the queued
spinlocks, and people decided it was wrong.

But it's possible that I mis-remember, and that we decided it was ok after all.

> spin_lock(&lock);
> WRITE_ONCE(foo, 42);
>
> then another CPU could do:
>
> if (smp_load_acquire(&foo) == 42)
>         BUG_ON(!spin_is_locked(&lock));
>
> and that could fire. Is that relied on somewhere?

I have a distinct memory that we said the spinlock write is seen in
order, wrt the writes inside the spinlock, and the reason was
something very similar to the above, except that "spin_is_locked()"
was about our spin_unlock_wait().

Because we had something very much like the above in the exit path,
where we would look at some state and do "spin_unlock_wait()" and
expect to be guaranteed to be the last user after that.

But a few months ago we obviously got rid of spin_unlock_wait exactly
because people were worried about the semantics.

So maybe I just remember an older issue that simply became a non-issue
with that.

               Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 17:00                     ` Linus Torvalds
@ 2018-02-26 17:10                       ` Will Deacon
  -1 siblings, 0 replies; 46+ messages in thread
From: Will Deacon @ 2018-02-26 17:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luc Maranget, Daniel Lustig, Peter Zijlstra, Paul E. McKenney,
	Andrea Parri, Linux Kernel Mailing List, Palmer Dabbelt,
	Albert Ou, Alan Stern, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Mon, Feb 26, 2018 at 09:00:43AM -0800, Linus Torvalds wrote:
> On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Strictly speaking, that's not what we've got implemented on arm64: only
> > the read part of the RmW has Acquire semantics, but there is a total
> > order on the lock/unlock operations for the lock.
> 
> Hmm.
> 
> I thought we had exactly that bug on some architecture with the queued
> spinlocks, and people decided it was wrong.
> 
> But it's possible that I mis-remember, and that we decided it was ok after all.
> 
> > spin_lock(&lock);
> > WRITE_ONCE(foo, 42);
> >
> > then another CPU could do:
> >
> > if (smp_load_acquire(&foo) == 42)
> >         BUG_ON(!spin_is_locked(&lock));
> >
> > and that could fire. Is that relied on somewhere?
> 
> I have a distinct memory that we said the spinlock write is seen in
> order, wrt the writes inside the spinlock, and the reason was
> something very similar to the above, except that "spin_is_locked()"
> was about our spin_unlock_wait().

Yes, we did run into problems with spin_unlock_wait and we ended up
strengthening the arm64 implementation to do an RmW, which puts it into
the total order of lock/unlock operations. However, we then went and
killed the thing because it was seldom used correctly and we struggled
to define what "correctly" even meant!

> Because we had something very much like the above in the exit path,
> where we would look at some state and do "spin_unlock_wait()" and
> expect to be guaranteed to be the last user after that.
> 
> But a few months ago we obviously got rid of spin_unlock_wait exactly
> because people were worried about the semantics.

Similarly for spin_can_lock.

> So maybe I just remember an older issue that simply became a non-issue
> with that.

I think so. If we need to, I could make spin_is_locked do an RmW on
arm64 so we can say that all successful spin_* operations are totally
ordered for a given lock, but spin_is_locked is normally only used as a
coarse debug check anyway where it's assumed that if it's held, it's
held by the current CPU. We should probably move most users over to
lockdep and see what we're left with.
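
For the debug-check usage, the conversion is mostly mechanical; a sketch
of what it would look like, reusing the lock from the example above
(lockdep_assert_held() warns unless the current context holds the lock,
and is a no-op when lockdep is disabled):

	/* Today: only checks that *somebody* holds the lock. */
	BUG_ON(!spin_is_locked(&lock));

	/* With lockdep: checks that *we* hold it. */
	lockdep_assert_held(&lock);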

Will

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 16:24                   ` Will Deacon
@ 2018-02-27  5:06                     ` Boqun Feng
  -1 siblings, 0 replies; 46+ messages in thread
From: Boqun Feng @ 2018-02-27  5:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Luc Maranget, Daniel Lustig, Peter Zijlstra,
	Paul E. McKenney, Andrea Parri, Linux Kernel Mailing List,
	Palmer Dabbelt, Albert Ou, Alan Stern, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Mon, Feb 26, 2018 at 04:24:27PM +0000, Will Deacon wrote:
> On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote:
> > On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote:
> > >
> > > That is, locks are not implemented from more basic primitive but are specified.
> > > The specification can be described as behaving that way:
> > >   - A lock behaves as a read-modify-write. the read behaving as a read-acquire
> > 
> > This is wrong, or perhaps just misleading.
> > 
> > The *whole* r-m-w acts as an acquire. Not just the read part. The
> > write is very much part of it.
> > 
> > Maybe that's what you meant, but it read to me as "just the read part
> > of the rmw behaves as a read-acquire".
> > 
> > Because it is very important that the _write_ part of the rmw is also
> > ordered wrt everything that is inside the spinlock.
> > 
> > So doing a spinlock as
> > 
> >  (a) read-locked-acquire
> >    modify
> >  (c) write-conditional
> > 
> > would be wrong, because the accesses inside the spinlock are ordered
> > not just wrt the read-acquire, they have to be ordered wrt the write
> > too.
> > 
> > So it is closer to say that it's the _write_ of the r-m-w sequence
> > that has the acquire semantics, not the read.
> 
> Strictly speaking, that's not what we've got implemented on arm64: only
> the read part of the RmW has Acquire semantics, but there is a total
> order on the lock/unlock operations for the lock. For example, if one
> CPU does:
> 
> spin_lock(&lock);
> WRITE_ONCE(foo, 42);
> 
> then another CPU could do:
> 
> if (smp_load_acquire(&foo) == 42)
> 	BUG_ON(!spin_is_locked(&lock));
> 

Hmm.. this is new to me. So the write part of spin_lock() and the
WRITE_ONCE() will not get reordered? Could you explain more about this,
or point to where I should look in the documentation? I understand the
write part of spin_lock() must be committed earlier than the WRITE_ONCE()
due to the ll/sc, but I thought the order in which they arrive at the
memory system was undefined/arbitrary.

Regards,
Boqun

> and that could fire. Is that relied on somewhere?
> 
> Will

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-27  5:06                     ` Boqun Feng
@ 2018-02-27 10:16                       ` Boqun Feng
  -1 siblings, 0 replies; 46+ messages in thread
From: Boqun Feng @ 2018-02-27 10:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Luc Maranget, Daniel Lustig, Peter Zijlstra,
	Paul E. McKenney, Andrea Parri, Linux Kernel Mailing List,
	Palmer Dabbelt, Albert Ou, Alan Stern, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Tue, Feb 27, 2018 at 01:06:35PM +0800, Boqun Feng wrote:
> On Mon, Feb 26, 2018 at 04:24:27PM +0000, Will Deacon wrote:
> > On Mon, Feb 26, 2018 at 08:06:59AM -0800, Linus Torvalds wrote:
> > > On Mon, Feb 26, 2018 at 6:21 AM, Luc Maranget <luc.maranget@inria.fr> wrote:
> > > >
> > > > That is, locks are not implemented from more basic primitive but are specified.
> > > > The specification can be described as behaving that way:
> > > >   - A lock behaves as a read-modify-write. the read behaving as a read-acquire
> > > 
> > > This is wrong, or perhaps just misleading.
> > > 
> > > The *whole* r-m-w acts as an acquire. Not just the read part. The
> > > write is very much part of it.
> > > 
> > > Maybe that's what you meant, but it read to me as "just the read part
> > > of the rmw behaves as a read-acquire".
> > > 
> > > Because it is very important that the _write_ part of the rmw is also
> > > ordered wrt everything that is inside the spinlock.
> > > 
> > > So doing a spinlock as
> > > 
> > >  (a) read-locked-acquire
> > >    modify
> > >  (c) write-conditional
> > > 
> > > would be wrong, because the accesses inside the spinlock are ordered
> > > not just wrt the read-acquire, they have to be ordered wrt the write
> > > too.
> > > 
> > > So it is closer to say that it's the _write_ of the r-m-w sequence
> > > that has the acquire semantics, not the read.
> > 
> > Strictly speaking, that's not what we've got implemented on arm64: only
> > the read part of the RmW has Acquire semantics, but there is a total
> > order on the lock/unlock operations for the lock. For example, if one
> > CPU does:
> > 
> > spin_lock(&lock);
> > WRITE_ONCE(foo, 42);
> > 
> > then another CPU could do:
> > 
> > if (smp_load_acquire(&foo) == 42)
> > 	BUG_ON(!spin_is_locked(&lock));
> > 
> 
> Hmm.. this is new to me. So the write part of spin_lock() and the
> WRITE_ONCE() will not get reordered? Could you explain more about this
> or point where I should look in the document? I understand the write
> part of spin_lock() must be committed earlier than the WRITE_ONCE()
> committed due to the ll/sc, but I thought the ordering of their arrivals
> in memory system is undefined/arbitary.
> 
> Regards,
> Boqun
> 
> > and that could fire. Is that relied on somewhere?
> > 

My bad, I misunderstood your mail. So you were saying the BUG_ON() can
be triggered in current implementations. Never mind my reply above.

Regards,
Boqun

> > Will



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-22 19:47             ` Daniel Lustig
@ 2018-03-01 15:11               ` Andrea Parri
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrea Parri @ 2018-03-01 15:11 UTC (permalink / raw)
  To: Daniel Lustig
  Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Palmer Dabbelt,
	Albert Ou, Alan Stern, Will Deacon, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Linus Torvalds, linux-riscv

Hi Daniel,

On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.

while digging into riscv's locks and atomics to fix the issues discussed
earlier in this thread, I became aware of another issue:

Considering here the CMPXCHG primitives, for example, I noticed that the
implementation currently provides the four variants

	ATOMIC_OPS(        , .aqrl)
	ATOMIC_OPS(_acquire,   .aq)
	ATOMIC_OPS(_release,   .rl)
	ATOMIC_OPS(_relaxed,      )

(corresponding, resp., to

	atomic_cmpxchg()
	atomic_cmpxchg_acquire()
	atomic_cmpxchg_release()
	atomic_cmpxchg_relaxed()  )

so that the first variant above ends up doing

	0:	lr.w.aqrl  %0, %addr
		bne        %0, %old, 1f
		sc.w.aqrl  %1, %new, %addr
		bnez       %1, 0b
        1:                   

From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,

 "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
  the load part and the store part as independent RCsc operations is not
  in and of itself sufficient to enforce full fencing behavior, but this
  subtle weak behavior is counterintuitive and not much of an advantage
  architecturally, especially with lr and sc also available [...]."

I understand that

        { x = y = u = v = 0 }

	P0()

	WRITE_ONCE(x, 1);		("relaxed" store, sw)
	atomic_cmpxchg(&u, 0, 1);
	r0 = READ_ONCE(y);		("relaxed" load, lw)

	P1()

	WRITE_ONCE(y, 1);
	atomic_cmpxchg(&v, 0, 1);
	r1 = READ_ONCE(x);

could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?
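
For reference, the same example written as a litmus test that the LKMM
tooling can run (the test name is made up); since both atomic_cmpxchg()
calls succeed here, LKMM treats them as fully ordered and is therefore
expected to forbid the r0 = r1 = 0 outcome, which is exactly what the
.aqrl mapping above would have to guarantee:

	C SB+cmpxchg

	{}

	P0(int *x, int *y, atomic_t *u)
	{
		int r0;
		int r2;

		WRITE_ONCE(*x, 1);
		r2 = atomic_cmpxchg(u, 0, 1);
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y, atomic_t *v)
	{
		int r1;
		int r2;

		WRITE_ONCE(*y, 1);
		r2 = atomic_cmpxchg(v, 0, 1);
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)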

Thanks,
  Andrea

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-03-01 15:11               ` Andrea Parri
@ 2018-03-01 21:54                 ` Palmer Dabbelt
  -1 siblings, 0 replies; 46+ messages in thread
From: Palmer Dabbelt @ 2018-03-01 21:54 UTC (permalink / raw)
  To: parri.andrea
  Cc: Daniel Lustig, peterz, paulmck, linux-kernel, albert, stern,
	Will Deacon, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, mingo, Linus Torvalds, linux-riscv

On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea@gmail.com wrote:
> Hi Daniel,
>
> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>> >> So we have something that is not all that rare in the Linux kernel
>> >> community, namely two conflicting more-or-less concurrent changes.
>> >> This clearly needs to be resolved, either by us not strengthening the
>> >> Linux-kernel memory model in the way we were planning to or by you
>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>> >> externally viewed release-acquire situations.
>> >>
>> >> Other thoughts?
>> >
>> > Like said in the other email, I would _much_ prefer to not go weaker
>> > than PPC, I find that PPC is already painfully weak at times.
>>
>> Sure, and RISC-V could make this work too by using RCsc instructions
>> and/or by using lightweight fences instead.  It just wasn't clear at
>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>> RCsc, or something else, and hence whether RISC-V would actually need
>> to use something stronger than pure RCpc there.  Likewise for
>> spin_unlock()/spin_lock() and everywhere else this comes up.
>
> while digging into riscv's locks and atomics to fix the issues discussed
> earlier in this thread, I became aware of another issue:
>
> Considering here the CMPXCHG primitives, for example, I noticed that the
> implementation currently provides the four variants
>
> 	ATOMIC_OPS(        , .aqrl)
> 	ATOMIC_OPS(_acquire,   .aq)
> 	ATOMIC_OPS(_release,   .rl)
> 	ATOMIC_OPS(_relaxed,      )
>
> (corresponding, resp., to
>
> 	atomic_cmpxchg()
> 	atomic_cmpxchg_acquire()
> 	atomic_cmpxchg_release()
> 	atomic_cmpxchg_relaxed()  )
>
> so that the first variant above ends up doing
>
> 	0:	lr.w.aqrl  %0, %addr
> 		bne        %0, %old, 1f
> 		sc.w.aqrl  %1, %new, %addr
> 		bnez       %1, 0b
>         1:
>
> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>
>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>   the load part and the store part as independent RCsc operations is not
>   in and of itself sufficient to enforce full fencing behavior, but this
>   subtle weak behavior is counterintuitive and not much of an advantage
>   architecturally, especially with lr and sc also available [...]."
>
> I understand that
>
>         { x = y = u = v = 0 }
>
> 	P0()
>
> 	WRITE_ONCE(x, 1);		("relaxed" store, sw)
> 	atomic_cmpxchg(&u, 0, 1);
> 	r0 = READ_ONCE(y);		("relaxed" load, lw)
>
> 	P1()
>
> 	WRITE_ONCE(y, 1);
> 	atomic_cmpxchg(&v, 0, 1);
> 	r1 = READ_ONCE(x);
>
> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?

cmpxchg isn't an AMO, it's an LR/SC sequence, so that blurb doesn't apply.  I
think "lr.w.aqrl" and "sc.w.aqrl" are not sufficient to perform a fully ordered
operation (i.e., it's an incorrect implementation of atomic_cmpxchg()), but I was
hoping to get some time to actually internalize this part of the RISC-V memory
model at some point to be sure.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-03-01 21:54                 ` Palmer Dabbelt
@ 2018-03-01 22:21                   ` Daniel Lustig
  -1 siblings, 0 replies; 46+ messages in thread
From: Daniel Lustig @ 2018-03-01 22:21 UTC (permalink / raw)
  To: Palmer Dabbelt, parri.andrea
  Cc: peterz, paulmck, linux-kernel, albert, stern, Will Deacon,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	mingo, Linus Torvalds, linux-riscv

On 3/1/2018 1:54 PM, Palmer Dabbelt wrote:
> On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea@gmail.com wrote:
>> Hi Daniel,
>>
>> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>>> >> So we have something that is not all that rare in the Linux kernel
>>> >> community, namely two conflicting more-or-less concurrent changes.
>>> >> This clearly needs to be resolved, either by us not strengthening the
>>> >> Linux-kernel memory model in the way we were planning to or by you
>>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>>> >> externally viewed release-acquire situations.
>>> >>
>>> >> Other thoughts?
>>> >
>>> > Like said in the other email, I would _much_ prefer to not go weaker
>>> > than PPC, I find that PPC is already painfully weak at times.
>>>
>>> Sure, and RISC-V could make this work too by using RCsc instructions
>>> and/or by using lightweight fences instead.  It just wasn't clear at
>>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>>> RCsc, or something else, and hence whether RISC-V would actually need
>>> to use something stronger than pure RCpc there.  Likewise for
>>> spin_unlock()/spin_lock() and everywhere else this comes up.
>>
>> while digging into riscv's locks and atomics to fix the issues discussed
>> earlier in this thread, I became aware of another issue:
>>
>> Considering here the CMPXCHG primitives, for example, I noticed that the
>> implementation currently provides the four variants
>>
>>     ATOMIC_OPS(        , .aqrl)
>>     ATOMIC_OPS(_acquire,   .aq)
>>     ATOMIC_OPS(_release,   .rl)
>>     ATOMIC_OPS(_relaxed,      )
>>
>> (corresponding, resp., to
>>
>>     atomic_cmpxchg()
>>     atomic_cmpxchg_acquire()
>>     atomic_cmpxchg_release()
>>     atomic_cmpxchg_relaxed()  )
>>
>> so that the first variant above ends up doing
>>
>>     0:    lr.w.aqrl  %0, %addr
>>         bne        %0, %old, 1f
>>         sc.w.aqrl  %1, %new, %addr
>>         bnez       %1, 0b
>>         1:
>>
>> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>>
>>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>>   the load part and the store part as independent RCsc operations is not
>>   in and of itself sufficient to enforce full fencing behavior, but this
>>   subtle weak behavior is counterintuitive and not much of an advantage
>>   architecturally, especially with lr and sc also available [...]."
>>
>> I understand that
>>
>>         { x = y = u = v = 0 }
>>
>>     P0()
>>
>>     WRITE_ONCE(x, 1);        ("relaxed" store, sw)
>>     atomic_cmpxchg(&u, 0, 1);
>>     r0 = READ_ONCE(y);        ("relaxed" load, lw)
>>
>>     P1()
>>
>>     WRITE_ONCE(y, 1);
>>     atomic_cmpxchg(&v, 0, 1);
>>     r1 = READ_ONCE(x);
>>
>> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?
> 
> cmpxchg isn't an AMO, it's an LR SC sequence, so that blurb doesn't
> apply.  I think "lr.w.aqrl" and "sc.w.aqrl" is not sufficient to
> perform a fully ordered operation (ie, it's an incorrect
> implementation of atomic_cmpxchg()), but I was hoping to get some
> time to actually internalize this part of the RISC-V memory model at
> some point to be sure.

Agreed, the current intention is that we'll add a fence rw,rw in there
when we use lr/sc as the implementation, likely similar to what ARM does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67
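
By analogy with that commit, one possible shape for the fully-ordered
cmpxchg would be the following (a sketch only, not a final
implementation; the exact mix of .rl and fences is part of what needs
auditing):

	0:	lr.w       %0, %addr
		bne        %0, %old, 1f
		sc.w.rl    %1, %new, %addr
		bnez       %1, 0b
		fence      rw, rw	# full barrier on the success path
	1:

A failed cmpxchg() does not have to provide any ordering, which is why
the fence can sit before the "1:" label.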

It's also probably not the only piece of synchronization-related code
we'll have to go back and audit as we finalize the RISC-V memory
consistency model over the next couple months or so.

Dan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()
  2018-02-26 17:00                     ` Linus Torvalds
@ 2018-03-06 13:00                       ` Peter Zijlstra
  -1 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2018-03-06 13:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, Luc Maranget, Daniel Lustig, Paul E. McKenney,
	Andrea Parri, Linux Kernel Mailing List, Palmer Dabbelt,
	Albert Ou, Alan Stern, Boqun Feng, Nicholas Piggin,
	David Howells, Jade Alglave, Akira Yokosawa, Ingo Molnar,
	linux-riscv

On Mon, Feb 26, 2018 at 09:00:43AM -0800, Linus Torvalds wrote:
> On Mon, Feb 26, 2018 at 8:24 AM, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Strictly speaking, that's not what we've got implemented on arm64: only
> > the read part of the RmW has Acquire semantics, but there is a total
> > order on the lock/unlock operations for the lock.
> 
> Hmm.
> 
> I thought we had exactly that bug on some architecture with the queued
> spinlocks, and people decided it was wrong.

So ARM64 and Power have the acquire-on-load-only thing, but qspinlock
has it per construction on anything that allows reordering stores.

Given that unlock/lock are ordered, which covers about 99% of the users
out there, and that fixing the issue would make things significantly
slower on the weak architectures, we let it be.

But yes, it's a pesky detail.
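
And for the rare case that really does need the lock word itself ordered
against the accesses inside the critical section, there is already an
explicit escape hatch; a sketch, reusing lock/foo from Will's example
(smp_mb__after_spinlock() upgrades the acquisition to a full barrier on
the architectures where spin_lock() alone is weaker):

	spin_lock(&lock);
	smp_mb__after_spinlock();	/* full barrier after the lock RmW */
	WRITE_ONCE(foo, 42);		/* now ordered after the lock write
					 * for all observers */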

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-03-06 13:01 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-22 12:19 [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock() Andrea Parri
2018-02-22 12:19 ` Andrea Parri
2018-02-22 12:44 ` Andrea Parri
2018-02-22 12:44   ` Andrea Parri
2018-02-22 13:40 ` Peter Zijlstra
2018-02-22 13:40   ` Peter Zijlstra
2018-02-22 14:12   ` Andrea Parri
2018-02-22 14:12     ` Andrea Parri
2018-02-22 17:27     ` Daniel Lustig
2018-02-22 17:27       ` Daniel Lustig
2018-02-22 18:13       ` Paul E. McKenney
2018-02-22 18:13         ` Paul E. McKenney
2018-02-22 18:27         ` Peter Zijlstra
2018-02-22 18:27           ` Peter Zijlstra
2018-02-22 19:47           ` Daniel Lustig
2018-02-22 19:47             ` Daniel Lustig
2018-02-23 11:16             ` Andrea Parri
2018-02-23 11:16               ` Andrea Parri
2018-02-26 10:39             ` Will Deacon
2018-02-26 10:39               ` Will Deacon
2018-02-26 14:21             ` Luc Maranget
2018-02-26 14:21               ` Luc Maranget
2018-02-26 16:06               ` Linus Torvalds
2018-02-26 16:06                 ` Linus Torvalds
2018-02-26 16:24                 ` Will Deacon
2018-02-26 16:24                   ` Will Deacon
2018-02-26 17:00                   ` Linus Torvalds
2018-02-26 17:00                     ` Linus Torvalds
2018-02-26 17:10                     ` Will Deacon
2018-02-26 17:10                       ` Will Deacon
2018-03-06 13:00                     ` Peter Zijlstra
2018-03-06 13:00                       ` Peter Zijlstra
2018-02-27  5:06                   ` Boqun Feng
2018-02-27  5:06                     ` Boqun Feng
2018-02-27 10:16                     ` Boqun Feng
2018-02-27 10:16                       ` Boqun Feng
2018-03-01 15:11             ` Andrea Parri
2018-03-01 15:11               ` Andrea Parri
2018-03-01 21:54               ` Palmer Dabbelt
2018-03-01 21:54                 ` Palmer Dabbelt
2018-03-01 22:21                 ` Daniel Lustig
2018-03-01 22:21                   ` Daniel Lustig
2018-02-22 20:02           ` Paul E. McKenney
2018-02-22 20:02             ` Paul E. McKenney
2018-02-22 18:21       ` Peter Zijlstra
2018-02-22 18:21         ` Peter Zijlstra
