* [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso
@ 2021-10-25 14:54 Boqun Feng
  2021-10-25 14:54 ` [RFC v2 1/3] tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU Boqun Feng
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Boqun Feng @ 2021-10-25 14:54 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-doc
  Cc: Paul E. McKenney,
	Dan Lustig, Will Deacon, Peter Zijlstra, Linus Torvalds,
	Alexander Shishkin, Peter Anvin, Andrea Parri, Ingo Molnar,
	Vince Weaver, Thomas Gleixner, Jiri Olsa,
	Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Boqun Feng

Hi,

Just a new version trying to make forward progress on this  ;-)

v1: https://lore.kernel.org/lkml/20210930130823.2103688-1-boqun.feng@gmail.com/

Changes since v1:

*	Split the patch into three to help resolve the litmus test
	addition discussion.
*	Add some explanation in patch #2 on the requirements for tests
	in the litmus-tests directory.

To summarize the change to the memory model, we now guarantee that in
the following code:

	<memory access M>
	spin_unlock(A);
	spin_lock(B);
	<memory access N>

M is ordered against N unless M is a store and N is a load. More
detailed examples of this guarantee can be found in patch #3.
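
As a sketch (not part of this series), the one excluded case could be
written as the litmus test below; the test name and the "Sometimes"
result are assumptions based on the description above, namely that a
store po-before the unlock and a load po-after the lock are the one
pair left unordered, as with TSO:

	C SB+unlocklockonceonce+fencembonceonce

	(*
	 * Result: Sometimes (assumed)
	 *
	 * The store po-before the unlock and the load po-after the lock
	 * are the one pair left unordered by the RCtso guarantee, so
	 * this SB-style outcome is expected to remain allowed.
	 *)

	{}

	P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
	{
		int r1;

		spin_lock(s);
		WRITE_ONCE(*x, 1);
		spin_unlock(s);
		spin_lock(t);
		r1 = READ_ONCE(*y);
		spin_unlock(t);
	}

	P1(int *x, int *y)
	{
		int r2;

		WRITE_ONCE(*y, 1);
		smp_mb();
		r2 = READ_ONCE(*x);
	}

	exists (0:r1=0 /\ 1:r2=0)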

Architecture maintainers, I would appreciate it if you could take a look
at patch #3 and the rest of the series to confirm that this guarantee
works on your architectures.

Alan, I split the patchset into three patches because I do think we need
some sort of patch #2 so that we can reach consensus about whether or
not to merge patch #3. I know you want to keep the litmus-tests
directory as simple as possible, but it won't hurt to document the
requirement. Looking forward to your thoughts ;-)

Suggestions and comments are welcome!

Regards,
Boqun


Boqun Feng (3):
  tools/memory-model: Provide extra ordering for unlock+lock pair on the
    same CPU
  tools/memory-model: doc: Describe the requirement of the litmus-tests
    directory
  tools/memory-model: litmus: Add two tests for unlock(A)+lock(B)
    ordering

 .../Documentation/explanation.txt             | 44 +++++++++++--------
 tools/memory-model/README                     | 12 +++++
 tools/memory-model/linux-kernel.cat           |  6 +--
 ...LB+unlocklockonceonce+poacquireonce.litmus | 33 ++++++++++++++
 ...unlocklockonceonce+fencermbonceonce.litmus | 33 ++++++++++++++
 tools/memory-model/litmus-tests/README        |  8 ++++
 6 files changed, 114 insertions(+), 22 deletions(-)
 create mode 100644 tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
 create mode 100644 tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus

-- 
2.33.0



* [RFC v2 1/3] tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU
  2021-10-25 14:54 [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Boqun Feng
@ 2021-10-25 14:54 ` Boqun Feng
  2021-10-25 14:54 ` [RFC v2 2/3] tools/memory-model: doc: Describe the requirement of the litmus-tests directory Boqun Feng
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Boqun Feng @ 2021-10-25 14:54 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-doc
  Cc: Paul E. McKenney,
	Dan Lustig, Will Deacon, Peter Zijlstra, Linus Torvalds,
	Alexander Shishkin, Peter Anvin, Andrea Parri, Ingo Molnar,
	Vince Weaver, Thomas Gleixner, Jiri Olsa,
	Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Boqun Feng, Alan Stern,
	Palmer Dabbelt

A recent discussion[1] shows that we are in favor of strengthening the
ordering of unlock + lock on the same CPU: an unlock and a po-after lock
should provide the so-called RCtso ordering, that is, a memory access S
po-before the unlock should be ordered against a memory access R
po-after the lock, unless S is a store and R is a load.

The strengthening meets programmers' expectation for a "sequence of two
locked regions to be ordered wrt each other" (from Linus), and can
reduce the mental burden of using locks. Therefore add it to the LKMM.

[1]: https://lore.kernel.org/lkml/20210909185937.GA12379@rowland.harvard.edu/

Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> (RISC-V)
---
 .../Documentation/explanation.txt             | 44 +++++++++++--------
 tools/memory-model/linux-kernel.cat           |  6 +--
 2 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index 5d72f3112e56..394ee57d58f2 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -1813,15 +1813,16 @@ spin_trylock() -- we can call these things lock-releases and
 lock-acquires -- have two properties beyond those of ordinary releases
 and acquires.
 
-First, when a lock-acquire reads from a lock-release, the LKMM
-requires that every instruction po-before the lock-release must
-execute before any instruction po-after the lock-acquire.  This would
-naturally hold if the release and acquire operations were on different
-CPUs, but the LKMM says it holds even when they are on the same CPU.
-For example:
+First, when a lock-acquire reads from or is po-after a lock-release,
+the LKMM requires that every instruction po-before the lock-release
+must execute before any instruction po-after the lock-acquire.  This
+would naturally hold if the release and acquire operations were on
+different CPUs and accessed the same lock variable, but the LKMM says
+it also holds when they are on the same CPU, even if they access
+different lock variables.  For example:
 
 	int x, y;
-	spinlock_t s;
+	spinlock_t s, t;
 
 	P0()
 	{
@@ -1830,9 +1831,9 @@ For example:
 		spin_lock(&s);
 		r1 = READ_ONCE(x);
 		spin_unlock(&s);
-		spin_lock(&s);
+		spin_lock(&t);
 		r2 = READ_ONCE(y);
-		spin_unlock(&s);
+		spin_unlock(&t);
 	}
 
 	P1()
@@ -1842,10 +1843,10 @@ For example:
 		WRITE_ONCE(x, 1);
 	}
 
-Here the second spin_lock() reads from the first spin_unlock(), and
-therefore the load of x must execute before the load of y.  Thus we
-cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
-MP pattern).
+Here the second spin_lock() is po-after the first spin_unlock(), and
+therefore the load of x must execute before the load of y, even though
+the two locking operations use different locks.  Thus we cannot have
+r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
 
 This requirement does not apply to ordinary release and acquire
 fences, only to lock-related operations.  For instance, suppose P0()
@@ -1872,13 +1873,13 @@ instructions in the following order:
 
 and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
 
-Second, when a lock-acquire reads from a lock-release, and some other
-stores W and W' occur po-before the lock-release and po-after the
-lock-acquire respectively, the LKMM requires that W must propagate to
-each CPU before W' does.  For example, consider:
+Second, when a lock-acquire reads from or is po-after a lock-release,
+and some other stores W and W' occur po-before the lock-release and
+po-after the lock-acquire respectively, the LKMM requires that W must
+propagate to each CPU before W' does.  For example, consider:
 
 	int x, y;
-	spinlock_t x;
+	spinlock_t s;
 
 	P0()
 	{
@@ -1908,7 +1909,12 @@ each CPU before W' does.  For example, consider:
 
 If r1 = 1 at the end then the spin_lock() in P1 must have read from
 the spin_unlock() in P0.  Hence the store to x must propagate to P2
-before the store to y does, so we cannot have r2 = 1 and r3 = 0.
+before the store to y does, so we cannot have r2 = 1 and r3 = 0.  But
+if P1 had used a lock variable different from s, the writes could have
+propagated in either order.  (On the other hand, if the code in P0 and
+P1 had all executed on a single CPU, as in the example before this
+one, then the writes would have propagated in order even if the two
+critical sections used different lock variables.)
 
 These two special requirements for lock-release and lock-acquire do
 not arise from the operational model.  Nevertheless, kernel developers
diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index 2a9b4fe4a84e..d70315fddef6 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -27,7 +27,7 @@ include "lock.cat"
 (* Release Acquire *)
 let acq-po = [Acquire] ; po ; [M]
 let po-rel = [M] ; po ; [Release]
-let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po
+let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po
 
 (* Fences *)
 let R4rmb = R \ Noreturn	(* Reads for which rmb works *)
@@ -70,12 +70,12 @@ let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
 let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
 let to-r = addr | (dep ; [Marked] ; rfi)
-let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
+let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-rf-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked]
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]
 
-- 
2.33.0



* [RFC v2 2/3] tools/memory-model: doc: Describe the requirement of the litmus-tests directory
  2021-10-25 14:54 [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Boqun Feng
  2021-10-25 14:54 ` [RFC v2 1/3] tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU Boqun Feng
@ 2021-10-25 14:54 ` Boqun Feng
  2021-10-25 14:54 ` [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering Boqun Feng
  2021-10-26  6:59 ` [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Peter Zijlstra
  3 siblings, 0 replies; 10+ messages in thread
From: Boqun Feng @ 2021-10-25 14:54 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-doc
  Cc: Paul E. McKenney,
	Dan Lustig, Will Deacon, Peter Zijlstra, Linus Torvalds,
	Alexander Shishkin, Peter Anvin, Andrea Parri, Ingo Molnar,
	Vince Weaver, Thomas Gleixner, Jiri Olsa,
	Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Boqun Feng

It's better to have some "standard" for which tests should be put in
the litmus-tests directory, because it helps future contributors
understand whether they should contribute litmus tests to the kernel or
to Paul's GitHub repo. Therefore explain a little bit about what a
"representative" litmus test is.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 tools/memory-model/README | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/memory-model/README b/tools/memory-model/README
index 9a84c45504ab..9edd402704c4 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -195,6 +195,18 @@ litmus-tests
 	are listed in litmus-tests/README.  A great deal more litmus
 	tests are available at https://github.com/paulmckrcu/litmus.
 
+	By "representative", we mean that a litmus test in the
+	litmus-tests directory should be:
+
+		1) simple: the number of threads should be relatively
+		   small and each thread function should be relatively
+		   simple.
+		2) orthogonal: no two litmus tests should describe the
+		   same aspect of the memory model.
+		3) textbook: developers can easily copy, paste, and
+		   modify the litmus tests to use the patterns in their
+		   own code.
+
 lock.cat
 	Provides a front-end analysis of lock acquisition and release,
 	for example, associating a lock acquisition with the preceding
-- 
2.33.0



* [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-25 14:54 [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Boqun Feng
  2021-10-25 14:54 ` [RFC v2 1/3] tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU Boqun Feng
  2021-10-25 14:54 ` [RFC v2 2/3] tools/memory-model: doc: Describe the requirement of the litmus-tests directory Boqun Feng
@ 2021-10-25 14:54 ` Boqun Feng
  2021-10-26  7:01   ` Peter Zijlstra
  2021-10-26  6:59 ` [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Peter Zijlstra
  3 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2021-10-25 14:54 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-doc
  Cc: Paul E. McKenney,
	Dan Lustig, Will Deacon, Peter Zijlstra, Linus Torvalds,
	Alexander Shishkin, Peter Anvin, Andrea Parri, Ingo Molnar,
	Vince Weaver, Thomas Gleixner, Jiri Olsa,
	Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Boqun Feng, Alan Stern

The memory model has been updated to provide a stronger ordering
guarantee for unlock(A)+lock(B) on the same CPU/thread. Therefore add
two litmus tests describing this new guarantee. These tests are simple
yet clearly show the use of the new guarantee, and they can also serve
as self tests for the modification to the model.

Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 ...LB+unlocklockonceonce+poacquireonce.litmus | 33 +++++++++++++++++++
 ...unlocklockonceonce+fencermbonceonce.litmus | 33 +++++++++++++++++++
 tools/memory-model/litmus-tests/README        |  8 +++++
 3 files changed, 74 insertions(+)
 create mode 100644 tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
 create mode 100644 tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus

diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
new file mode 100644
index 000000000000..955b9c7cdc7f
--- /dev/null
+++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
@@ -0,0 +1,33 @@
+C LB+unlocklockonceonce+poacquireonce
+
+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, all accesses
+ * in the first must execute before any accesses in the second, even if
+ * the critical sections are protected by different locks.
+ *)
+
+{}
+
+P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
+{
+	int r1;
+
+	spin_lock(s);
+	r1 = READ_ONCE(*x);
+	spin_unlock(s);
+	spin_lock(t);
+	WRITE_ONCE(*y, 1);
+	spin_unlock(t);
+}
+
+P1(int *x, int *y)
+{
+	int r2;
+
+	r2 = smp_load_acquire(y);
+	WRITE_ONCE(*x, 1);
+}
+
+exists (0:r1=1 /\ 1:r2=1)
diff --git a/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus
new file mode 100644
index 000000000000..2feb1398be71
--- /dev/null
+++ b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus
@@ -0,0 +1,33 @@
+C MP+unlocklockonceonce+fencermbonceonce
+
+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, stores in the
+ * first must propagate to each CPU before stores in the second do, even if
+ * the critical sections are protected by different locks.
+ *)
+
+{}
+
+P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
+{
+	spin_lock(s);
+	WRITE_ONCE(*x, 1);
+	spin_unlock(s);
+	spin_lock(t);
+	WRITE_ONCE(*y, 1);
+	spin_unlock(t);
+}
+
+P1(int *x, int *y)
+{
+	int r1;
+	int r2;
+
+	r1 = READ_ONCE(*y);
+	smp_rmb();
+	r2 = READ_ONCE(*x);
+}
+
+exists (1:r1=1 /\ 1:r2=0)
diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README
index 681f9067fa9e..d311a0ff1ae6 100644
--- a/tools/memory-model/litmus-tests/README
+++ b/tools/memory-model/litmus-tests/README
@@ -63,6 +63,10 @@ LB+poonceonces.litmus
 	As above, but with store-release replaced with WRITE_ONCE()
 	and load-acquire replaced with READ_ONCE().
 
+LB+unlocklockonceonce+poacquireonce.litmus
+	Does an unlock+lock pair provide an ordering guarantee between
+	a load and a store?
+
 MP+onceassign+derefonce.litmus
 	As below, but with rcu_assign_pointer() and an rcu_dereference().
 
@@ -90,6 +94,10 @@ MP+porevlocks.litmus
 	As below, but with the first access of the writer process
 	and the second access of reader process protected by a lock.
 
+MP+unlocklockonceonce+fencermbonceonce.litmus
+	Does an unlock+lock pair provide an ordering guarantee between
+	a store and another store?
+
 MP+fencewmbonceonce+fencermbonceonce.litmus
 	Does a smp_wmb() (between the stores) and an smp_rmb() (between
 	the loads) suffice for the message-passing litmus test, where one
-- 
2.33.0



* Re: [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso
  2021-10-25 14:54 [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso Boqun Feng
                   ` (2 preceding siblings ...)
  2021-10-25 14:54 ` [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering Boqun Feng
@ 2021-10-26  6:59 ` Peter Zijlstra
  3 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-10-26  6:59 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, linux-arch, linux-doc, Paul E. McKenney,
	Dan Lustig, Will Deacon, Linus Torvalds, Alexander Shishkin,
	Peter Anvin, Andrea Parri, Ingo Molnar, Vince Weaver,
	Thomas Gleixner, Jiri Olsa, Arnaldo Carvalho de Melo,
	Stephane Eranian, palmer, paul.walmsley, mpe, Jonathan Corbet

On Mon, Oct 25, 2021 at 10:54:13PM +0800, Boqun Feng wrote:
> Boqun Feng (3):
>   tools/memory-model: Provide extra ordering for unlock+lock pair on the
>     same CPU
>   tools/memory-model: doc: Describe the requirement of the litmus-tests
>     directory
>   tools/memory-model: litmus: Add two tests for unlock(A)+lock(B)
>     ordering

I'm obviously all in favour of this :-)

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-25 14:54 ` [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering Boqun Feng
@ 2021-10-26  7:01   ` Peter Zijlstra
  2021-10-28 19:11     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-10-26  7:01 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, linux-arch, linux-doc, Paul E. McKenney,
	Dan Lustig, Will Deacon, Linus Torvalds, Alexander Shishkin,
	Peter Anvin, Andrea Parri, Ingo Molnar, Vince Weaver,
	Thomas Gleixner, Jiri Olsa, Arnaldo Carvalho de Melo,
	Stephane Eranian, palmer, paul.walmsley, mpe, Jonathan Corbet,
	Alan Stern

On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> new file mode 100644
> index 000000000000..955b9c7cdc7f
> --- /dev/null
> +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> @@ -0,0 +1,33 @@
> +C LB+unlocklockonceonce+poacquireonce
> +
> +(*
> + * Result: Never
> + *
> + * If two locked critical sections execute on the same CPU, all accesses
> + * in the first must execute before any accesses in the second, even if
> + * the critical sections are protected by different locks.

One small nit; the above "all accesses" reads as if:

	spin_lock(s);
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(t);
	r1 = READ_ONCE(*y);
	spin_unlock(t);

would also work, except of course that's the one reorder allowed by TSO.

> + *)
> +
> +{}
> +
> +P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> +{
> +	int r1;
> +
> +	spin_lock(s);
> +	r1 = READ_ONCE(*x);
> +	spin_unlock(s);
> +	spin_lock(t);
> +	WRITE_ONCE(*y, 1);
> +	spin_unlock(t);
> +}
> +
> +P1(int *x, int *y)
> +{
> +	int r2;
> +
> +	r2 = smp_load_acquire(y);
> +	WRITE_ONCE(*x, 1);
> +}
> +
> +exists (0:r1=1 /\ 1:r2=1)


* Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-26  7:01   ` Peter Zijlstra
@ 2021-10-28 19:11     ` Paul E. McKenney
  2021-10-28 23:51       ` Boqun Feng
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2021-10-28 19:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, linux-kernel, linux-arch, linux-doc, Dan Lustig,
	Will Deacon, Linus Torvalds, Alexander Shishkin, Peter Anvin,
	Andrea Parri, Ingo Molnar, Vince Weaver, Thomas Gleixner,
	Jiri Olsa, Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Alan Stern

On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > new file mode 100644
> > index 000000000000..955b9c7cdc7f
> > --- /dev/null
> > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > @@ -0,0 +1,33 @@
> > +C LB+unlocklockonceonce+poacquireonce
> > +
> > +(*
> > + * Result: Never
> > + *
> > + * If two locked critical sections execute on the same CPU, all accesses
> > + * in the first must execute before any accesses in the second, even if
> > + * the critical sections are protected by different locks.
> 
> One small nit; the above "all accesses" reads as if:
> 
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(t);
> 	r1 = READ_ONCE(*y);
> 	spin_unlock(t);
> 
> would also work, except of course that's the one reorder allowed by TSO.

I applied this series with Peter's Acked-by, and with the above comment
reading as follows:

+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, all accesses
+ * in the first must execute before any accesses in the second, even if the
+ * critical sections are protected by different locks.  The one exception
+ * to this rule is that (consistent with TSO) a prior write can be reordered
+ * with a later read from the viewpoint of a process not holding both locks.
+ *)

Thank you all!

							Thanx, Paul


* Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-28 19:11     ` Paul E. McKenney
@ 2021-10-28 23:51       ` Boqun Feng
  2021-10-29 14:34         ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2021-10-28 23:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, linux-arch, linux-doc, Dan Lustig,
	Will Deacon, Linus Torvalds, Alexander Shishkin, Peter Anvin,
	Andrea Parri, Ingo Molnar, Vince Weaver, Thomas Gleixner,
	Jiri Olsa, Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet, Alan Stern

On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > new file mode 100644
> > > index 000000000000..955b9c7cdc7f
> > > --- /dev/null
> > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > @@ -0,0 +1,33 @@
> > > +C LB+unlocklockonceonce+poacquireonce
> > > +
> > > +(*
> > > + * Result: Never
> > > + *
> > > + * If two locked critical sections execute on the same CPU, all accesses
> > > + * in the first must execute before any accesses in the second, even if
> > > + * the critical sections are protected by different locks.
> > 
> > One small nit; the above "all accesses" reads as if:
> > 
> > 	spin_lock(s);
> > 	WRITE_ONCE(*x, 1);
> > 	spin_unlock(s);
> > 	spin_lock(t);
> > 	r1 = READ_ONCE(*y);
> > 	spin_unlock(t);
> > 
> > would also work, except of course that's the one reorder allowed by TSO.
> 
> I applied this series with Peter's Acked-by, and with the above comment

Thanks!

> reading as follows:
> 
> +(*
> + * Result: Never
> + *
> + * If two locked critical sections execute on the same CPU, all accesses
> + * in the first must execute before any accesses in the second, even if the
> + * critical sections are protected by different locks.  The one exception
> + * to this rule is that (consistent with TSO) a prior write can be reordered
> + * with a later read from the viewpoint of a process not holding both locks.

Just want to be accurate: in our memory model, "execute" means a CPU
commits a memory access instruction to the Memory Subsystem. So if we
have a store W and a load R, where W executes before R, it doesn't mean
the memory effect of W is observed before the memory effect of R by
other CPUs. Consider the following case:


	CPU0			Memory Subsystem		CPU1
	====							====
	WRITE_ONCE(*x,1); // W ---------->|
	spin_unlock(s);                   |
	spin_lock(t);                     |
	r1 = READ_ONCE(*y); // R -------->|
	// R reads 0                      |
					  |<----------------WRITE_ONCE(*y, 1); // W'
		 W' propagates to CPU0    |
		<-------------------------|
					  |                 smp_mb();
					  |<----------------r1 = READ_ONCE(*x); // R' reads 0
					  |
					  | W propagates to CPU 1
		                          |----------------->

The "->" from CPU0 to the Memory Subsystem shows that W executes before
R; however, the memory effect of a store can be observed only after the
Memory Subsystem propagates it to another CPU. As a result, CPU1 doesn't
observe W before R is executed. So the original version of the comment
is correct in our memory model terminology, at least as I understand it;
Alan can correct me if I'm wrong.

Maybe it's better to replace the sentence starting with "The one
exception..." with:

One thing to notice is that even though a write executes before a read,
the memory effects can still be reordered from the viewpoint of a
process not holding both locks, similar to TSO ordering.

Thoughts?

Apologies for responding late...

("Memory Subsystem" is an abstraction in our memory model; it doesn't
mean hardware implements things in the same way.)

Regards,
Boqun

> + *)
> 
> Thank you all!
> 
> 							Thanx, Paul


* Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-28 23:51       ` Boqun Feng
@ 2021-10-29 14:34         ` Alan Stern
  2021-10-29 15:01           ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2021-10-29 14:34 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Peter Zijlstra, linux-kernel, linux-arch,
	linux-doc, Dan Lustig, Will Deacon, Linus Torvalds,
	Alexander Shishkin, Peter Anvin, Andrea Parri, Ingo Molnar,
	Vince Weaver, Thomas Gleixner, Jiri Olsa,
	Arnaldo Carvalho de Melo, Stephane Eranian, palmer,
	paul.walmsley, mpe, Jonathan Corbet

On Fri, Oct 29, 2021 at 07:51:39AM +0800, Boqun Feng wrote:
> On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > new file mode 100644
> > > > index 000000000000..955b9c7cdc7f
> > > > --- /dev/null
> > > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > @@ -0,0 +1,33 @@
> > > > +C LB+unlocklockonceonce+poacquireonce
> > > > +
> > > > +(*
> > > > + * Result: Never
> > > > + *
> > > > + * If two locked critical sections execute on the same CPU, all accesses
> > > > + * in the first must execute before any accesses in the second, even if
> > > > + * the critical sections are protected by different locks.
> > > 
> > > One small nit; the above "all accesses" reads as if:
> > > 
> > > 	spin_lock(s);
> > > 	WRITE_ONCE(*x, 1);
> > > 	spin_unlock(s);
> > > 	spin_lock(t);
> > > 	r1 = READ_ONCE(*y);
> > > 	spin_unlock(t);
> > > 
> > > would also work, except of course that's the one reorder allowed by TSO.
> > 
> > I applied this series with Peter's Acked-by, and with the above comment
> 
> Thanks!
> 
> > reading as follows:
> > 
> > +(*
> > + * Result: Never
> > + *
> > + * If two locked critical sections execute on the same CPU, all accesses
> > + * in the first must execute before any accesses in the second, even if the
> > + * critical sections are protected by different locks.  The one exception
> > + * to this rule is that (consistent with TSO) a prior write can be reordered
> > + * with a later read from the viewpoint of a process not holding both locks.
> 
> Just want to be accurate, in our memory model "execute" means a CPU
> commit an memory access instruction to the Memory Subsystem, so if we
> have a store W and a load R, where W executes before R, it doesn't mean
> the memory effect of W is observed before the memory effect of R by
> other CPUs, consider the following case
> 
> 
> 	CPU0			Memory Subsystem		CPU1
> 	====							====
> 	WRITE_ONCE(*x,1); // W ---------->|
> 	spin_unlock(s);                   |
> 	spin_lock(t);                     |
> 	r1 = READ_ONCE(*y); // R -------->|
> 	// R reads 0                      |
> 					  |<----------------WRITE_ONCE(*y, 1); // W'
> 		 W' propagates to CPU0    |
> 		<-------------------------|
> 					  |                 smp_mb();
> 					  |<----------------r1 = READ_ONCE(*x); // R' reads 0
> 					  |
> 					  | W propagates to CPU 1
> 		                          |----------------->
> 
> The "->" from CPU0 to the Memory Subsystem shows that W executes before
> R, however the memory effect of a store can be observed only after the
> Memory Subsystem propagates it to another CPU, as a result CPU1 doesn't
> observe W before R is executed. So the original version of the comments
> is correct in our memory model terminology, at least that's how I
> understand it, Alan can correct me if I'm wrong.

Indeed, that is correct.

It is an unfortunate inconsistency with the terminology in 
Documentation/memory-barriers.txt.  I suspect most people think of a 
write as executing when it is observed by another CPU, even though that 
really isn't a coherent concept.  (For example, it could easily lead 
somebody to think that a write observed at different times by different 
CPUs has executed more than once!)

> Maybe it's better to replace the sentence starting with "The one
> exception..." into:
> 
> One thing to notice is that even though a write executes before a read, the
> memory effects can still be reordered from the viewpoint of a process
> not holding both locks, similar to TSO ordering.
> 
> Thoughts?

Or more briefly:

	Note: Even when a write executes before a read, their memory
	effects can be reordered from the viewpoint of another CPU (the 
	kind of reordering allowed by TSO).

Alan

> Apologies for responding late...
> 
> ("Memory Subsystem" is an abstraction in our memory model, which doesn't
> mean hardware implements things in the same way.).
> 
> Regards,
> Boqun
> 
> > + *)
> > 
> > Thank you all!
> > 
> > 							Thanx, Paul


* Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering
  2021-10-29 14:34         ` Alan Stern
@ 2021-10-29 15:01           ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2021-10-29 15:01 UTC (permalink / raw)
  To: Alan Stern
  Cc: Boqun Feng, Peter Zijlstra, linux-kernel, linux-arch, linux-doc,
	Dan Lustig, Will Deacon, Linus Torvalds, Alexander Shishkin,
	Peter Anvin, Andrea Parri, Ingo Molnar, Vince Weaver,
	Thomas Gleixner, Jiri Olsa, Arnaldo Carvalho de Melo,
	Stephane Eranian, palmer, paul.walmsley, mpe, Jonathan Corbet

On Fri, Oct 29, 2021 at 10:34:42AM -0400, Alan Stern wrote:
> On Fri, Oct 29, 2021 at 07:51:39AM +0800, Boqun Feng wrote:
> > On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > > > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > > new file mode 100644
> > > > > index 000000000000..955b9c7cdc7f
> > > > > --- /dev/null
> > > > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > > @@ -0,0 +1,33 @@
> > > > > +C LB+unlocklockonceonce+poacquireonce
> > > > > +
> > > > > +(*
> > > > > + * Result: Never
> > > > > + *
> > > > > + * If two locked critical sections execute on the same CPU, all accesses
> > > > > + * in the first must execute before any accesses in the second, even if
> > > > > + * the critical sections are protected by different locks.
> > > > 
> > > > One small nit; the above "all accesses" reads as if:
> > > > 
> > > > 	spin_lock(s);
> > > > 	WRITE_ONCE(*x, 1);
> > > > 	spin_unlock(s);
> > > > 	spin_lock(t);
> > > > 	r1 = READ_ONCE(*y);
> > > > 	spin_unlock(t);
> > > > 
> > > > would also work, except of course that's the one reorder allowed by TSO.
> > > 
> > > I applied this series with Peter's Acked-by, and with the above comment
> > 
> > Thanks!
> > 
> > > reading as follows:
> > > 
> > > +(*
> > > + * Result: Never
> > > + *
> > > + * If two locked critical sections execute on the same CPU, all accesses
> > > + * in the first must execute before any accesses in the second, even if the
> > > + * critical sections are protected by different locks.  The one exception
> > > + * to this rule is that (consistent with TSO) a prior write can be reordered
> > > + * with a later read from the viewpoint of a process not holding both locks.
> > 
> > Just want to be accurate, in our memory model "execute" means a CPU
> > commit an memory access instruction to the Memory Subsystem, so if we
> > have a store W and a load R, where W executes before R, it doesn't mean
> > the memory effect of W is observed before the memory effect of R by
> > other CPUs, consider the following case
> > 
> > 
> > 	CPU0			Memory Subsystem		CPU1
> > 	====							====
> > 	WRITE_ONCE(*x,1); // W ---------->|
> > 	spin_unlock(s);                   |
> > 	spin_lock(t);                     |
> > 	r1 = READ_ONCE(*y); // R -------->|
> > 	// R reads 0                      |
> > 					  |<----------------WRITE_ONCE(*y, 1); // W'
> > 		 W' propagates to CPU0    |
> > 		<-------------------------|
> > 					  |                 smp_mb();
> > 					  |<----------------r1 = READ_ONCE(*x); // R' reads 0
> > 					  |
> > 					  | W propagates to CPU 1
> > 		                          |----------------->
> > 
> > The "->" from CPU0 to the Memory Subsystem shows that W executes before
> > R, however the memory effect of a store can be observed only after the
> > Memory Subsystem propagates it to another CPU, as a result CPU1 doesn't
> > observe W before R is executed. So the original version of the comments
> > is correct in our memory model terminology, at least that's how I
> > understand it, Alan can correct me if I'm wrong.
> 
> Indeed, that is correct.
> 
> It is an unfortunate inconsistency with the terminology in 
> Documentation/memory-barriers.txt.  I suspect most people think of a 
> write as executing when it is observed by another CPU, even though that 
> really isn't a coherent concept.  (For example, it could easily lead 
> somebody to think that a write observed at different times by different 
> CPUs has executed more than once!)

Agreed, the terminology is odd.  But the fact that different CPUs can
see writes in different orders is probably always going to be a bit
counter-intuitive, so it is good to avoid giving that intuition any
support.

> > Maybe it's better to replace the sentence starting with "The one
> > exception..." into:
> > 
> > One thing to notice is that even though a write executes before a read, the
> > memory effects can still be reordered from the viewpoint of a process
> > not holding both locks, similar to TSO ordering.
> > 
> > Thoughts?
> 
> Or more briefly:
> 
> 	Note: Even when a write executes before a read, their memory
> 	effects can be reordered from the viewpoint of another CPU (the 
> 	kind of reordering allowed by TSO).

Very good!  I took this verbatim in a fixup patch to be combined
with the original on my next rebase.

							Thanx, Paul

> Alan
> 
> > Apologies for responding late...
> > 
> > ("Memory Subsystem" is an abstraction in our memory model, which doesn't
> > mean hardware implements things in the same way.).
> > 
> > Regards,
> > Boqun
> > 
> > > + *)
> > > 
> > > Thank you all!
> > > 
> > > 							Thanx, Paul


