* [PATCH memory-model 0/4] LKMM updates for v6.3
From: Paul E. McKenney @ 2023-01-05  1:09 UTC
  To: linux-kernel, linux-arch, kernel-team, mingo
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks

Hello!

This series provides LKMM updates:

1.	locking/memory-barriers.txt: Improve documentation for writel()
	example, courtesy of Parav Pandit.

2.	memory-model: Add rmw-sequences to the LKMM, courtesy of Alan
	Stern.

3.	Documentation: Fixed a typo in atomic_t.txt, courtesy of
	Kushagra Verma.

4.	memory-model: Make plain accesses carry dependencies, courtesy
	of Jonas Oberhauser.

						Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/atomic_t.txt                       |    2 -
 b/Documentation/memory-barriers.txt                |   22 +++++++-------
 b/tools/memory-model/Documentation/explanation.txt |   30 ++++++++++++++++++++
 b/tools/memory-model/linux-kernel.bell             |    6 ++++
 b/tools/memory-model/linux-kernel.cat              |    5 ++-
 b/tools/memory-model/litmus-tests/dep+plain.litmus |   31 +++++++++++++++++++++
 tools/memory-model/Documentation/explanation.txt   |    9 +++++-
 7 files changed, 90 insertions(+), 15 deletions(-)


* [PATCH memory-model 1/4] locking/memory-barriers.txt: Improve documentation for writel() example
From: Paul E. McKenney @ 2023-01-05  1:09 UTC
  To: linux-kernel, linux-arch, kernel-team, mingo
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, Parav Pandit, Paul E . McKenney

From: Parav Pandit <parav@nvidia.com>

The cited commit explains that an explicit wmb() is not needed when
using writel(): wmb() is an expensive barrier, and writel() already
includes the required platform-specific barrier.

The writeX() section of "KERNEL I/O BARRIER EFFECTS" already describes
the ordering of I/O accessors with respect to MMIO writes.

Hence, add a comment to the pseudo-code of the writel() example and
remove the confusing text around writel() and wmb().

The cited commit is 5846581e3563 ("locking/memory-barriers.txt: Fix
broken DMA vs. MMIO ordering example").
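
For reference, writel() on most architectures expands to roughly the
following (a sketch based on the asm-generic/io.h default; __io_bw()
and __io_aw() are per-architecture barrier hooks):

	static inline void writel(u32 value, volatile void __iomem *addr)
	{
		__io_bw();	/* barrier before the MMIO store */
		__raw_writel((u32 __force)__cpu_to_le32(value), addr);
		__io_aw();	/* barrier after the MMIO store */
	}

The leading __io_bw() is what makes a caller-side wmb() unnecessary.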

Co-developed-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/memory-barriers.txt | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index cc621decd9439..06e14efd8662a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1910,7 +1910,8 @@ There are some more advanced barrier functions:
 
      These are for use with consistent memory to guarantee the ordering
      of writes or reads of shared memory accessible to both the CPU and a
-     DMA capable device.
+     DMA capable device. See Documentation/core-api/dma-api.rst file for more
+     information about consistent memory.
 
      For example, consider a device driver that shares memory with a device
      and uses a descriptor status value to indicate if the descriptor belongs
@@ -1931,22 +1932,21 @@ There are some more advanced barrier functions:
 		/* assign ownership */
 		desc->status = DEVICE_OWN;
 
-		/* notify device of new descriptors */
+		/* Make the descriptor status visible to the device,
+		 * then notify the device of the new descriptor.
+		 */
 		writel(DESC_NOTIFY, doorbell);
 	}
 
-     The dma_rmb() allows us guarantee the device has released ownership
+     The dma_rmb() allows us to guarantee that the device has released ownership
      before we read the data from the descriptor, and the dma_wmb() allows
      us to guarantee the data is written to the descriptor before the device
      can see it now has ownership.  The dma_mb() implies both a dma_rmb() and
-     a dma_wmb().  Note that, when using writel(), a prior wmb() is not needed
-     to guarantee that the cache coherent memory writes have completed before
-     writing to the MMIO region.  The cheaper writel_relaxed() does not provide
-     this guarantee and must not be used here.
-
-     See the subsection "Kernel I/O barrier effects" for more information on
-     relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
-     more information on consistent memory.
+     a dma_wmb().
+
+     Note that the dma_*() barriers do not provide any ordering guarantees for
+     accesses to MMIO regions.  See the later "KERNEL I/O BARRIER EFFECTS"
+     subsection for more information about I/O accessors and MMIO ordering.
 
  (*) pmem_wmb();
 
-- 
2.31.1.189.g2e36527f23



* [PATCH memory-model 2/4] tools: memory-model: Add rmw-sequences to the LKMM
From: Paul E. McKenney @ 2023-01-05  1:09 UTC
  To: linux-kernel, linux-arch, kernel-team, mingo
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, Viktor Vafeiadis,
	Jonas Oberhauser, Paul E . McKenney

From: Alan Stern <stern@rowland.harvard.edu>

Viktor (as relayed by Jonas) has pointed out a weakness in the Linux
Kernel Memory Model.  Namely, the memory ordering properties of atomic
operations are not monotonic: An atomic op with full-barrier semantics
does not always provide ordering as strong as one with release-barrier
semantics.

The following litmus test illustrates the problem:

--------------------------------------------------
C atomics-not-monotonic

{}

P0(int *x, atomic_t *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	atomic_set(y, 1);
}

P1(atomic_t *y)
{
	int r1;

	r1 = atomic_inc_return(y);
}

P2(int *x, atomic_t *y)
{
	int r2;
	int r3;

	r2 = atomic_read(y);
	smp_rmb();
	r3 = READ_ONCE(*x);
}

exists (2:r2=2 /\ 2:r3=0)
--------------------------------------------------

The litmus test is allowed as shown with atomic_inc_return(), which
has full-barrier semantics.  But if the operation is changed to
atomic_inc_return_release(), which only has release-barrier semantics,
the litmus test is forbidden.  Clearly this violates monotonicity.
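
(This can be checked with the herd7 tool; a usage sketch, assuming the
test above is saved as atomics-not-monotonic.litmus in the
tools/memory-model directory:

	$ cd tools/memory-model
	$ herd7 -conf linux-kernel.cfg atomics-not-monotonic.litmus

With the pre-patch model, herd7 reports the "exists" clause as
Sometimes for atomic_inc_return() but Never for the
atomic_inc_return_release() variant; with this patch applied, both
variants are Never.)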

The reason is that the LKMM treats full-barrier atomic ops as if
they were written:

	mb();
	load();
	store();
	mb();

(where the load() and store() are the two parts of an atomic RMW op),
whereas it treats release-barrier atomic ops as if they were written:

	load();
	release_barrier();
	store();

The difference is that here the release barrier orders the load part
of the atomic op before the store part with A-cumulativity, whereas
the mb()'s above do not.  This means that release-barrier atomics can
effectively extend the cumul-fence relation but full-barrier atomics
cannot.

To resolve this problem we introduce the rmw-sequence relation,
representing an arbitrarily long sequence of atomic RMW operations in
which each operation reads from the previous one, and explicitly allow
it to extend cumul-fence.  This modification of the memory model is
sound; it holds for PPC because of B-cumulativity, it holds for TSO
and ARM64 because of other-multicopy atomicity, and we can assume that
atomic ops on all other architectures will be implemented so as to
make it hold for them.

For similar reasons we also allow rmw-sequence to extend the
w-post-bounded relation, which is analogous to cumul-fence in some
ways.

Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../Documentation/explanation.txt             | 30 +++++++++++++++++++
 tools/memory-model/linux-kernel.cat           |  5 ++--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index 11a1d2d4f681c..e901b47236c37 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -1007,6 +1007,36 @@ order.  Equivalently,
 where the rmw relation links the read and write events making up each
 atomic update.  This is what the LKMM's "atomic" axiom says.
 
+Atomic rmw updates play one more role in the LKMM: They can form "rmw
+sequences".  An rmw sequence is simply a bunch of atomic updates where
+each update reads from the previous one.  Written using events, it
+looks like this:
+
+	Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn,
+
+where Z0 is some store event and n can be any number (even 0, in the
+degenerate case).  We write this relation as: Z0 ->rmw-sequence Zn.
+Note that this implies Z0 and Zn are stores to the same variable.
+
+Rmw sequences have a special property in the LKMM: They can extend the
+cumul-fence relation.  That is, if we have:
+
+	U ->cumul-fence X ->rmw-sequence Y
+
+then also U ->cumul-fence Y.  Thinking about this in terms of the
+operational model, U ->cumul-fence X says that the store U propagates
+to each CPU before the store X does.  Then the fact that X and Y are
+linked by an rmw sequence means that U also propagates to each CPU
+before Y does.  In an analogous way, rmw sequences can also extend
+the w-post-bounded relation defined below in the PLAIN ACCESSES AND
+DATA RACES section.
+
+(The notion of rmw sequences in the LKMM is similar to, but not quite
+the same as, that of release sequences in the C11 memory model.  They
+were added to the LKMM to fix an obscure bug; without them, atomic
+updates with full-barrier semantics did not always guarantee ordering
+at least as strong as atomic updates with release-barrier semantics.)
+
 
 THE PRESERVED PROGRAM ORDER RELATION: ppo
 -----------------------------------------
diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index d70315fddef6e..07f884f9b2bff 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -74,8 +74,9 @@ let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
+let rmw-sequence = (rf ; rmw)*
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked] ; rmw-sequence
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]
 
@@ -174,7 +175,7 @@ let vis = cumul-fence* ; rfe? ; [Marked] ;
 let w-pre-bounded = [Marked] ; (addr | fence)?
 let r-pre-bounded = [Marked] ; (addr | nonrw-fence |
 	([R4rmb] ; fencerel(Rmb) ; [~Noreturn]))?
-let w-post-bounded = fence? ; [Marked]
+let w-post-bounded = fence? ; [Marked] ; rmw-sequence
 let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
 	[Marked]
 
-- 
2.31.1.189.g2e36527f23



* [PATCH memory-model 3/4] Documentation: Fixed a typo in atomic_t.txt
From: Paul E. McKenney @ 2023-01-05  1:09 UTC
  To: linux-kernel, linux-arch, kernel-team, mingo
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, Kushagra Verma,
	Paul E . McKenney

From: Kushagra Verma <kushagra765@outlook.com>

Fixed a typo in the word 'architecture'.

Signed-off-by: Kushagra Verma <kushagra765@outlook.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/atomic_t.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 0f1ffa03db09a..d7adc6d543db4 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -324,7 +324,7 @@ atomic operations.
 
 Specifically 'simple' cmpxchg() loops are expected to not starve one another
 indefinitely. However, this is not evident on LL/SC architectures, because
-while an LL/SC architecure 'can/should/must' provide forward progress
+while an LL/SC architecture 'can/should/must' provide forward progress
 guarantees between competing LL/SC sections, such a guarantee does not
 transfer to cmpxchg() implemented using LL/SC. Consider:
 
-- 
2.31.1.189.g2e36527f23



* [PATCH memory-model 4/4] tools: memory-model: Make plain accesses carry dependencies
From: Paul E. McKenney @ 2023-01-05  1:09 UTC
  To: linux-kernel, linux-arch, kernel-team, mingo
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, Jonas Oberhauser,
	Viktor Vafeiadis, Paul E . McKenney

From: Jonas Oberhauser <jonas.oberhauser@huawei.com>

As reported by Viktor, plain accesses in LKMM are weaker than
accesses to registers: the latter carry dependencies but the former
do not. This is exemplified in the following snippet:

  int r = READ_ONCE(*x);
  WRITE_ONCE(*y, r);

Here a data dependency links the READ_ONCE() to the WRITE_ONCE(),
preserving their order, because the model treats r as a register.
If r is turned into a memory location accessed by plain accesses,
however, the link is broken and the order between READ_ONCE() and
WRITE_ONCE() is no longer preserved.
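
That problematic variant looks like this (a sketch, with r turned into
a pointer to an ordinary memory location):

  *r = READ_ONCE(*x);   /* plain store of the loaded value */
  WRITE_ONCE(*y, *r);   /* plain load from *r feeding the marked store */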

This is too conservative, since any optimizations on plain
accesses that might break dependencies are also possible on
registers.  It also contradicts the intuitive notion of "dependency":
the data stored by the WRITE_ONCE() does depend on the data read
by the READ_ONCE(), independently of whether r is a register or a
memory location.

This is resolved by redefining all dependencies to include
dependencies carried by memory accesses; a dependency is said to be
carried by memory accesses (in the model: carry-dep) from one load
to another load if the initial load is followed by an arbitrarily
long sequence alternating between stores and loads of the same
thread, where the data of each store depends on the previous load,
and is read by the next load.

Any dependency linking the final load in the sequence to another
access also links the initial load in the sequence to that access.
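
In terms of this definition, a minimal carried-dependency chain looks
like the following sketch (t is a hypothetical plain variable private
to the thread):

  r1 = READ_ONCE(*x);   /* initial marked load */
  *t = r1;              /* plain store whose data depends on r1 */
  r2 = *t;              /* plain load reading that store */
  WRITE_ONCE(*y, r2);   /* the dependency reaches back to READ_ONCE() */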

Further details can be found in this LKML discussion:

https://lore.kernel.org/lkml/d86295788ad14a02874ab030ddb8a6f8@huawei.com/

Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org>
Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../Documentation/explanation.txt             |  9 +++++-
 tools/memory-model/linux-kernel.bell          |  6 ++++
 .../litmus-tests/dep+plain.litmus             | 31 +++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 tools/memory-model/litmus-tests/dep+plain.litmus

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index e901b47236c37..8e70852384709 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -2575,7 +2575,7 @@ smp_store_release() -- which is basically how the Linux kernel treats
 them.
 
 Although we said that plain accesses are not linked by the ppo
-relation, they do contribute to it indirectly.  Namely, when there is
+relation, they do contribute to it indirectly.  Firstly, when there is
 an address dependency from a marked load R to a plain store W,
 followed by smp_wmb() and then a marked store W', the LKMM creates a
 ppo link from R to W'.  The reasoning behind this is perhaps a little
@@ -2584,6 +2584,13 @@ for this source code in which W' could execute before R.  Just as with
 pre-bounding by address dependencies, it is possible for the compiler
 to undermine this relation if sufficient care is not taken.
 
+Secondly, plain accesses can carry dependencies: If a data dependency
+links a marked load R to a store W, and the store is read by a load R'
+from the same thread, then the data loaded by R' depends on the data
+loaded originally by R. Thus, if R' is linked to any access X by a
+dependency, R is also linked to access X by the same dependency, even
+if W or R' (or both!) are plain.
+
 There are a few oddball fences which need special treatment:
 smp_mb__before_atomic(), smp_mb__after_atomic(), and
 smp_mb__after_spinlock().  The LKMM uses fence events with special
diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell
index 5be86b1025e8d..70a9073dec3e5 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -82,3 +82,9 @@ flag ~empty different-values(srcu-rscs) as srcu-bad-nesting
 let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
 		LKR | LKW | UL | LF | RL | RU
 let Plain = M \ Marked
+
+(* Redefine dependencies to include those carried through plain accesses *)
+let carry-dep = (data ; rfi)*
+let addr = carry-dep ; addr
+let ctrl = carry-dep ; ctrl
+let data = carry-dep ; data
diff --git a/tools/memory-model/litmus-tests/dep+plain.litmus b/tools/memory-model/litmus-tests/dep+plain.litmus
new file mode 100644
index 0000000000000..ebf84daa9a590
--- /dev/null
+++ b/tools/memory-model/litmus-tests/dep+plain.litmus
@@ -0,0 +1,31 @@
+C dep+plain
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that in LKMM, plain accesses
+ * carry dependencies much like accesses to registers:
+ * The data stored to *z1 and *z2 by P0() originates from P0()'s
+ * READ_ONCE(), and therefore using that data to compute the
+ * conditional of P0()'s if-statement creates a control dependency
+ * from that READ_ONCE() to P0()'s WRITE_ONCE().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z1, int *z2)
+{
+	int a = READ_ONCE(*x);
+	*z1 = a;
+	*z2 = *z1;
+	if (*z2 == 1)
+		WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r = smp_load_acquire(y);
+	smp_store_release(x, r);
+}
+
+exists (x=1 /\ y=1)
-- 
2.31.1.189.g2e36527f23


