[PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
@ 2016-09-29 15:54 Paul E. McKenney
  2016-09-29 15:58 ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 15:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, dhowells, will.deacon, peterz, stern

If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
broken if a third process overwrites the value written by the RELEASE
operation before the ACQUIRE operation has a chance of reading it.
This commit therefore updates the documentation to call this vulnerability
out explicitly.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index ba818ecce6f9..a57679ec9441 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -490,14 +490,18 @@ And a couple of implicit varieties:
      the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
      pair is -not- guaranteed to act as a full memory barrier.  However, after
      an ACQUIRE on a given variable, all memory accesses preceding any prior
-     RELEASE on that same variable are guaranteed to be visible.  In other
-     words, within a given variable's critical section, all accesses of all
-     previous critical sections for that variable are guaranteed to have
-     completed.
+     RELEASE on that same variable in that same chain of RELEASE+ACQUIRE
+     pairs are guaranteed to be visible.  In other words, within a given
+     variable's critical section, all accesses of all previous critical
+     sections for that variable are guaranteed to have completed.
 
      This means that ACQUIRE acts as a minimal "acquire" operation and
      RELEASE acts as a minimal "release" operation.
 
+     However, please note that a chain of RELEASE+ACQUIRE pairs may be
+     broken by a store by another thread that overwrites the RELEASE
+     operation's store before the ACQUIRE operation's read.
+
 A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
 and RELEASE variants in addition to fully-ordered and relaxed (no barrier
 semantics) definitions.  For compound atomics performing both a load and a
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 15:54 [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability Paul E. McKenney
@ 2016-09-29 15:58 ` Peter Zijlstra
  2016-09-29 16:03   ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-29 15:58 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, mingo, dhowells, will.deacon, stern

On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> broken if a third process overwrites the value written by the RELEASE
> operation before the ACQUIRE operation has a chance of reading it.
> This commit therefore updates the documentation to call this vulnerability
> out explicitly.
> 
> Reported-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> +     broken by a store by another thread that overwrites the RELEASE
> +     operation's store before the ACQUIRE operation's read.

This is the powerpc lwsync quirk, right? Where the barrier disappears
when it looses the store.

Or is there more to it? Its not entirely clear from the Changelog, which
I feel should describe the reason for the behaviour.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 15:58 ` Peter Zijlstra
@ 2016-09-29 16:03   ` Will Deacon
  2016-09-29 16:17     ` Peter Zijlstra
  2016-09-29 16:43     ` Paul E. McKenney
  0 siblings, 2 replies; 27+ messages in thread
From: Will Deacon @ 2016-09-29 16:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Paul E. McKenney, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > broken if a third process overwrites the value written by the RELEASE
> > operation before the ACQUIRE operation has a chance of reading it.
> > This commit therefore updates the documentation to call this vulnerability
> > out explicitly.
> > 
> > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > +     broken by a store by another thread that overwrites the RELEASE
> > +     operation's store before the ACQUIRE operation's read.
> 
> This is the powerpc lwsync quirk, right? Where the barrier disappears
> when it looses the store.
> 
> Or is there more to it? Its not entirely clear from the Changelog, which
> I feel should describe the reason for the behaviour.

If I've groked it correctly, it's for cases like:


PO:
Wx=1
WyRel=1

P1:
Wy=2

P2:
RyAcq=2
Rx=0

Final value of y is 2.


This is permitted on arm64. If you make P1's store a store-release, then
it's forbidden, but I suspect that's not generally true of the kernel
memory model.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 16:03   ` Will Deacon
@ 2016-09-29 16:17     ` Peter Zijlstra
  2016-09-29 16:44       ` Paul E. McKenney
  2016-09-29 16:43     ` Paul E. McKenney
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-29 16:17 UTC (permalink / raw)
  To: Will Deacon; +Cc: Paul E. McKenney, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > broken if a third process overwrites the value written by the RELEASE
> > > operation before the ACQUIRE operation has a chance of reading it.
> > > This commit therefore updates the documentation to call this vulnerability
> > > out explicitly.
> > > 
> > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > +     broken by a store by another thread that overwrites the RELEASE
> > > +     operation's store before the ACQUIRE operation's read.
> > 
> > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > when it looses the store.
> > 
> > Or is there more to it? Its not entirely clear from the Changelog, which
> > I feel should describe the reason for the behaviour.
> 
> If I've groked it correctly, it's for cases like:
> 
> 
> PO:
> Wx=1
> WyRel=1
> 
> P1:
> Wy=2
> 
> P2:
> RyAcq=2
> Rx=0
> 
> Final value of y is 2.
> 
> 
> This is permitted on arm64. If you make P1's store a store-release, then
> it's forbidden, but I suspect that's not generally true of the kernel
> memory model.

Right, I think that on PowerPC, even if P1 does store-release you can
still get this, since the two stores conflict one can loose out, and the
lwsync associated with the loosing store gets removed along with it.


So yes, I think this needs more clarification.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 16:03   ` Will Deacon
  2016-09-29 16:17     ` Peter Zijlstra
@ 2016-09-29 16:43     ` Paul E. McKenney
  2016-09-29 17:10       ` Will Deacon
  2016-09-30 10:25       ` Peter Zijlstra
  1 sibling, 2 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 16:43 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > broken if a third process overwrites the value written by the RELEASE
> > > operation before the ACQUIRE operation has a chance of reading it.
> > > This commit therefore updates the documentation to call this vulnerability
> > > out explicitly.
> > > 
> > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > +     broken by a store by another thread that overwrites the RELEASE
> > > +     operation's store before the ACQUIRE operation's read.
> > 
> > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > when it looses the store.
> > 
> > Or is there more to it? Its not entirely clear from the Changelog, which
> > I feel should describe the reason for the behaviour.
> 
> If I've groked it correctly, it's for cases like:
> 
> 
> PO:
> Wx=1
> WyRel=1
> 
> P1:
> Wy=2
> 
> P2:
> RyAcq=2
> Rx=0
> 
> Final value of y is 2.
> 
> 
> This is permitted on arm64. If you make P1's store a store-release, then
> it's forbidden, but I suspect that's not generally true of the kernel
> memory model.

That is the one!  And to Peter's point, powerpc does the same for the
example as shown.  However, on powerpc, upgrading P1's store to release
has no effect because there is no earlier access for the resulting
lwsync to influence.  For whatever it might be worth, C11 won't guarantee
ordering in that case, either.  Nor will the current Linux-kernel memory
model.  (Yes, I did just try it to make sure.  Why do you ask?)

So you guys are fishing for an expanded commit log, for example, like
the following?  ;-)

							Thanx, Paul

------------------------------------------------------------------------

If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
broken if a third process overwrites the value written by the RELEASE
operation before the ACQUIRE operation has a chance of reading it, for
example:

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_wmb();
		smp_store_release(y, 1);
	}

	P1(int *y)
	{
		smp_store_release(y, 2);
	}

	P2(int *x, int *y)
	{
		r1 = smp_load_acquire(y);
		r2 = READ_ONCE(*x);
	}

Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
r2=0), as does the current version of the early prototype Linux-kernel
memory model.

This commit therefore updates the documentation to call this vulnerability
out explicitly.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 16:17     ` Peter Zijlstra
@ 2016-09-29 16:44       ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 16:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 06:17:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > broken if a third process overwrites the value written by the RELEASE
> > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > This commit therefore updates the documentation to call this vulnerability
> > > > out explicitly.
> > > > 
> > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > +     operation's store before the ACQUIRE operation's read.
> > > 
> > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > when it looses the store.
> > > 
> > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > I feel should describe the reason for the behaviour.
> > 
> > If I've groked it correctly, it's for cases like:
> > 
> > 
> > PO:
> > Wx=1
> > WyRel=1
> > 
> > P1:
> > Wy=2
> > 
> > P2:
> > RyAcq=2
> > Rx=0
> > 
> > Final value of y is 2.
> > 
> > 
> > This is permitted on arm64. If you make P1's store a store-release, then
> > it's forbidden, but I suspect that's not generally true of the kernel
> > memory model.
> 
> Right, I think that on PowerPC, even if P1 does store-release you can
> still get this, since the two stores conflict one can loose out, and the
> lwsync associated with the loosing store gets removed along with it.
> 
> So yes, I think this needs more clarification.

Whether the store is loose or not, yes, putting an lwsync as the first
instruction in a given task has no effect.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 16:43     ` Paul E. McKenney
@ 2016-09-29 17:10       ` Will Deacon
  2016-09-29 17:23         ` Paul E. McKenney
  2016-09-30 10:25       ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Will Deacon @ 2016-09-29 17:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > broken if a third process overwrites the value written by the RELEASE
> > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > This commit therefore updates the documentation to call this vulnerability
> > > > out explicitly.
> > > > 
> > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > +     operation's store before the ACQUIRE operation's read.
> > > 
> > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > when it looses the store.
> > > 
> > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > I feel should describe the reason for the behaviour.
> > 
> > If I've groked it correctly, it's for cases like:
> > 
> > 
> > PO:
> > Wx=1
> > WyRel=1
> > 
> > P1:
> > Wy=2
> > 
> > P2:
> > RyAcq=2
> > Rx=0
> > 
> > Final value of y is 2.
> > 
> > 
> > This is permitted on arm64. If you make P1's store a store-release, then
> > it's forbidden, but I suspect that's not generally true of the kernel
> > memory model.
> 
> That is the one!  And to Peter's point, powerpc does the same for the
> example as shown.  However, on powerpc, upgrading P1's store to release
> has no effect because there is no earlier access for the resulting
> lwsync to influence.  For whatever it might be worth, C11 won't guarantee
> ordering in that case, either.  Nor will the current Linux-kernel memory
> model.  (Yes, I did just try it to make sure.  Why do you ask?)
> 
> So you guys are fishing for an expanded commit log, for example, like
> the following?  ;-)
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> broken if a third process overwrites the value written by the RELEASE
> operation before the ACQUIRE operation has a chance of reading it, for
> example:
> 
> 	P0(int *x, int *y)
> 	{
> 		WRITE_ONCE(*x, 1);
> 		smp_wmb();
> 		smp_store_release(y, 1);
> 	}
> 
> 	P1(int *y)
> 	{
> 		smp_store_release(y, 2);
> 	}
> 
> 	P2(int *x, int *y)
> 	{
> 		r1 = smp_load_acquire(y);
> 		r2 = READ_ONCE(*x);
> 	}
> 
> Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> r2=0), as does the current version of the early prototype Linux-kernel
> memory model.

FWIW, ARM doesn't allow this and arm64 only allows it if P1 uses WRITE_ONCE
instead of store-release.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 17:10       ` Will Deacon
@ 2016-09-29 17:23         ` Paul E. McKenney
  2016-09-29 18:04           ` Paul E. McKenney
  2016-09-30  5:53           ` Boqun Feng
  0 siblings, 2 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 17:23 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 06:10:37PM +0100, Will Deacon wrote:
> On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > > broken if a third process overwrites the value written by the RELEASE
> > > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > > This commit therefore updates the documentation to call this vulnerability
> > > > > out explicitly.
> > > > > 
> > > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > > +     operation's store before the ACQUIRE operation's read.
> > > > 
> > > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > > when it looses the store.
> > > > 
> > > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > > I feel should describe the reason for the behaviour.
> > > 
> > > If I've groked it correctly, it's for cases like:
> > > 
> > > 
> > > PO:
> > > Wx=1
> > > WyRel=1
> > > 
> > > P1:
> > > Wy=2
> > > 
> > > P2:
> > > RyAcq=2
> > > Rx=0
> > > 
> > > Final value of y is 2.
> > > 
> > > 
> > > This is permitted on arm64. If you make P1's store a store-release, then
> > > it's forbidden, but I suspect that's not generally true of the kernel
> > > memory model.
> > 
> > That is the one!  And to Peter's point, powerpc does the same for the
> > example as shown.  However, on powerpc, upgrading P1's store to release
> > has no effect because there is no earlier access for the resulting
> > lwsync to influence.  For whatever it might be worth, C11 won't guarantee
> > ordering in that case, either.  Nor will the current Linux-kernel memory
> > model.  (Yes, I did just try it to make sure.  Why do you ask?)
> > 
> > So you guys are fishing for an expanded commit log, for example, like
> > the following?  ;-)
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > broken if a third process overwrites the value written by the RELEASE
> > operation before the ACQUIRE operation has a chance of reading it, for
> > example:
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		smp_wmb();
> > 		smp_store_release(y, 1);
> > 	}
> > 
> > 	P1(int *y)
> > 	{
> > 		smp_store_release(y, 2);
> > 	}
> > 
> > 	P2(int *x, int *y)
> > 	{
> > 		r1 = smp_load_acquire(y);
> > 		r2 = READ_ONCE(*x);
> > 	}
> > 
> > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > r2=0), as does the current version of the early prototype Linux-kernel
> > memory model.
> 
> FWIW, ARM doesn't allow this and arm64 only allows it if P1 uses WRITE_ONCE
> instead of store-release.

Good catch, apologies for the error.  The following, then?

							Thanx, Paul

------------------------------------------------------------------------

If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
broken if a third process overwrites the value written by the RELEASE
operation before the ACQUIRE operation has a chance of reading it, for
example:

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_wmb();
		smp_store_release(y, 1);
	}

	P1(int *y)
	{
		WRITE_ONCE(*y, 2);
	}

	P2(int *x, int *y)
	{
		r1 = smp_load_acquire(y);
		r2 = READ_ONCE(*x);
	}

Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
r2=0), as does the current version of the early prototype Linux-kernel
memory model.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 17:23         ` Paul E. McKenney
@ 2016-09-29 18:04           ` Paul E. McKenney
  2016-09-29 18:10             ` Paul E. McKenney
  2016-09-30  5:53           ` Boqun Feng
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 18:04 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 10:23:22AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 06:10:37PM +0100, Will Deacon wrote:
> > On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > > > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > > > broken if a third process overwrites the value written by the RELEASE
> > > > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > > > This commit therefore updates the documentation to call this vulnerability
> > > > > > out explicitly.
> > > > > > 
> > > > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > > > +     operation's store before the ACQUIRE operation's read.
> > > > > 
> > > > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > > > when it looses the store.
> > > > > 
> > > > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > > > I feel should describe the reason for the behaviour.
> > > > 
> > > > If I've groked it correctly, it's for cases like:
> > > > 
> > > > 
> > > > PO:
> > > > Wx=1
> > > > WyRel=1
> > > > 
> > > > P1:
> > > > Wy=2
> > > > 
> > > > P2:
> > > > RyAcq=2
> > > > Rx=0
> > > > 
> > > > Final value of y is 2.
> > > > 
> > > > 
> > > > This is permitted on arm64. If you make P1's store a store-release, then
> > > > it's forbidden, but I suspect that's not generally true of the kernel
> > > > memory model.
> > > 
> > > That is the one!  And to Peter's point, powerpc does the same for the
> > > example as shown.  However, on powerpc, upgrading P1's store to release
> > > has no effect because there is no earlier access for the resulting
> > > lwsync to influence.  For whatever it might be worth, C11 won't guarantee
> > > ordering in that case, either.  Nor will the current Linux-kernel memory
> > > model.  (Yes, I did just try it to make sure.  Why do you ask?)
> > > 
> > > So you guys are fishing for an expanded commit log, for example, like
> > > the following?  ;-)
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > broken if a third process overwrites the value written by the RELEASE
> > > operation before the ACQUIRE operation has a chance of reading it, for
> > > example:
> > > 
> > > 	P0(int *x, int *y)
> > > 	{
> > > 		WRITE_ONCE(*x, 1);
> > > 		smp_wmb();
> > > 		smp_store_release(y, 1);
> > > 	}
> > > 
> > > 	P1(int *y)
> > > 	{
> > > 		smp_store_release(y, 2);
> > > 	}
> > > 
> > > 	P2(int *x, int *y)
> > > 	{
> > > 		r1 = smp_load_acquire(y);
> > > 		r2 = READ_ONCE(*x);
> > > 	}
> > > 
> > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > r2=0), as does the current version of the early prototype Linux-kernel
> > > memory model.
> > 
> > FWIW, ARM doesn't allow this and arm64 only allows it if P1 uses WRITE_ONCE
> > instead of store-release.
> 
> Good catch, apologies for the error.  The following, then?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> broken if a third process overwrites the value written by the RELEASE
> operation before the ACQUIRE operation has a chance of reading it, for
> example:
> 
> 	P0(int *x, int *y)
> 	{
> 		WRITE_ONCE(*x, 1);
> 		smp_wmb();
> 		smp_store_release(y, 1);
> 	}
> 
> 	P1(int *y)
> 	{
> 		WRITE_ONCE(*y, 2);
> 	}
> 
> 	P2(int *x, int *y)
> 	{
> 		r1 = smp_load_acquire(y);
> 		r2 = READ_ONCE(*x);
> 	}
> 
> Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> r2=0), as does the current version of the early prototype Linux-kernel

And the above needs to be (r1!=2 || r2 != 0)...  Sigh!

							Thanx, Paul

> memory model.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 18:04           ` Paul E. McKenney
@ 2016-09-29 18:10             ` Paul E. McKenney
  2016-09-29 18:44               ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 18:10 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 11:04:44AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 10:23:22AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2016 at 06:10:37PM +0100, Will Deacon wrote:
> > > On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > > > > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > > > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > > > > broken if a third process overwrites the value written by the RELEASE
> > > > > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > > > > This commit therefore updates the documentation to call this vulnerability
> > > > > > > out explicitly.
> > > > > > > 
> > > > > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > 
> > > > > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > > > > +     operation's store before the ACQUIRE operation's read.
> > > > > > 
> > > > > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > > > > when it looses the store.
> > > > > > 
> > > > > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > > > > I feel should describe the reason for the behaviour.
> > > > > 
> > > > > If I've groked it correctly, it's for cases like:
> > > > > 
> > > > > 
> > > > > PO:
> > > > > Wx=1
> > > > > WyRel=1
> > > > > 
> > > > > P1:
> > > > > Wy=2
> > > > > 
> > > > > P2:
> > > > > RyAcq=2
> > > > > Rx=0
> > > > > 
> > > > > Final value of y is 2.
> > > > > 
> > > > > 
> > > > > This is permitted on arm64. If you make P1's store a store-release, then
> > > > > it's forbidden, but I suspect that's not generally true of the kernel
> > > > > memory model.
> > > > 
> > > > That is the one!  And to Peter's point, powerpc does the same for the
> > > > example as shown.  However, on powerpc, upgrading P1's store to release
> > > > has no effect because there is no earlier access for the resulting
> > > > lwsync to influence.  For whatever it might be worth, C11 won't guarantee
> > > > ordering in that case, either.  Nor will the current Linux-kernel memory
> > > > model.  (Yes, I did just try it to make sure.  Why do you ask?)
> > > > 
> > > > So you guys are fishing for an expanded commit log, for example, like
> > > > the following?  ;-)
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > broken if a third process overwrites the value written by the RELEASE
> > > > operation before the ACQUIRE operation has a chance of reading it, for
> > > > example:
> > > > 
> > > > 	P0(int *x, int *y)
> > > > 	{
> > > > 		WRITE_ONCE(*x, 1);
> > > > 		smp_wmb();
> > > > 		smp_store_release(y, 1);
> > > > 	}
> > > > 
> > > > 	P1(int *y)
> > > > 	{
> > > > 		smp_store_release(y, 2);
> > > > 	}
> > > > 
> > > > 	P2(int *x, int *y)
> > > > 	{
> > > > 		r1 = smp_load_acquire(y);
> > > > 		r2 = READ_ONCE(*x);
> > > > 	}
> > > > 
> > > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > > r2=0), as does the current version of the early prototype Linux-kernel
> > > > memory model.
> > > 
> > > FWIW, ARM doesn't allow this and arm64 only allows it if P1 uses WRITE_ONCE
> > > instead of store-release.
> > 
> > Good catch, apologies for the error.  The following, then?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > broken if a third process overwrites the value written by the RELEASE
> > operation before the ACQUIRE operation has a chance of reading it, for
> > example:
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		smp_wmb();
> > 		smp_store_release(y, 1);
> > 	}
> > 
> > 	P1(int *y)
> > 	{
> > 		WRITE_ONCE(*y, 2);
> > 	}
> > 
> > 	P2(int *x, int *y)
> > 	{
> > 		r1 = smp_load_acquire(y);
> > 		r2 = READ_ONCE(*x);
> > 	}
> > 
> > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > r2=0), as does the current version of the early prototype Linux-kernel
> 
> And the above needs to be (r1!=2 || r2 != 0)...  Sigh!

Make that (y==2 && r1==2 && r2 == 0).

Any further bids?  ;-)

							Thanx, Paul

> > memory model.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 18:10             ` Paul E. McKenney
@ 2016-09-29 18:44               ` Peter Zijlstra
  2016-09-29 19:18                 ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-29 18:44 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 11:10:15AM -0700, Paul E. McKenney wrote:
> > > 
> > > 	P0(int *x, int *y)
> > > 	{
> > > 		WRITE_ONCE(*x, 1);
> > > 		smp_wmb();
> > > 		smp_store_release(y, 1);
> > > 	}
> > > 
> > > 	P1(int *y)
> > > 	{
> > > 		WRITE_ONCE(*y, 2);
> > > 	}
> > > 
> > > 	P2(int *x, int *y)
> > > 	{
> > > 		r1 = smp_load_acquire(y);
> > > 		r2 = READ_ONCE(*x);
> > > 	}
> > > 
> > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > r2=0), as does the current version of the early prototype Linux-kernel
> > 
> > And the above needs to be (r1!=2 || r2 != 0)...  Sigh!
> 
> Make that (y==2 && r1==2 && r2 == 0).
> 
> Any further bids?  ;-)

Isn't that the trivial P1,P2,P0 order again?

How about something like so on PPC?

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	WRITE_ONCE(x, 2);
	smp_store_release(y, 2);
}

P2(int *x, int *y)
{
	r1 = smp_load_acquire(y);
	r2 = READ_ONCE(*x);
}

(((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)

If you execute P0 and P1 concurrently and one store of each 'wins' the
LWSYNC of either is null and void, and therefore P2 is unordered and can
observe r2==0.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 18:44               ` Peter Zijlstra
@ 2016-09-29 19:18                 ` Paul E. McKenney
  2016-09-29 19:36                   ` Alan Stern
                                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 19:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 11:10:15AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > 	P0(int *x, int *y)
> > > > 	{
> > > > 		WRITE_ONCE(*x, 1);
> > > > 		smp_wmb();
> > > > 		smp_store_release(y, 1);
> > > > 	}
> > > > 
> > > > 	P1(int *y)
> > > > 	{
> > > > 		WRITE_ONCE(*y, 2);
> > > > 	}
> > > > 
> > > > 	P2(int *x, int *y)
> > > > 	{
> > > > 		r1 = smp_load_acquire(y);
> > > > 		r2 = READ_ONCE(*x);
> > > > 	}
> > > > 
> > > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > > r2=0), as does the current version of the early prototype Linux-kernel
> > > 
> > > And the above needs to be (r1!=2 || r2 != 0)...  Sigh!
> > 
> > Make that (y==2 && r1==2 && r2 == 0).
> > 
> > Any further bids?  ;-)
> 
> Isn't that the trivial P1,P2,P0 order again?

I don't believe so.  Wouldn't the final P0 would leave y==1?

> How about something like so on PPC?
> 
> P0(int *x, int *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_store_release(y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	WRITE_ONCE(x, 2);

Need "WRITE_ONCE(*x, 2)" here.

> 	smp_store_release(y, 2);
> }
> 
> P2(int *x, int *y)
> {
> 	r1 = smp_load_acquire(y);
> 	r2 = READ_ONCE(*x);
> }
> 
> (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)

That exists-clause is quite dazzling...  So if each of P0 and P1
win, but on different stores, and if P2 follows one or the other
of P0 or P1, can r2 get the pre-initialization value for x?

> If you execute P0 and P1 concurrently and one store of each 'wins' the
> LWSYNC of either is null and void, and therefore P2 is unordered and can
> observe r2==0.

That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
memory model says "forbidden" to this:

	C C-WillDeacon-AcqRelStore.litmus

	{
	}

	P0(int *x, int *y)
	{
	      WRITE_ONCE(*x, 1);
	      smp_store_release(y, 1);
	}

	P1(int *x, int *y)
	{
	      WRITE_ONCE(*x, 2);
	      smp_store_release(y, 2);
	}

	P2(int *x, int *y)
	{
	      r1 = smp_load_acquire(y);
	      r2 = READ_ONCE(*x);
	}

	exists
	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)

So let's try PPCMEM.  If PPCMEM allows it, then the kernel model is
clearly broken.

	PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
	{
	0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
	1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
			2:r3=x; 2:r4=y;
	}
	 P0           | P1           | P2           ;
	 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
	 lwsync       | lwsync       | lwsync       ;
	 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
	exists
	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)

Now herd says that this is forbidden, but let's ask ppcmem.  Futzing with
the web site seems to say "no".  The lwsyncs insist on propagating the
change to x to P2 before the change to y.  The full-state-space search
tool might take awhile, so will get you know if it disagrees.  Just in
case I missed some odd combination of ppcmem events.

But at the moment, my guess is that your dazzling condition cannot
happen on PowerPC, and that at least this aspect of the current draft
of the kernel memory model is in fact correct.

Or did I incorrectly translate your litmus test?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 19:18                 ` Paul E. McKenney
@ 2016-09-29 19:36                   ` Alan Stern
  2016-09-29 20:26                     ` Paul E. McKenney
  2016-09-30  8:53                     ` Peter Zijlstra
  2016-09-30  9:00                   ` Peter Zijlstra
  2016-09-30  9:57                   ` Peter Zijlstra
  2 siblings, 2 replies; 27+ messages in thread
From: Alan Stern @ 2016-09-29 19:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, mingo, dhowells

On Thu, 29 Sep 2016, Paul E. McKenney wrote:

> On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:

> > How about something like so on PPC?
> > 
> > P0(int *x, int *y)
> > {
> > 	WRITE_ONCE(*x, 1);
> > 	smp_store_release(y, 1);
> > }
> > 
> > P1(int *x, int *y)
> > {
> > 	WRITE_ONCE(x, 2);
> 
> Need "WRITE_ONCE(*x, 2)" here.
> 
> > 	smp_store_release(y, 2);
> > }
> > 
> > P2(int *x, int *y)
> > {
> > 	r1 = smp_load_acquire(y);
> > 	r2 = READ_ONCE(*x);
> > }
> > 
> > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> 
> That exists-clause is quite dazzling...  So if each of P0 and P1
> win, but on different stores, and if P2 follows one or the other
> of P0 or P1, can r2 get the pre-initialization value for x?

In fact, this is more than you need.  It's enough to specify

exists (2:r1=1 \/ 2:r1=2) /\ 2:r2=0

This much already is forbidden.  For the sake of argument, say r1=1.  
Then P2 has read from P1's store-release.  By definition, P1's write to
x is visible to P2, so r2 will get the value from that write or from
one that is later in x's coherence order.  In other words, r2 will end
up equal to either 1 or 2, but not 0.

> > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > observe r2==0.

Not so.  lwsync instructions cannot be "voided".

> That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
> memory model says "forbidden" to this:
> 
> 	C C-WillDeacon-AcqRelStore.litmus
> 
> 	{
> 	}
> 
> 	P0(int *x, int *y)
> 	{
> 	      WRITE_ONCE(*x, 1);
> 	      smp_store_release(y, 1);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 	      WRITE_ONCE(*x, 2);
> 	      smp_store_release(y, 2);
> 	}
> 
> 	P2(int *x, int *y)
> 	{
> 	      r1 = smp_load_acquire(y);
> 	      r2 = READ_ONCE(*x);
> 	}
> 
> 	exists
> 	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)

As above, you can leave out the part about the final values for x and 
y.  The test will still be forbidden.

On the other hand, there's no guarantee that if r1=1 at the end then r2 
will also be 1.  It's quite possible that r1=1 and r2=2, or vice versa.

Alan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 19:36                   ` Alan Stern
@ 2016-09-29 20:26                     ` Paul E. McKenney
  2016-09-30  8:53                     ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-29 20:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: Peter Zijlstra, Will Deacon, linux-kernel, mingo, dhowells

On Thu, Sep 29, 2016 at 03:36:38PM -0400, Alan Stern wrote:
> On Thu, 29 Sep 2016, Paul E. McKenney wrote:
> 
> > On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
> 
> > > How about something like so on PPC?
> > > 
> > > P0(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(*x, 1);
> > > 	smp_store_release(y, 1);
> > > }
> > > 
> > > P1(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(x, 2);
> > 
> > Need "WRITE_ONCE(*x, 2)" here.
> > 
> > > 	smp_store_release(y, 2);
> > > }
> > > 
> > > P2(int *x, int *y)
> > > {
> > > 	r1 = smp_load_acquire(y);
> > > 	r2 = READ_ONCE(*x);
> > > }
> > > 
> > > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> > 
> > That exists-clause is quite dazzling...  So if each of P0 and P1
> > win, but on different stores, and if P2 follows one or the other
> > of P0 or P1, can r2 get the pre-initialization value for x?
> 
> In fact, this is more than you need.  It's enough to specify
> 
> exists (2:r1=1 \/ 2:r1=2) /\ 2:r2=0
> 
> This much already is forbidden.  For the sake of argument, say r1=1.  
> Then P2 has read from P1's store-release.  By definition, P1's write to
> x is visible to P2, so r2 will get the value from that write or from
> one that is later in x's coherence order.  In other words, r2 will end
> up equal to either 1 or 2, but not 0.
> 
> > > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > > observe r2==0.
> 
> Not so.  lwsync instructions cannot be "voided".
> 
> > That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
> > memory model says "forbidden" to this:
> > 
> > 	C C-WillDeacon-AcqRelStore.litmus
> > 
> > 	{
> > 	}
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 	      WRITE_ONCE(*x, 1);
> > 	      smp_store_release(y, 1);
> > 	}
> > 
> > 	P1(int *x, int *y)
> > 	{
> > 	      WRITE_ONCE(*x, 2);
> > 	      smp_store_release(y, 2);
> > 	}
> > 
> > 	P2(int *x, int *y)
> > 	{
> > 	      r1 = smp_load_acquire(y);
> > 	      r2 = READ_ONCE(*x);
> > 	}
> > 
> > 	exists
> > 	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)
> 
> As above, you can leave out the part about the final values for x and 
> y.  The test will still be forbidden.
> 
> On the other hand, there's no guarantee that if r1=1 at the end then r2 
> will also be 1.  It's quite possible that r1=1 and r2=2, or vice versa.

And herd agrees for both the kernel model and the powerpc translation.
I killed PPCMEM, which was up to 1.2G of state space.  So is this a
case where "herd -cat ppc.cat" can be trusted?  ;-)

And the web ppcmem does not allow the exists clause, from what I could
see.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 17:23         ` Paul E. McKenney
  2016-09-29 18:04           ` Paul E. McKenney
@ 2016-09-30  5:53           ` Boqun Feng
  2016-09-30  9:20             ` Will Deacon
  1 sibling, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2016-09-30  5:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, mingo, dhowells, stern

[-- Attachment #1: Type: text/plain, Size: 5009 bytes --]

Hi Paul,

On Thu, Sep 29, 2016 at 10:23:22AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 06:10:37PM +0100, Will Deacon wrote:
> > On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 29, 2016 at 05:03:08PM +0100, Will Deacon wrote:
> > > > On Thu, Sep 29, 2016 at 05:58:17PM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Sep 29, 2016 at 08:54:01AM -0700, Paul E. McKenney wrote:
> > > > > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > > > > broken if a third process overwrites the value written by the RELEASE
> > > > > > operation before the ACQUIRE operation has a chance of reading it.
> > > > > > This commit therefore updates the documentation to call this vulnerability
> > > > > > out explicitly.
> > > > > > 
> > > > > > Reported-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > > +     However, please note that a chain of RELEASE+ACQUIRE pairs may be
> > > > > > +     broken by a store by another thread that overwrites the RELEASE
> > > > > > +     operation's store before the ACQUIRE operation's read.
> > > > > 
> > > > > This is the powerpc lwsync quirk, right? Where the barrier disappears
> > > > > when it looses the store.
> > > > > 
> > > > > Or is there more to it? Its not entirely clear from the Changelog, which
> > > > > I feel should describe the reason for the behaviour.
> > > > 
> > > > If I've groked it correctly, it's for cases like:
> > > > 
> > > > 
> > > > PO:
> > > > Wx=1
> > > > WyRel=1
> > > > 
> > > > P1:
> > > > Wy=2
> > > > 
> > > > P2:
> > > > RyAcq=2
> > > > Rx=0
> > > > 
> > > > Final value of y is 2.
> > > > 
> > > > 
> > > > This is permitted on arm64. If you make P1's store a store-release, then
> > > > it's forbidden, but I suspect that's not generally true of the kernel
> > > > memory model.
> > > 
> > > That is the one!  And to Peter's point, powerpc does the same for the
> > > example as shown.  However, on powerpc, upgrading P1's store to release
> > > has no effect because there is no earlier access for the resulting
> > > lwsync to influence.  For whatever it might be worth, C11 won't guarantee
> > > ordering in that case, either.  Nor will the current Linux-kernel memory
> > > model.  (Yes, I did just try it to make sure.  Why do you ask?)
> > > 
> > > So you guys are fishing for an expanded commit log, for example, like
> > > the following?  ;-)
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > broken if a third process overwrites the value written by the RELEASE
> > > operation before the ACQUIRE operation has a chance of reading it, for
> > > example:
> > > 
> > > 	P0(int *x, int *y)
> > > 	{
> > > 		WRITE_ONCE(*x, 1);
> > > 		smp_wmb();
> > > 		smp_store_release(y, 1);
> > > 	}
> > > 
> > > 	P1(int *y)
> > > 	{
> > > 		smp_store_release(y, 2);
> > > 	}
> > > 
> > > 	P2(int *x, int *y)
> > > 	{
> > > 		r1 = smp_load_acquire(y);
> > > 		r2 = READ_ONCE(*x);
> > > 	}
> > > 
> > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > r2=0), as does the current version of the early prototype Linux-kernel
> > > memory model.
> > 
> > FWIW, ARM doesn't allow this and arm64 only allows it if P1 uses WRITE_ONCE
> > instead of store-release.
> 
> Good catch, apologies for the error.  The following, then?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> broken if a third process overwrites the value written by the RELEASE
> operation before the ACQUIRE operation has a chance of reading it, for
> example:
> 
> 	P0(int *x, int *y)
> 	{
> 		WRITE_ONCE(*x, 1);
> 		smp_wmb();
               ^^^^^^^^^^^

What is this smp_wmb() for?

> 		smp_store_release(y, 1);
> 	}
> 
> 	P1(int *y)
> 	{
> 		WRITE_ONCE(*y, 2);

If we change this WRITE_ONCE to a relaxed atomic operation(e.g.
xchg_relaxed(y, 2)), both herd and ppcmem said the exist-clause "y = 2
/\ 2:r1 = 2 /\ 2:r2 = 0" wouldn't be triggered on PPC.

I guess we will get the same behavior on ARM/ARM64, Will?

If a normal store could break chain, while a RmW atomic won't, do we
want to call it out in the document and build our memory model around
this?

I asked because in spin_unlock_wait() fix, we kind of relied on this. So
it's good for us to clarify it?

Regards,
Boqun

> 	}
> 
> 	P2(int *x, int *y)
> 	{
> 		r1 = smp_load_acquire(y);
> 		r2 = READ_ONCE(*x);
> 	}
> 
> Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> r2=0), as does the current version of the early prototype Linux-kernel
> memory model.
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 19:36                   ` Alan Stern
  2016-09-29 20:26                     ` Paul E. McKenney
@ 2016-09-30  8:53                     ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30  8:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: Paul E. McKenney, Will Deacon, linux-kernel, mingo, dhowells

On Thu, Sep 29, 2016 at 03:36:38PM -0400, Alan Stern wrote:
> > > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > > observe r2==0.
> 
> Not so.  lwsync instructions cannot be "voided".

I distinctly remember there being a case (smp_wmb()) where the lwsync
would disappear if the store was lost.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 19:18                 ` Paul E. McKenney
  2016-09-29 19:36                   ` Alan Stern
@ 2016-09-30  9:00                   ` Peter Zijlstra
  2016-09-30  9:57                   ` Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30  9:00 UTC (permalink / raw)
  To: Paul E. McKenney, gg; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 12:18:58PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 29, 2016 at 11:10:15AM -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > 	P0(int *x, int *y)
> > > > > 	{
> > > > > 		WRITE_ONCE(*x, 1);
> > > > > 		smp_wmb();
> > > > > 		smp_store_release(y, 1);
> > > > > 	}
> > > > > 
> > > > > 	P1(int *y)
> > > > > 	{
> > > > > 		WRITE_ONCE(*y, 2);
> > > > > 	}
> > > > > 
> > > > > 	P2(int *x, int *y)
> > > > > 	{
> > > > > 		r1 = smp_load_acquire(y);
> > > > > 		r2 = READ_ONCE(*x);
> > > > > 	}
> > > > > 
> > > > > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > > > > r2=0), as does the current version of the early prototype Linux-kernel
> > > > 
> > > > And the above needs to be (r1!=2 || r2 != 0)...  Sigh!
> > > 
> > > Make that (y==2 && r1==2 && r2 == 0).
> > > 
> > > Any further bids?  ;-)
> > 
> > Isn't that the trivial P1,P2,P0 order again?
> 
> I don't believe so.  Wouldn't the final P0 would leave y==1?

Ah, indeed. I was forgetting to 'execute' P0 entirely.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30  5:53           ` Boqun Feng
@ 2016-09-30  9:20             ` Will Deacon
  2016-09-30 11:35               ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2016-09-30  9:20 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 01:53:52PM +0800, Boqun Feng wrote:
> On Thu, Sep 29, 2016 at 10:23:22AM -0700, Paul E. McKenney wrote:
> > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > broken if a third process overwrites the value written by the RELEASE
> > operation before the ACQUIRE operation has a chance of reading it, for
> > example:
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		smp_wmb();
>                ^^^^^^^^^^^
> 
> What is this smp_wmb() for?
> 
> > 		smp_store_release(y, 1);
> > 	}
> > 
> > 	P1(int *y)
> > 	{
> > 		WRITE_ONCE(*y, 2);
> 
> If we change this WRITE_ONCE to a relaxed atomic operation(e.g.
> xchg_relaxed(y, 2)), both herd and ppcmem said the exist-clause "y = 2
> /\ 2:r1 = 2 /\ 2:r2 = 0" wouldn't be triggered on PPC.
> 
> I guess we will get the same behavior on ARM/ARM64, Will?
> 
> If a normal store could break chain, while a RmW atomic won't, do we
> want to call it out in the document and build our memory model around
> this?

I think this is required to work by C11's definition of release sequences,
so any architecture that claims to support those with the same instructions
will need this to be forbidden.

Personally, I think that's a bug in C11, because I think it goes too far
in forbidding some hardware optimisations around relaxed xchg, but it is
what it is.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 19:18                 ` Paul E. McKenney
  2016-09-29 19:36                   ` Alan Stern
  2016-09-30  9:00                   ` Peter Zijlstra
@ 2016-09-30  9:57                   ` Peter Zijlstra
  2016-09-30 12:14                     ` Paul E. McKenney
  2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30  9:57 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 12:18:58PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:

> > How about something like so on PPC?
> > 
> > P0(int *x, int *y)
> > {
> > 	WRITE_ONCE(*x, 1);
> > 	smp_store_release(y, 1);
> > }
> > 
> > P1(int *x, int *y)
> > {
> > 	WRITE_ONCE(x, 2);
> 
> Need "WRITE_ONCE(*x, 2)" here.
> 
> > 	smp_store_release(y, 2);
> > }
> > 
> > P2(int *x, int *y)
> > {
> > 	r1 = smp_load_acquire(y);
> > 	r2 = READ_ONCE(*x);
> > }
> > 
> > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> 
> That exists-clause is quite dazzling...  So if each of P0 and P1
> win, but on different stores, and if P2 follows one or the other
> of P0 or P1, can r2 get the pre-initialization value for x?
> 
> > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > observe r2==0.
> 
> That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
> memory model says "forbidden" to this:

  https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc710.html

That one, right?

Hmm, I seem to remember something else.. /me goes poke through history
and comes up with:

  https://lkml.kernel.org/r/20160115215853.GC3818@linux.vnet.ibm.com

So what was that about then? I remember it being a completely
nonsensical case, but a weird one.

> So let's try PPCMEM.  If PPCMEM allows it, then the kernel model is
> clearly broken.
> 
> 	PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
> 	{
> 	0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
> 	1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
> 			2:r3=x; 2:r4=y;
> 	}
> 	 P0           | P1           | P2           ;
> 	 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
> 	 lwsync       | lwsync       | lwsync       ;
> 	 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
> 	exists
> 	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)

> Or did I incorrectly translate your litmus test?

Looks about right.

Still not seeing how that is prohibited though. My reasoning is as
follows:

 - P0 and P1 both store to x, one looses (say P0). Effectively only P1
   does a store.

 - P0 and P1 both store to y, one looses (say P1). Effectively only P0
   does a store.

 - P2 reads y, sees the value from P0.

 - P2 does lwsync, which constraints P2 to not issue the load of x
   before this. It also forms a (local) sync-point with P0 for having
   seen its store or y.

 - P2 reads x, sees the initial value because the store from P1 hasn't
   been propagated yet.

It will not see the store P0 did to x, since that didn't happen.

Assuming I'm wrong on that last part, is then the following possible?

(x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)

Where we see a store that didn't happen?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-29 16:43     ` Paul E. McKenney
  2016-09-29 17:10       ` Will Deacon
@ 2016-09-30 10:25       ` Peter Zijlstra
  2016-09-30 12:17         ` Paul E. McKenney
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30 10:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> broken if a third process overwrites the value written by the RELEASE
> operation before the ACQUIRE operation has a chance of reading it, for
> example:
> 
> 	P0(int *x, int *y)
> 	{
> 		WRITE_ONCE(*x, 1);
> 		smp_store_release(y, 1);
> 	}
> 
> 	P1(int *y)
> 	{
> 		smp_store_release(y, 2);
> 	}
> 
> 	P2(int *x, int *y)
> 	{
> 		r1 = smp_load_acquire(y);
> 		r2 = READ_ONCE(*x);
> 	}
> 
> Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> r2=0), as does the current version of the early prototype Linux-kernel
> memory model.
> 
> This commit therefore updates the documentation to call this vulnerability
> out explicitly.

So its a pretty dumb thing to do in any case (and yes the kernel does
this). Its also entirely expected in my book, that if you generate
conflicting writes on a release, ordering is out the window.

Why do we need to call this out? Who in his right mind would want to do
this and expect anything other than wreckage?

Not that we're not having too much 'fun' discussing this,.. but I do
wonder why we need to call this out.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30  9:20             ` Will Deacon
@ 2016-09-30 11:35               ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-30 11:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Boqun Feng, Peter Zijlstra, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 10:20:09AM +0100, Will Deacon wrote:
> On Fri, Sep 30, 2016 at 01:53:52PM +0800, Boqun Feng wrote:
> > On Thu, Sep 29, 2016 at 10:23:22AM -0700, Paul E. McKenney wrote:
> > > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > > broken if a third process overwrites the value written by the RELEASE
> > > operation before the ACQUIRE operation has a chance of reading it, for
> > > example:
> > > 
> > > 	P0(int *x, int *y)
> > > 	{
> > > 		WRITE_ONCE(*x, 1);
> > > 		smp_wmb();
> >                ^^^^^^^^^^^
> > 
> > What is this smp_wmb() for?

Seems redundant to me, now that you mention it.  ;-)

But maybe this does something on ARM?

> > > 		smp_store_release(y, 1);
> > > 	}
> > > 
> > > 	P1(int *y)
> > > 	{
> > > 		WRITE_ONCE(*y, 2);
> > 
> > If we change this WRITE_ONCE to a relaxed atomic operation(e.g.
> > xchg_relaxed(y, 2)), both herd and ppcmem said the exist-clause "y = 2
> > /\ 2:r1 = 2 /\ 2:r2 = 0" wouldn't be triggered on PPC.
> > 
> > I guess we will get the same behavior on ARM/ARM64, Will?
> > 
> > If a normal store could break chain, while a RmW atomic won't, do we
> > want to call it out in the document and build our memory model around
> > this?
> 
> I think this is required to work by C11's definition of release sequences,
> so any architecture that claims to support those with the same instructions
> will need this to be forbidden.

Yes.

> Personally, I think that's a bug in C11, because I think it goes too far
> in forbidding some hardware optimisations around relaxed xchg, but it is
> what it is.

The idea at the time (2007 or thereabouts) was that the atomic operation
would have a hard time breaking the causal chain.  To your point, atomic
xchg could presumably update the value and figure out what the previous
value was after the fact.  Maybe we should try to get the committee to
relax the requirement for relaxed xchg, though backwards compatibility
will make that a tough sell.  Might need a new xchg API.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30  9:57                   ` Peter Zijlstra
@ 2016-09-30 12:14                     ` Paul E. McKenney
  2016-09-30 12:51                       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-30 12:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 11:57:38AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 12:18:58PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
> 
> > > How about something like so on PPC?
> > > 
> > > P0(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(*x, 1);
> > > 	smp_store_release(y, 1);
> > > }
> > > 
> > > P1(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(x, 2);
> > 
> > Need "WRITE_ONCE(*x, 2)" here.
> > 
> > > 	smp_store_release(y, 2);
> > > }
> > > 
> > > P2(int *x, int *y)
> > > {
> > > 	r1 = smp_load_acquire(y);
> > > 	r2 = READ_ONCE(*x);
> > > }
> > > 
> > > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> > 
> > That exists-clause is quite dazzling...  So if each of P0 and P1
> > win, but on different stores, and if P2 follows one or the other
> > of P0 or P1, can r2 get the pre-initialization value for x?
> > 
> > > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > > observe r2==0.
> > 
> > That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
> > memory model says "forbidden" to this:
> 
>   https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc710.html
> 
> That one, right?

That is the one!  Prohibiting the cycle requires smp_mb() on both threads
0 and 1 on the one hand, or on both threads 0 and 2 on the other hand.

> Hmm, I seem to remember something else.. /me goes poke through history
> and comes up with:
> 
>   https://lkml.kernel.org/r/20160115215853.GC3818@linux.vnet.ibm.com
> 
> So what was that about then? I remember it being a completely
> nonsensical case, but a weird one.

That was an Alan Stern example where PowerPC prohibits the outcome,
and the ppcmem tool agrees, but where herd does not.  (ppcmem is giving
the correct answer.)

But this is specific to PowerPC.  I would not advise writing code that
relies on this one.  ;-)

> > So let's try PPCMEM.  If PPCMEM allows it, then the kernel model is
> > clearly broken.
> > 
> > 	PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
> > 	{
> > 	0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
> > 	1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
> > 			2:r3=x; 2:r4=y;
> > 	}
> > 	 P0           | P1           | P2           ;
> > 	 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
> > 	 lwsync       | lwsync       | lwsync       ;
> > 	 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
> > 	exists
> > 	(((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)
> 
> > Or did I incorrectly translate your litmus test?
> 
> Looks about right.
> 
> Still not seeing how that is prohibited though. My reasoning is as
> follows:
> 
>  - P0 and P1 both store to x, one looses (say P0). Effectively only P1
>    does a store.
> 
>  - P0 and P1 both store to y, one looses (say P1). Effectively only P0
>    does a store.

PowerPC does not "obscure" stores, so both stores really are there and
the lwsync really has effect on all CPUs.  From what I understand, even
CPUs that do obscure stores only do so in the case of repeated stores
by the same CPU to the same variable, and the above litmus test doesn't
have this.

So all the stores happen, and each CPU's stores are at least locally
ordered.

>  - P2 reads y, sees the value from P0.

Fair enough!

>  - P2 does lwsync, which constraints P2 to not issue the load of x
>    before this. It also forms a (local) sync-point with P0 for having
>    seen its store or y.
> 
>  - P2 reads x, sees the initial value because the store from P1 hasn't
>    been propagated yet.
> 
> It will not see the store P0 did to x, since that didn't happen.

Well, it saw the store to y, so it absolutely must see one of the other of
the stores to x in this particular litmus test.  If P1's store overwrote
P0's store, then P2 has to see P1's store, for example.

> Assuming I'm wrong on that last part, is then the following possible?
> 
> (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
> 
> Where we see a store that didn't happen?

Again, both stores really did happen.  ;-)

So with this litmus test:

	PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
	{
	0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
	1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
			2:r3=x; 2:r4=y;
	}
	 P0           | P1           | P2           ;
	 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
	 lwsync       | lwsync       | lwsync       ;
	 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
	exists
	(x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)

The herd tool says:

	Test PeterZijlstra+o-r+o-r+a-o-SB Allowed
	States 24
	2:r1=0; 2:r2=0; x=1; y=1;
	2:r1=0; 2:r2=0; x=1; y=2;
	2:r1=0; 2:r2=0; x=2; y=1;
	2:r1=0; 2:r2=0; x=2; y=2;
	2:r1=0; 2:r2=1; x=1; y=1;
	2:r1=0; 2:r2=1; x=1; y=2;
	2:r1=0; 2:r2=1; x=2; y=1;
	2:r1=0; 2:r2=1; x=2; y=2;
	2:r1=0; 2:r2=2; x=1; y=1;
	2:r1=0; 2:r2=2; x=1; y=2;
	2:r1=0; 2:r2=2; x=2; y=1;
	2:r1=0; 2:r2=2; x=2; y=2;
	2:r1=1; 2:r2=1; x=1; y=1;
	2:r1=1; 2:r2=1; x=1; y=2;
	2:r1=1; 2:r2=1; x=2; y=1;
	2:r1=1; 2:r2=1; x=2; y=2;
	2:r1=1; 2:r2=2; x=2; y=1;
	2:r1=1; 2:r2=2; x=2; y=2;
	2:r1=2; 2:r2=1; x=1; y=1;
	2:r1=2; 2:r2=1; x=1; y=2;
	2:r1=2; 2:r2=2; x=1; y=1;
	2:r1=2; 2:r2=2; x=1; y=2;
	2:r1=2; 2:r2=2; x=2; y=1;
	2:r1=2; 2:r2=2; x=2; y=2;
	Ok
	Witnesses
	Positive: 1 Negative: 23
	Condition exists (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
	Observation PeterZijlstra+o-r+o-r+a-o-SB Sometimes 1 23
	Hash=c32afd1ac8bfee7d4b23a27e783d0998

So herd believes that it can happen.  I was also able to force the web
ppcmem tool (https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html) to get
into this state with the following sequence of choices:

	[0;5;4;0;0;4;4;2;2;2;4;4;1;1;1;7;6;0;0;3;4;0;1;2;5;0;0]

So, yes, this can happen, architecturally at least.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30 10:25       ` Peter Zijlstra
@ 2016-09-30 12:17         ` Paul E. McKenney
  2016-09-30 12:45           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-30 12:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 12:25:16PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 09:43:53AM -0700, Paul E. McKenney wrote:
> > If two processes are related by a RELEASE+ACQUIRE pair, ordering can be
> > broken if a third process overwrites the value written by the RELEASE
> > operation before the ACQUIRE operation has a chance of reading it, for
> > example:
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		smp_store_release(y, 1);
> > 	}
> > 
> > 	P1(int *y)
> > 	{
> > 		smp_store_release(y, 2);
> > 	}
> > 
> > 	P2(int *x, int *y)
> > 	{
> > 		r1 = smp_load_acquire(y);
> > 		r2 = READ_ONCE(*x);
> > 	}
> > 
> > Both ARM and powerpc allow the "after the dust settles" outcome (r1=2 &&
> > r2=0), as does the current version of the early prototype Linux-kernel
> > memory model.
> > 
> > This commit therefore updates the documentation to call this vulnerability
> > out explicitly.
> 
> So its a pretty dumb thing to do in any case (and yes the kernel does
> this). Its also entirely expected in my book, that if you generate
> conflicting writes on a release, ordering is out the window.
> 
> Why do we need to call this out? Who in his right mind would want to do
> this and expect anything other than wreckage?
> 
> Not that we're not having too much 'fun' discussing this,.. but I do
> wonder why we need to call this out.

You lost me on this one...  If no one does this, sure, we can leave it out.
But if some part of the kernel does rely on this, we should call it out as
forbidden.  And fix the kernel, of course.

Or am I missing your point?

								Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30 12:17         ` Paul E. McKenney
@ 2016-09-30 12:45           ` Peter Zijlstra
  2016-09-30 13:10             ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30 12:45 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 05:17:21AM -0700, Paul E. McKenney wrote:
> On Fri, Sep 30, 2016 at 12:25:16PM +0200, Peter Zijlstra wrote:

> > So its a pretty dumb thing to do in any case (and yes the kernel does
> > this). Its also entirely expected in my book, that if you generate
> > conflicting writes on a release, ordering is out the window.
> > 
> > Why do we need to call this out? Who in his right mind would want to do
> > this and expect anything other than wreckage?
> > 
> > Not that we're not having too much 'fun' discussing this,.. but I do
> > wonder why we need to call this out.
> 
> You lost me on this one...  If no one does this, sure, we can leave it out.
> But if some part of the kernel does rely on this, we should call it out as
> forbidden.  And fix the kernel, of course.

Well, the kernel does this, but doesn't rely on ordering. Do "git grep
zap_locks". Its disgusting, can (and does) fail and generally is a sign
of badly broken code (printk is all that).

> Or am I missing your point?

My point was, its obvious crack, anybody doing this needs to have his
head examined. Then again, maybe we should just say that :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30 12:14                     ` Paul E. McKenney
@ 2016-09-30 12:51                       ` Peter Zijlstra
  2016-09-30 13:35                         ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-09-30 12:51 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 05:14:03AM -0700, Paul E. McKenney wrote:
> PowerPC does not "obscure" stores, so both stores really are there and
> the lwsync really has effect on all CPUs.  From what I understand, even
> CPUs that do obscure stores only do so in the case of repeated stores
> by the same CPU to the same variable, and the above litmus test doesn't
> have this.
> 
> So all the stores happen, and each CPU's stores are at least locally
> ordered.

OK, when I'm not sure I ever understood the case where smp_wmb() went
wonky on PPC, sadly I cannot now find the email where you mentioned
that :/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30 12:45           ` Peter Zijlstra
@ 2016-09-30 13:10             ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-30 13:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 02:45:42PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 30, 2016 at 05:17:21AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 30, 2016 at 12:25:16PM +0200, Peter Zijlstra wrote:
> 
> > > So its a pretty dumb thing to do in any case (and yes the kernel does
> > > this). Its also entirely expected in my book, that if you generate
> > > conflicting writes on a release, ordering is out the window.
> > > 
> > > Why do we need to call this out? Who in his right mind would want to do
> > > this and expect anything other than wreckage?
> > > 
> > > Not that we're not having too much 'fun' discussing this,.. but I do
> > > wonder why we need to call this out.
> > 
> > You lost me on this one...  If no one does this, sure, we can leave it out.
> > But if some part of the kernel does rely on this, we should call it out as
> > forbidden.  And fix the kernel, of course.
> 
> Well, the kernel does this, but doesn't rely on ordering. Do "git grep
> zap_locks". Its disgusting, can (and does) fail and generally is a sign
> of badly broken code (printk is all that).

;-) ;-) ;-)

> > Or am I missing your point?
> 
> My point was, its obvious crack, anybody doing this needs to have his
> head examined. Then again, maybe we should just say that :-)

I do find that as I get older, emphatically stating the obvious becomes
an increasingly large fraction of my role, so agreed.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
  2016-09-30 12:51                       ` Peter Zijlstra
@ 2016-09-30 13:35                         ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2016-09-30 13:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Will Deacon, linux-kernel, mingo, dhowells, stern

On Fri, Sep 30, 2016 at 02:51:13PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 30, 2016 at 05:14:03AM -0700, Paul E. McKenney wrote:
> > PowerPC does not "obscure" stores, so both stores really are there and
> > the lwsync really has effect on all CPUs.  From what I understand, even
> > CPUs that do obscure stores only do so in the case of repeated stores
> > by the same CPU to the same variable, and the above litmus test doesn't
> > have this.
> > 
> > So all the stores happen, and each CPU's stores are at least locally
> > ordered.
> 
> OK, when I'm not sure I ever understood the case where smp_wmb() went
> wonky on PPC, sadly I cannot now find the email where you mentioned
> that :/

First, a better explanation of your example:

	PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
	{
	0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
	1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
			2:r3=x; 2:r4=y;
	}
	 P0           | P1           | P2           ;
	 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
	 lwsync       | lwsync       | lwsync       ;
	 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
	exists
	(x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)

Given that 2:r1=1, and ignoring P1 for the moment, we have simple message
passing.  If P2 sees P0's store to y, it must also see P0's store to x.

So what happens when we include P1?  Well, we have constrained the test
to the case where P2 sees P0's store to y, so P2's load from x must
still see P0's store to x, or some later store to x.  Either way, given
that P2 sees P0's store to y, it cannot see the initial value of x.
In other words, even if P0's store to x is overwritten, it still has
effect on the ordering.

There are several changes to the litmus test that could require ordering
that lwsync does not provide, which I suppose could be considered to
introduce wonkiness.  ;-)

One is the infamous "Z6.3" litmus test that you called out in your
earlier email.  At least one of the pairs of stores must be separated
by sync rather than lwsync.  Z6.3's third variable defeats lwsync's
local ordering.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-30 13:35 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-29 15:54 [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability Paul E. McKenney
2016-09-29 15:58 ` Peter Zijlstra
2016-09-29 16:03   ` Will Deacon
2016-09-29 16:17     ` Peter Zijlstra
2016-09-29 16:44       ` Paul E. McKenney
2016-09-29 16:43     ` Paul E. McKenney
2016-09-29 17:10       ` Will Deacon
2016-09-29 17:23         ` Paul E. McKenney
2016-09-29 18:04           ` Paul E. McKenney
2016-09-29 18:10             ` Paul E. McKenney
2016-09-29 18:44               ` Peter Zijlstra
2016-09-29 19:18                 ` Paul E. McKenney
2016-09-29 19:36                   ` Alan Stern
2016-09-29 20:26                     ` Paul E. McKenney
2016-09-30  8:53                     ` Peter Zijlstra
2016-09-30  9:00                   ` Peter Zijlstra
2016-09-30  9:57                   ` Peter Zijlstra
2016-09-30 12:14                     ` Paul E. McKenney
2016-09-30 12:51                       ` Peter Zijlstra
2016-09-30 13:35                         ` Paul E. McKenney
2016-09-30  5:53           ` Boqun Feng
2016-09-30  9:20             ` Will Deacon
2016-09-30 11:35               ` Paul E. McKenney
2016-09-30 10:25       ` Peter Zijlstra
2016-09-30 12:17         ` Paul E. McKenney
2016-09-30 12:45           ` Peter Zijlstra
2016-09-30 13:10             ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.