* LKMM litmus test for Roman Penyaev's rcu-rr
@ 2018-05-28 22:08 Paul E. McKenney
  2018-05-29 18:35   ` Alan Stern
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-28 22:08 UTC (permalink / raw)
  To: linux-kernel, linux-arch
  Cc: stern, andrea.parri, will.deacon, peterz, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, mingo, torvalds,
	roman.penyaev

Hello!

The litmus test below is a first attempt to model Roman's rcu-rr
round-robin RCU-protected linked list.  His test code, which includes
the algorithm under test, may be found here:

https://github.com/rouming/rcu-rr/blob/master/rcu-rr.c

The P0() process below roughly corresponds to remove_conn_from_arr(),
with litmus-test variable "c" standing in for the per-CPU ppcpu_con.
Similarly, P1() roughly corresponds to get_next_conn_rr().  It claims
that the algorithm is safe, and also claims that it becomes unsafe if
either synchronize_rcu() is removed.

Does this in fact realistically model Roman's algorithm?  Either way,
is there a better approach?

							Thanx, Paul

------------------------------------------------------------------------

C C-RomanPenyaev-list-rcu-rr

{
	int *z=1; (* List: v->w->x->y->z. Noncircular, but long enough. *)
	int *y=z;
	int *x=y;
	int *w=x;
	int *v=w; (* List head is v. *)
	int *c=w; (* Cache, emulating ppcpu_con. *)
}

P0(int *c, int *v, int *w, int *x, int *y)
{
	rcu_assign_pointer(*w, y); /* Remove x from list. */
	synchronize_rcu();
	r1 = READ_ONCE(*c);
	if (r1 == x) {
		WRITE_ONCE(*c, 0); /* Invalidate cache. */
		synchronize_rcu();
	}
	smp_store_release(x, 0);  /* Emulate kfree(x). */
}

P1(int *c, int *v)
{
	rcu_read_lock();
	r1 = READ_ONCE(*c); /* Pick up cache. */
	if (r1 == 0) {
		r1 = READ_ONCE(*v); /* Cache empty, start from head. */
	}
	r2 = rcu_dereference(*r1); /* Advance to next element. */
	smp_store_release(c, r2); /* Update cache. */
	rcu_read_unlock();

	/* And repeat. */
	rcu_read_lock();
	r3 = READ_ONCE(*c);
	if (r3 == 0) {
		r3 = READ_ONCE(*v);
	}
	r4 = rcu_dereference(*r3);
	smp_store_release(c, r4);
	rcu_read_unlock();
}

locations [0:r1; 1:r1; 1:r3; c; v; w; x; y]
exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0) (* Better not be freed!!! *)

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-28 22:08 LKMM litmus test for Roman Penyaev's rcu-rr Paul E. McKenney
@ 2018-05-29 18:35   ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-29 18:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, andrea.parri, will.deacon, peterz,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	mingo, torvalds, roman.penyaev

On Mon, 28 May 2018, Paul E. McKenney wrote:

> Hello!
> 
> The litmus test below is a first attempt to model Roman's rcu-rr
> round-robin RCU-protected linked list.  His test code, which includes
> the algorithm under test, may be found here:
> 
> https://github.com/rouming/rcu-rr/blob/master/rcu-rr.c
> 
> The P0() process below roughly corresponds to remove_conn_from_arr(),
> with litmus-test variable "c" standing in for the per-CPU ppcpu_con.
> Similarly, P1() roughly corresponds to get_next_conn_rr().  It claims
> that the algorithm is safe, and also claims that it becomes unsafe if
> either synchronize_rcu() is removed.

This algorithm (the one in the litmus test; I haven't looked at Roman's
code) does seem valid.  In addition to removing either
synchronize_rcu(), interchanging the order of the stores in P0 (c
first, then w) would also invalidate it.

This is a little unusual in that c is written by more than one thread 
with no protection.  It works because the writes are all stores of a 
single pointer.

Why does the litmus test use smp_store_release() in three places?  
There doesn't seem to be any need; WRITE_ONCE() would be sufficient.

Alan

> Does this in fact realistically model Roman's algorithm?  Either way,
> is there a better approach?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> C C-RomanPenyaev-list-rcu-rr
> 
> {
> 	int *z=1; (* List: v->w->x->y->z. Noncircular, but long enough. *)
> 	int *y=z;
> 	int *x=y;
> 	int *w=x;
> 	int *v=w; (* List head is v. *)
> 	int *c=w; (* Cache, emulating ppcpu_con. *)
> }
> 
> P0(int *c, int *v, int *w, int *x, int *y)
> {
> 	rcu_assign_pointer(*w, y); /* Remove x from list. */
> 	synchronize_rcu();
> 	r1 = READ_ONCE(*c);
> 	if (r1 == x) {
> 		WRITE_ONCE(*c, 0); /* Invalidate cache. */
> 		synchronize_rcu();
> 	}
> 	smp_store_release(x, 0);  /* Emulate kfree(x). */
> }
> 
> P1(int *c, int *v)
> {
> 	rcu_read_lock();
> 	r1 = READ_ONCE(*c); /* Pick up cache. */
> 	if (r1 == 0) {
> 		r1 = READ_ONCE(*v); /* Cache empty, start from head. */
> 	}
> 	r2 = rcu_dereference(*r1); /* Advance to next element. */
> 	smp_store_release(c, r2); /* Update cache. */
> 	rcu_read_unlock();
> 
> 	/* And repeat. */
> 	rcu_read_lock();
> 	r3 = READ_ONCE(*c);
> 	if (r3 == 0) {
> 		r3 = READ_ONCE(*v);
> 	}
> 	r4 = rcu_dereference(*r3);
> 	smp_store_release(c, r4);
> 	rcu_read_unlock();
> }
> 
> locations [0:r1; 1:r1; 1:r3; c; v; w; x; y]
> exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0) (* Better not be freed!!! *)

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 18:35   ` Alan Stern
@ 2018-05-29 19:03   ` Paul E. McKenney
  2018-05-29 20:49       ` Alan Stern
  -1 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-29 19:03 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, linux-arch, andrea.parri, will.deacon, peterz,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	mingo, torvalds, roman.penyaev

On Tue, May 29, 2018 at 02:35:34PM -0400, Alan Stern wrote:
> On Mon, 28 May 2018, Paul E. McKenney wrote:
> 
> > Hello!
> > 
> > The litmus test below is a first attempt to model Roman's rcu-rr
> > round-robin RCU-protected linked list.  His test code, which includes
> > the algorithm under test, may be found here:
> > 
> > https://github.com/rouming/rcu-rr/blob/master/rcu-rr.c
> > 
> > The P0() process below roughly corresponds to remove_conn_from_arr(),
> > with litmus-test variable "c" standing in for the per-CPU ppcpu_con.
> > Similarly, P1() roughly corresponds to get_next_conn_rr().  It claims
> > that the algorithm is safe, and also claims that it becomes unsafe if
> > either synchronize_rcu() is removed.
> 
> This algorithm (the one in the litmus test; I haven't looked at Roman's
> code) does seem valid.  In addition to removing either
> synchronize_rcu(), interchanging the order of the stores in P0 (c
> first, then w) would also invalidate it.
> 
> This is a little unusual in that c is written by more than one thread 
> with no protection.  It works because the writes are all stores of a 
> single pointer.
> 
> Why does the litmus test use smp_store_release() in three places?  
> There doesn't seem to be any need; WRITE_ONCE() would be sufficient.

Because the algorithm did.  A bit of a stretch for kfree, but... ;-)

Let's try removing them, please see below.

> Alan
> 
> > Does this in fact realistically model Roman's algorithm?  Either way,
> > is there a better approach?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > C C-RomanPenyaev-list-rcu-rr
> > 
> > {
> > 	int *z=1; (* List: v->w->x->y->z. Noncircular, but long enough. *)
> > 	int *y=z;
> > 	int *x=y;
> > 	int *w=x;
> > 	int *v=w; (* List head is v. *)
> > 	int *c=w; (* Cache, emulating ppcpu_con. *)
> > }
> > 
> > P0(int *c, int *v, int *w, int *x, int *y)
> > {
> > 	rcu_assign_pointer(*w, y); /* Remove x from list. */

No change when converting this to WRITE_ONCE().

> > 	synchronize_rcu();
> > 	r1 = READ_ONCE(*c);
> > 	if (r1 == x) {
> > 		WRITE_ONCE(*c, 0); /* Invalidate cache. */
> > 		synchronize_rcu();
> > 	}
> > 	smp_store_release(x, 0);  /* Emulate kfree(x). */

Converting this one to WRITE_ONCE() does have an effect:

	Test C-RomanPenyaev-list-rcu-rr Allowed
	States 8
	0:r1=0; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=0; c=0; v=w; w=y; x=0; y=z;
	0:r1=w; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=w; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=0; v=w; w=y; x=0; y=z;
	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
	0:r1=y; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
	0:r1=y; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
	0:r1=z; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
	Ok
	Witnesses
	Positive: 1 Negative: 7
	Condition exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0)
	Observation C-RomanPenyaev-list-rcu-rr Sometimes 1 7
	Time C-RomanPenyaev-list-rcu-rr 0.40
	Hash=2ec66290a6622117b9877436950e6a08

Maybe reordered with READ_ONCE(*c) when r1 != x?

> > }
> > 
> > P1(int *c, int *v)
> > {
> > 	rcu_read_lock();
> > 	r1 = READ_ONCE(*c); /* Pick up cache. */
> > 	if (r1 == 0) {
> > 		r1 = READ_ONCE(*v); /* Cache empty, start from head. */
> > 	}
> > 	r2 = rcu_dereference(*r1); /* Advance to next element. */
> > 	smp_store_release(c, r2); /* Update cache. */
> > 	rcu_read_unlock();
> > 
> > 	/* And repeat. */
> > 	rcu_read_lock();
> > 	r3 = READ_ONCE(*c);
> > 	if (r3 == 0) {
> > 		r3 = READ_ONCE(*v);
> > 	}
> > 	r4 = rcu_dereference(*r3);
> > 	smp_store_release(c, r4);

Converting this to WRITE_ONCE() has no effect.

> > 	rcu_read_unlock();
> > }
> > 
> > locations [0:r1; 1:r1; 1:r3; c; v; w; x; y]
> > exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0) (* Better not be freed!!! *)
> 

							Thanx, Paul

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 19:03   ` Paul E. McKenney
@ 2018-05-29 20:49       ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-29 20:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, andrea.parri, will.deacon, peterz,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	mingo, torvalds, roman.penyaev

On Tue, 29 May 2018, Paul E. McKenney wrote:

> On Tue, May 29, 2018 at 02:35:34PM -0400, Alan Stern wrote:
> > On Mon, 28 May 2018, Paul E. McKenney wrote:
> > 
> > > Hello!
> > > 
> > > The litmus test below is a first attempt to model Roman's rcu-rr
> > > round-robin RCU-protected linked list.  His test code, which includes
> > > the algorithm under test, may be found here:
> > > 
> > > https://github.com/rouming/rcu-rr/blob/master/rcu-rr.c
> > > 
> > > The P0() process below roughly corresponds to remove_conn_from_arr(),
> > > with litmus-test variable "c" standing in for the per-CPU ppcpu_con.
> > > Similarly, P1() roughly corresponds to get_next_conn_rr().  It claims
> > > that the algorithm is safe, and also claims that it becomes unsafe if
> > > either synchronize_rcu() is removed.
> > 
> > This algorithm (the one in the litmus test; I haven't looked at Roman's
> > code) does seem valid.  In addition to removing either
> > synchronize_rcu(), interchanging the order of the stores in P0 (c
> > first, then w) would also invalidate it.
> > 
> > This is a little unusual in that c is written by more than one thread 
> > with no protection.  It works because the writes are all stores of a 
> > single pointer.
> > 
> > Why does the litmus test use smp_store_release() in three places?  
> > There doesn't seem to be any need; WRITE_ONCE() would be sufficient.
> 
> Because the algorithm did.  A bit of a stretch for kfree, but... ;-)
> 
> Let's try removing them, please see below.
> 
> > Alan
> > 
> > > Does this in fact realistically model Roman's algorithm?  Either way,
> > > is there a better approach?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > C C-RomanPenyaev-list-rcu-rr
> > > 
> > > {
> > > 	int *z=1; (* List: v->w->x->y->z. Noncircular, but long enough. *)
> > > 	int *y=z;
> > > 	int *x=y;
> > > 	int *w=x;
> > > 	int *v=w; (* List head is v. *)
> > > 	int *c=w; (* Cache, emulating ppcpu_con. *)
> > > }
> > > 
> > > P0(int *c, int *v, int *w, int *x, int *y)
> > > {
> > > 	rcu_assign_pointer(*w, y); /* Remove x from list. */
> 
> No change when converting this to WRITE_ONCE();
> 
> > > 	synchronize_rcu();
> > > 	r1 = READ_ONCE(*c);
> > > 	if (r1 == x) {
> > > 		WRITE_ONCE(*c, 0); /* Invalidate cache. */
> > > 		synchronize_rcu();
> > > 	}
> > > 	smp_store_release(x, 0);  /* Emulate kfree(x). */
> 
> Converting this one to WRITE_ONCE() does have an effect:
> 
> 	Test C-RomanPenyaev-list-rcu-rr Allowed
> 	States 8
> 	0:r1=0; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=0; c=0; v=w; w=y; x=0; y=z;
> 	0:r1=w; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
> 	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=w; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
> 	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=0; v=w; w=y; x=0; y=z;
> 	0:r1=x; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
> 	0:r1=y; 1:r1=w; 1:r2=x; 1:r3=x; 1:r4=y; c=y; v=w; w=y; x=0; y=z;
> 	0:r1=y; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
> 	0:r1=z; 1:r1=w; 1:r2=y; 1:r3=y; 1:r4=z; c=z; v=w; w=y; x=0; y=z;
> 	Ok
> 	Witnesses
> 	Positive: 1 Negative: 7
> 	Condition exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0)
> 	Observation C-RomanPenyaev-list-rcu-rr Sometimes 1 7
> 	Time C-RomanPenyaev-list-rcu-rr 0.40
> 	Hash=2ec66290a6622117b9877436950e6a08
> 
> Maybe reordered with READ_ONCE(*c) when r1 != x?

Probably.  This points out a weakness in the memory model.  Even though
the memory model allows that reordering, no compiler would reorder the
two statements and no CPU would execute the store before the load.

The general pattern is this: Suppose we have

	A;
	if (B)
		C;
	D;

where A is a load, B depends on A, D is a store, and C contains a
memory barrier or something else that orders D after A in the case
where B is true.  Furthermore, assume that B sometimes really is true
(i.e., the compiler can't prove that B is always false).  Then the
compiler is not allowed to move D up before A or B, and consequently
even when B is false, the CPU can't execute D before A.

Putting this into herd would be extremely difficult, if not impossible, 
because it involves analyzing code that was not executed.

Alan

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 20:49       ` Alan Stern
@ 2018-05-29 21:10       ` Linus Torvalds
  2018-05-29 22:53         ` Paul E. McKenney
  2018-05-30 14:29           ` Alan Stern
  -1 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2018-05-29 21:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul McKenney, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Tue, May 29, 2018 at 3:49 PM Alan Stern <stern@rowland.harvard.edu>
wrote:

> Putting this into herd would be extremely difficult, if not impossible,
> because it involves analyzing code that was not executed.

Does it?

Can't we simplify the whole sequence as basically

     A
     if (!B)
         D

for that "not B" case, and just think about that. IOW, let's ignore the
whole "not executed" code.

If B depends on A like you state, then that already implies that the write
in D cannot come before the read of A.

You fundamentally cannot do a conditional write before the read that the
write condition depends on. So *any* write after a conditional is dependent
on the read.

So the existence of C - whether it has a barrier or not - is entirely
immaterial at run-time.

Now, the *compiler* can use the whole existence of that memory barrier in C
to determine whether it can re-order the write to D or not, of course, but
that's a separate issue, and then the whole "code that isn't executed" is
not the issue any more. The compiler obviously sees all code, whether
executing or not.

Or am I being stupid and missing something entirely? That's possible.

               Linus

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 21:10       ` Linus Torvalds
@ 2018-05-29 22:53         ` Paul E. McKenney
  2018-05-30 14:46             ` Alan Stern
  2018-05-30 14:29           ` Alan Stern
  1 sibling, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-29 22:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Linux Kernel Mailing List, linux-arch, andrea.parri,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nick Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Roman Pen

On Tue, May 29, 2018 at 04:10:02PM -0500, Linus Torvalds wrote:
> On Tue, May 29, 2018 at 3:49 PM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> 
> > Putting this into herd would be extremely difficult, if not impossible,
> > because it involves analyzing code that was not executed.

One (ugly) way to handle it, assuming we are correct about what is
happening, would be to place ordering on the other side of the "if"
that is at least as strong as on the first side.  Probably some example
that completely breaks this approach, though...

> Does it?
> 
> Can't we simplify the whole sequence as basically
> 
>      A
>      if (!B)
>          D
> 
> for that "not B" case, and just think about that. IOW, let's ignore the
> whole "not executed" code.
> 
> If B depends on A like you state, then that already implies that the write
> in D cannot come before the read of A.
> 
> You fundamentally cannot do a conditional write before the read that the
> write condition depends on. So *any* write after a conditional is dependent
> on the read.
> 
> So the existence of C - whether it has a barrier or not - is entirely
> immaterial at run-time.
> 
> Now, the *compiler* can use the whole existence of that memory barrier in C
> to determine whether it can re-order the write to D or not, of course, but
> that's a separate issue, and then the whole "code that isn't executed" is
> not the issue any more. The compiler obviously sees all code, whether
> executing or not.
> 
> Or am I being stupid and missing something entirely? That's possible.

This will take some analysis, both to make sure that I got Roman's
example correct and to get to the bottom of exactly what LKMM thinks
can be reordered.  I am shifting timezones eastward, so I am not going
to dig into it today.

But here are a couple of things that take some getting used to:

1.	The "if (r1 == x)" would likely be "if (r1 == &x)" in the Linux
	kernel.

2.	Unless there is something explicit stopping the reordering, the
	herd tool assumes that the compiler can reorder unrelated code
	completely across the entirety of an "if" statement.  It might
	well have decided that it could do so in this case, due to the
	fact that the "if" statement isn't doing anything with x (just
	with its address).

	But yes, given that r1 comes from the load from *c, it would
	be difficult (at best) to actually apply that optimization in
	this case.

But let's find out what is really happening.  Easy to speculate, but
much harder to speculate correctly.  ;-)

							Thanx, Paul

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 21:10       ` Linus Torvalds
@ 2018-05-30 14:29           ` Alan Stern
  2018-05-30 14:29           ` Alan Stern
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-30 14:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Tue, 29 May 2018, Linus Torvalds wrote:

> On Tue, May 29, 2018 at 3:49 PM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> 
> > Putting this into herd would be extremely difficult, if not impossible,
> > because it involves analyzing code that was not executed.
> 
> Does it?
> 
> Can't we simplify the whole sequence as basically
> 
>      A
>      if (!B)
>          D
> 
> for that "not B" case, and just think about that. IOW, let's ignore the
> whole "not executed" code.

Your listing is slightly misleading.  It really should be:

	A
	if (!B)
		; // NOP
	D

In other words, D should be beyond the end of the "if" statement, not 
inside one of the branches.  At run time, of course, it doesn't matter; 
CPUs don't try to detect where the two branches of an "if" recombine.
(Leaving aside issues like implementing an "if" as a move-conditional.)

> If B depends on A like you state, then that already implies that the write
> in D cannot come before the read of A.
> 
> You fundamentally cannot do a conditional write before the read that the
> write condition depends on. So *any* write after a conditional is dependent
> on the read.

Remember, the original code was:

	A
	if (!B)
		C
	D

So the execution of D is _not_ conditional, and it doesn't depend on A 
or B.  (Again, CPUs don't make this distinction, but compilers do.)

> So the existence of C - whether it has a barrier or not - is entirely
> immaterial at run-time.
> 
> Now, the *compiler* can use the whole existence of that memory barrier in C
> to determine whether it can re-order the write to D or not, of course, but
> that's a separate issue, and then the whole "code that isn't executed" is
> not the issue any more. The compiler obviously sees all code, whether
> executing or not.
> 
> Or am I being stupid and missing something entirely? That's possible.

Not at all -- that point about the compiler is exactly what I was
trying to get at.  The object code the compiler generates depends on
the contents of _both_ branches of the "if" statement, even though only
one branch is executed at run time.  Hence an analysis based entirely
on what instructions are executed in a particular run (which is what
herd does) will necessarily be incomplete.

That's what happened here.  The LKMM herd model treats statements
beyond the end of an "if" statement as though the compiler can reorder
them before the start of the "if", unless something in the branch which
was taken prevents this reordering.  But herd ignores the contents of
the untaken branch, even when they would block the reordering.

Alan

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-29 22:53         ` Paul E. McKenney
@ 2018-05-30 14:46             ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-30 14:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Tue, 29 May 2018, Paul E. McKenney wrote:

> On Tue, May 29, 2018 at 04:10:02PM -0500, Linus Torvalds wrote:
> > On Tue, May 29, 2018 at 3:49 PM Alan Stern <stern@rowland.harvard.edu>
> > wrote:
> > 
> > > Putting this into herd would be extremely difficult, if not impossible,
> > > because it involves analyzing code that was not executed.
> 
> One (ugly) way to handle it, assuming we are correct about what is
> happening, would be to place ordering on the other side of the "if"
> that is at least as strong as on the first side.  Probably some example
> that completely breaks this approach, though...
> 
> > Does it?
> > 
> > Can't we simplify the whole sequence as basically
> > 
> >      A
> >      if (!B)
> >          D
> > 
> > for that "not B" case, and just think about that. IOW, let's ignore the
> > whole "not executed" code.
> > 
> > If B depends on A like you state, then that already implies that the write
> > in D cannot come before the read of A.
> > 
> > You fundamentally cannot do a conditional write before the read that the
> > write condition depends on. So *any* write after a conditional is dependent
> > on the read.
> > 
> > So the existence of C - whether it has a barrier or not - is entirely
> > immaterial at run-time.
> > 
> > Now, the *compiler* can use the whole existence of that memory barrier in C
> > to determine whether it can re-order the write to D or not, of course, but
> > that's a separate issue, and then the whole "code that isn't executed" is
> > not the issue any more. The compiler obviously sees all code, whether
> > executing or not.
> > 
> > Or am I being stupid and missing something entirely? That's possible.
> 
> This will take some analysis, both to make sure that I got Roman's
> example correct and to get to the bottom of exactly what LKMM thinks
> can be reordered.  I am shifting timezones eastward, so I am not going
> to dig into it today.
> 
> But here are a couple of things that take some getting used to:
> 
> 1.	The "if (r1 == x)" would likely be "if (r1 == &x)" in the Linux
> 	kernel.
> 
> 2.	Unless there is something explicit stopping the reordering, the
> 	herd tool assumes that the compiler can reorder unrelated code
> 	completely across the entirety of an "if" statement.  It might
> 	well have decided that it could do so in this case, due to the
> 	fact that the "if" statement isn't doing anything with x (just
> 	with its address).
> 
> 	But yes, given that r1 comes from the load from *c, it would
> 	be difficult (at best) to actually apply that optimization in
> 	this case.
> 
> But let's find out what is really happening.  Easy to speculate, but
> much harder to speculate correctly.  ;-)

What's really happening is that herd doesn't realize the load from *c
must execute before the store to x (as in your 2 above).  You can see
this if you add the following to linux-kernel.cat:

let prop2 = ((prop \ id) & int)

and then run the modified litmus test through herd with "-show prop
-doshow prop2 -gv".  The output graph has a prop2 link from the store
to x leading back to the load from c, indicating that in this
execution, the store must execute before the load.

Alan


* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 14:29           ` Alan Stern
@ 2018-05-30 14:59           ` Linus Torvalds
  2018-05-30 18:10             ` Alan Stern
  2018-05-30 18:38             ` Paul E. McKenney
  -1 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2018-05-30 14:59 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul McKenney, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
wrote:

> >
> > Can't we simplify the whole sequence as basically
> >
> >      A
> >      if (!B)
> >          D
> >
> > for that "not B" case, and just think about that. IOW, let's ignore the
> > whole "not executed" code.

> Your listing is slightly misleading.

No. You're confused.

You're confused because you're conflating two *entirely* different things.

You're conflating the static source code with the dynamic execution. They
are NOT THE SAME.

   It really should be:

>          A
>          if (!B)
>                  ; // NOP
>          D

No it really really shouldn't.

There are two completely different situations:

(1) there is the source code:

     A
     if (B)
          C
     D

where  C contains a barrier, and B depends on A and is not statically
determinable.

In the source code, 'D' looks unconditional BUT IT DAMN WELL IS NOT.

It's not unconditional - it's just done in both conditions! That's a big
big difference.

> In other words, D should be beyond the end of the "if" statement, not
> inside one of the branches.

You're just completely confused.

What you are stating makes no sense at all.

Seriously, your reading of the code is entirely nonsensical, and seems to be
about syntax, not about semantics. Which is crazy.

Lookie here, you can change the syntactic model of that code to just be

     A
     if (B)
         C
         D
     else
         D

and that code obviously has the EXACT SAME SEMANTICS.

So if you get hung up on trivial syntactic issues, you are by definition
confused, and your tool is garbage. You're doing memory ordering analysis,
not syntax parsing, for chrissake!

>   At run time, of course, it doesn't matter;
> CPUs don't try to detect where the two branches of an "if" recombine.
> (Leaving aside issues like implementing an "if" as a move-conditional.)

You cannot do it as a move-conditional, since that code generation would
have been buggy shit, exactly because of C. But that's a code generation
issue, not a run-time decision.

So at run-time, the code ends up being

     A
     if (!B)
         D

and D cannot be written before A has been read, because B depends on A, and
you cannot expose speculative writes before the preconditions have been
evaluated.

> Remember, the original code was:

>          A
>          if (!B)
>                  C
>          D

> So the execution of D is _not_ conditional, and it doesn't depend on A
> or B.  (Again, CPUs don't make this distinction, but compilers do.)

Again, the above is nothing but confused bullshit.

D depends on B, which depends on A.

Really. Really really.

Anybody - or any tool - that doesn't see that is fundamentally wrong, and
has been confused by syntax.

A *compiler* will very much also make that distinction. If it doesn't make
that distinction, it's not a compiler, it's a buggy piece of shit.

Because semantics matter.

Think about it.

                  Linus


* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 14:59           ` Linus Torvalds
@ 2018-05-30 18:10             ` Alan Stern
  2018-05-30 18:38             ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-30 18:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Linux Kernel Mailing List, linux-arch,
	Andrea Parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, 30 May 2018, Linus Torvalds wrote:

> On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> 
> > >
> > > Can't we simplify the whole sequence as basically
> > >
> > >      A
> > >      if (!B)
> > >          D
> > >
> > > for that "not B" case, and just think about that. IOW, let's ignore the
> > > whole "not executed" code.
> 
> > Your listing is slightly misleading.
> 
> No. You're confused.
> 
> You're confused because you're conflating two *entirely* different things.
> 
> You're conflating the static source code with the dynamic execution. They
> are NOT THE SAME.

I'll do my best not to conflate them.

Your phrasing was not the greatest in this regard; it looked like you
were trying to talk about a different static source program.

>    It really should be:
> 
> >          A
> >          if (!B)
> >                  ; // NOP
> >          D
> 
> No it really really shouldn't.
> 
> There are two completely different situations:
> 
> (1) there is the source code:
> 
>      A
>      if (B)
>           C
>      D
> 
> where  C contains a barrier, and B depends on A and is not statically
> determinable.
> 
> In the source code, 'D' looks unconditional BUT IT DAMN WELL IS NOT.
> 
> It's not unconditional - it's just done in both conditions! That's a big
> big difference.

Can you explain in greater detail?  A lot of people would say that
something which occurs under either condition _is_ unconditional, by
definition.

> > In other words, D should be beyond the end of the "if" statement, not
> > inside one of the branches.
> 
> You're just completely confused.
> 
> What you are stating makes no sense at all.
> 
> Seriously, your reading of the code is entirely nonsensical, and seems to be
> about syntax, not about semantics. Which is crazy.
> 
> Lookie here, you can change the syntactic model of that code to just be
> 
>      A
>      if (B)
>          C
>          D
>      else
>          D
> 
> and that code obviously has the EXACT SAME SEMANTICS.

That is not entirely clear.  The semantics of a source program are
partly determined by the memory model, and the quality of the memory
model is exactly what this email thread is about.  Under a defective
memory model, this code might indeed have different semantics from the
original.

Granted, it _shouldn't_ -- but our LKMM is not ideal.  As an example,
one of the known defects of the memory model (we have pointed it out
repeatedly from the very beginning) is that when identical writes occur
in the two branches of an "if" statement, just as in your code above,
the model does not allow for the possibility that the compiler may
combine the two writes into one and move it up before the "if".

That's not the issue in this case, but it does show that the memory
model we are working with isn't perfect.  The subject of this
discussion is another example in which the memory model fails to give
the desired result.

[It's also worth pointing out that semantics vary with context.  Is
"x = 1; y = 2;" semantically equivalent to "y = 2; x = 1;"?  It is in
a single-threaded setting.  But in a multi-threaded setting it need not
be.]

> So if you get hung up on trivial syntactic issues, you are by definition
> confused, and your tool is garbage. You're doing memory ordering analysis,
> not syntax parsing, for chrissake!

To be fair, we are (or rather, herd is) doing both.

> >   At run time, of course, it doesn't matter;
> > CPUs don't try to detect where the two branches of an "if" recombine.
> > (Leaving aside issues like implementing an "if" as a move-conditional.)
> 
> You cannot do it as a move-conditional, since that code generation would
> have been buggy shit, exactly because of C. But that's a code generation
> issue, not a run-time decision.
> 
> So at run-time, the code ends up being
> 
>      A
>      if (!B)
>          D

Sorry, not clear enough.  Do you mean that the CPU ends up executing
essentially the same sequence of assembly instructions as you would get
from a naive compilation of this source code?  If so, then I agree.

By the same reasoning, at run-time the code also ends up being

	A
	if (B)
		;
	D

(that is, the sequence of executed assembly instructions would be
essentially the same).

> and D cannot be written before A has been read, because B depends on A, and
> you cannot expose speculative writes before the preconditions have been
> evaluated.
> 
> > Remember, the original code was:
> 
> >          A
> >          if (!B)
> >                  C
> >          D
> 
> > So the execution of D is _not_ conditional, and it doesn't depend on A
> > or B.  (Again, CPUs don't make this distinction, but compilers do.)
> 
> Again, the above is nothing but confused bullshit.
> 
> D depends on B, which depends on A.
> 
> Really. Really really.

Sure, in the code you wrote there is a control dependency from A to D.  
I would never say otherwise.

But my earlier email concerned the original source code outline.  
Whether the two are semantically equivalent or not, they are certainly
different syntactically.  herd's analysis has a large syntactic
component; given the original source code, LKMM does not generate a
control dependency from A to D.

You may think this is a weakness in LKMM: It treats two semantically
equivalent programs differently just because of syntactic variations.  
I wouldn't disagree.  (Much the same could be said of real C
compilers.)

In fact, the whole point of my original email was that this code 
pattern illustrates a weakness in LKMM.

> Anybody - or any tool - that doesn't see that is fundamentally wrong, and
> has been confused by syntax.
> 
> A *compiler* will very much also make that distinction. If it doesn't make
> that distinction, it's not a compiler, it's a buggy piece of shit.
> 
> Because semantics matter.
> 
> Think about it.

In the end, it seems that we are in violent agreement.  :-)

Alan


* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 14:59           ` Linus Torvalds
  2018-05-30 18:10             ` Alan Stern
@ 2018-05-30 18:38             ` Paul E. McKenney
  2018-05-30 19:08                 ` Alan Stern
  1 sibling, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-30 18:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Linux Kernel Mailing List, linux-arch, andrea.parri,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nick Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 09:59:28AM -0500, Linus Torvalds wrote:
> On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> 
> > >
> > > Can't we simplify the whole sequence as basically
> > >
> > >      A
> > >      if (!B)
> > >          D
> > >
> > > for that "not B" case, and just think about that. IOW, let's ignore the
> > > whole "not executed" code.
> 
> > Your listing is slightly misleading.
> 
> No. You're confused.
> 
> You're confused because you're conflating two *entirely* different things.
> 
> You're conflating the static source code with the dynamic execution. They
> are NOT THE SAME.

I am going to go out on a limb and assert that Linus is talking about
what gcc and hardware do, while Alan is talking about what the tool and
memory model do.  In a perfect world, these would be the same, but it
appears that the world might not be perfect just now.

My current guess is that we need to change the memory-model tool.  If
that is the case, here are some possible resolutions:

1.	Make herd's C-language control dependencies work the same as its
	assembly language, so that they extend beyond the end of the
	"if" statement.  I believe that this would make Roman's case
	work, but it could claim that other situations are safe that
	are actually problematic due to compiler optimizations.

	The fact that the model currently handles only READ_ONCE()
	and WRITE_ONCE() and not unmarked reads and writes makes this
	option more attractive than it would otherwise be, compilers not
	being allowed to reorder volatile accesses, but we are likely
	to introduce unmarked accesses sometime in the future.

2.	Like #1 above, but only if something in one of the "if"'s
	branches would prevent the compiler from reordering
	(smp_mb(), synchronize_rcu(), value-returning non-relaxed
	RMW atomic, ...).  Easy for me to say, but I am guessing
	that much violence would be needed to the tooling to make
	this work.  ;-)

If I understand Alan correctly, there is not an obvious way to make
this change within the confines of the memory model's .bell and .cat
files.

						Thanx, Paul

>    It really should be:
> 
> >          A
> >          if (!B)
> >                  ; // NOP
> >          D
> 
> No it really really shouldn't.
> 
> There are two completely different situations:
> 
> (1) there is the source code:
> 
>      A
>      if (B)
>           C
>      D
> 
> where  C contains a barrier, and B depends on A and is not statically
> determinable.
> 
> In the source code, 'D' looks unconditional BUT IT DAMN WELL IS NOT.
> 
> It's not unconditional - it's just done in both conditions! That's a big
> big difference.
> 
> > In other words, D should be beyond the end of the "if" statement, not
> > inside one of the branches.
> 
> You're just completely confused.
> 
> What you are stating makes no sense at all.
> 
> Seriously, your reading of the code is entirely nonsensical, and seems to be
> about syntax, not about semantics. Which is crazy.
> 
> Lookie here, you can change the syntactic model of that code to just be
> 
>      A
>      if (B)
>          C
>          D
>      else
>          D
> 
> and that code obviously has the EXACT SAME SEMANTICS.
> 
> So if you get hung up on trivial syntactic issues, you are by definition
> confused, and your tool is garbage. You're doing memory ordering analysis,
> not syntax parsing, for chrissake!
> 
> >   At run time, of course, it doesn't matter;
> > CPUs don't try to detect where the two branches of an "if" recombine.
> > (Leaving aside issues like implementing an "if" as a move-conditional.)
> 
> You cannot do it as a move-conditional, since that code generation would
> have been buggy shit, exactly because of C. But that's a code generation
> issue, not a run-time decision.
> 
> So at run-time, the code ends up being
> 
>      A
>      if (!B)
>          D
> 
> and D cannot be written before A has been read, because B depends on A, and
> you cannot expose speculative writes before the preconditions have been
> evaluated.
> 
> > Remember, the original code was:
> 
> >          A
> >          if (!B)
> >                  C
> >          D
> 
> > So the execution of D is _not_ conditional, and it doesn't depend on A
> > or B.  (Again, CPUs don't make this distinction, but compilers do.)
> 
> Again, the above is nothing but confused bullshit.
> 
> D depends on B, which depends on A.
> 
> Really. Really really.
> 
> Anybody - or any tool - that doesn't see that is fundamentally wrong, and
> has been confused by syntax.
> 
> A *compiler* will very much also make that distinction. If it doesn't make
> that distinction, it's not a compiler, it's a buggy piece of shit.
> 
> Because semantics matter.
> 
> Think about it.
> 
>                   Linus
> 


* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 18:38             ` Paul E. McKenney
@ 2018-05-30 19:08                 ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-30 19:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, 30 May 2018, Paul E. McKenney wrote:

> On Wed, May 30, 2018 at 09:59:28AM -0500, Linus Torvalds wrote:
> > On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
> > wrote:
> > 
> > > >
> > > > Can't we simplify the whole sequence as basically
> > > >
> > > >      A
> > > >      if (!B)
> > > >          D
> > > >
> > > > for that "not B" case, and just think about that. IOW, let's ignore the
> > > > whole "not executed" code.
> > 
> > > Your listing is slightly misleading.
> > 
> > No. You're confused.
> > 
> > You're confused because you're conflating two *entirely* different things.
> > 
> > You're conflating the static source code with the dynamic execution. They
> > are NOT THE SAME.
> 
> I am going to go out on a limb and assert that Linus is talking about
> what gcc and hardware do, while Alan is talking about what the tool and
> memory model do.

Indeed.  The very first line Linus quoted in his first reply to me
(elided above) was:

	Putting this into herd would be extremely difficult, if not impossible,
	because it involves analyzing code that was not executed.

It should be clear from this that I was talking about herd.  Not gcc or
real hardware.

(After rereading his message a few times, I'm not sure exactly what 
Linus was talking about...)

>  In a perfect world, these would be the same, but it
> appears that the world might not be perfect just now.
> 
> My current guess is that we need to change the memory-model tool.  If
> that is the case, here are some possible resolutions:
> 
> 1.	Make herd's C-language control dependencies work the same as its
> 	assembly language, so that they extend beyond the end of the
> 	"if" statement.  I believe that this would make Roman's case
> 	work, but it could claim that other situations are safe that
> 	are actually problematic due to compiler optimizations.
> 
> 	The fact that the model currently handles only READ_ONCE()
> 	and WRITE_ONCE() and not unmarked reads and writes makes this
> 	option more attractive than it would otherwise be, compilers not
> 	being allowed to reorder volatile accesses, but we are likely
> 	to introduce unmarked accesses sometime in the future.

Preserving the order of volatile accesses isn't sufficient.  The
compiler is still allowed to translate

	r1 = READ_ONCE(x);
	if (r1) {
		...
	}
	WRITE_ONCE(y, r2);

into something resembling

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, r2);
	if (r1) {
		...
	}

(provided the "..." part doesn't contain any volatile accesses,
barriers, or anything affecting r2), which would destroy any run-time
control dependency.  The CPU could then execute the write before the
read.

> 2.	Like #1 above, but only if something in one of the "if"'s
> 	branches would prevent the compiler from reordering
> 	(smp_mb(), synchronize_rcu(), value-returning non-relaxed
> 	RMW atomic, ...).  Easy for me to say, but I am guessing
> 	that much violence would be needed to the tooling to make
> 	this work.  ;-)

This would be my preference.  But I'm afraid it isn't practical at the 
moment.

> If I understand Alan correctly, there is not an obvious way to make
> this change within the confines of the memory model's .bell and .cat
> files.

No way at all.  It would require significant changes to herd's internal 
workings and its external interface -- my original point.

Alan


* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 19:08                 ` Alan Stern
  (?)
@ 2018-05-30 19:45                   ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-30 19:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 03:08:43PM -0400, Alan Stern wrote:
> On Wed, 30 May 2018, Paul E. McKenney wrote:
> 
> > On Wed, May 30, 2018 at 09:59:28AM -0500, Linus Torvalds wrote:
> > > On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
> > > wrote:
> > > 
> > > > >
> > > > > Can't we simplify the whole sequence as basically
> > > > >
> > > > >      A
> > > > >      if (!B)
> > > > >          D
> > > > >
> > > > > for that "not B" case, and just think about that. IOW, let's ignore the
> > > > > whole "not executed" code.
> > > 
> > > > Your listing is slightly misleading.
> > > 
> > > No. You're confused.
> > > 
> > > You're confused because you're conflating two *entirely* different things.
> > > 
> > > You're conflating the static source code with the dynamic execution. They
> > > are NOT THE SAME.
> > 
> > I am going to go out on a limb and assert that Linus is talking about
> > what gcc and hardware do, while Alan is talking about what the tool and
> > memory model do.
> 
> Indeed.  The very first line Linus quoted in his first reply to me
> (elided above) was:
> 
> 	Putting this into herd would be extremely difficult, if not impossible,
> 	because it involves analyzing code that was not executed.
> 
> It should be clear from this that I was talking about herd.  Not gcc or
> real hardware.
> 
> (After rereading his message a few times, I'm not sure exactly what 
> Linus was talking about...)

From what I could see, real compilers and real hardware.  ;-)

> >  In a perfect world, these would be the same, but it
> > appears that the world might not be perfect just now.
> > 
> > My current guess is that we need to change the memory-model tool.  If
> > that is the case, here are some possible resolutions:
> > 
> > 1.	Make herd's C-language control dependencies work the same as its
> > 	assembly language, so that they extend beyond the end of the
> > 	"if" statement.  I believe that this would make Roman's case
> > 	work, but it could claim that other situations are safe that
> > 	are actually problematic due to compiler optimizations.
> > 
> > 	The fact that the model currently handles only READ_ONCE()
> > 	and WRITE_ONCE() and not unmarked reads and writes makes this
> > 	option more attractive than it otherwise would be, compilers not
> > 	being allowed to reorder volatile accesses, but we are likely
> > 	to introduce unmarked accesses sometime in the future.
> 
> Preserving the order of volatile accesses isn't sufficient.  The
> compiler is still allowed to translate
> 
> 	r1 = READ_ONCE(x);
> 	if (r1) {
> 		...
> 	}
> 	WRITE_ONCE(y, r2);
> 
> into something resembling
> 
> 	r1 = READ_ONCE(x);
> 	WRITE_ONCE(y, r2);
> 	if (r1) {
> 		...
> 	}
> 
> (provided the "..." part doesn't contain any volatile accesses,
> barriers, or anything affecting r2), which would destroy any run-time
> control dependency.  The CPU could then execute the write before the
> read.

True, but almost all current litmus tests do have at least one volatile
access in their "if" statements.  I am guessing that this has the same
memory-model tooling issues as #2 below, but I am as usual happy to be
proven wrong.  ;-)

> > 2.	Like #1 above, but only if something in one of the "if"'s
> > 	branches would prevent the compiler from reordering
> > 	(smp_mb(), synchronize_rcu(), value-returning non-relaxed
> > 	RMW atomic, ...).  Easy for me to say, but I am guessing
> > 	that much violence would be needed to the tooling to make
> > 	this work.  ;-)
> 
> This would be my preference.  But I'm afraid it isn't practical at the 
> moment.

I bet that some combination of scripting and smallish modifications to
tooling could make it happen in reasonably short term.  Might be more
difficult to make something more future-proof, though, agreed.

> > If I understand Alan correctly, there is not an obvious way to make
> > this change within the confines of the memory model's .bell and .cat
> > files.
> 
> No way at all.  It would require significant changes to herd's internal 
> workings and its external interface -- my original point.

I was afraid of that.  ;-)

Though truth be told, I was expecting an issue like this to crop up
sooner rather than later, so I was actually getting a bit nervous
about the fact that it had not yet shown itself...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 19:45                   ` Paul E. McKenney
@ 2018-05-30 20:28                     ` Alan Stern
  -1 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-30 20:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, 30 May 2018, Paul E. McKenney wrote:

> > > My current guess is that we need to change the memory-model tool.  If
> > > that is the case, here are some possible resolutions:
> > > 
> > > 1.	Make herd's C-language control dependencies work the same as its
> > > 	assembly language, so that they extend beyond the end of the
> > > 	"if" statement.  I believe that this would make Roman's case
> > > 	work, but it could claim that other situations are safe that
> > > 	are actually problematic due to compiler optimizations.
> > > 
> > > 	The fact that the model currently handles only READ_ONCE()
> > > 	and WRITE_ONCE() and not unmarked reads and writes makes this
> > > 	option more attractive than it otherwise would be, compilers not
> > > 	being allowed to reorder volatile accesses, but we are likely
> > > 	to introduce unmarked accesses sometime in the future.
> > 
> > Preserving the order of volatile accesses isn't sufficient.  The
> > compiler is still allowed to translate
> > 
> > 	r1 = READ_ONCE(x);
> > 	if (r1) {
> > 		...
> > 	}
> > 	WRITE_ONCE(y, r2);
> > 
> > into something resembling
> > 
> > 	r1 = READ_ONCE(x);
> > 	WRITE_ONCE(y, r2);
> > 	if (r1) {
> > 		...
> > 	}
> > 
> > (provided the "..." part doesn't contain any volatile accesses,
> > barriers, or anything affecting r2), which would destroy any run-time
> > control dependency.  The CPU could then execute the write before the
> > read.
> 
> True, but almost all current litmus tests do have at least one volatile
> access in their "if" statements.  I am guessing that this has the same
> memory-model tooling issues as #2 below, but I am as usual happy to be
> proven wrong.  ;-)

It shouldn't be all that bad.  The dependencies are generated by herd,
meaning that the code would have to be changed to make control
dependencies extend beyond the ends of "if" statements.  I don't think
the necessary changes would be tremendously big, especially since the
LISA front end already behaves this way.

> > > 2.	Like #1 above, but only if something in one of the "if"'s
> > > 	branches would prevent the compiler from reordering
> > > 	(smp_mb(), synchronize_rcu(), value-returning non-relaxed
> > > 	RMW atomic, ...).  Easy for me to say, but I am guessing
> > > 	that much violence would be needed to the tooling to make
> > > 	this work.  ;-)
> > 
> > This would be my preference.  But I'm afraid it isn't practical at the 
> > moment.
> 
> I bet that some combination of scripting and smallish modifications to
> tooling could make it happen in reasonably short term.  Might be more
> difficult to make something more future-proof, though, agreed.

I have no idea what sort of scripting/smallish modifications could do 
the job.  You could ask Luc, if you're not afraid of giving him an 
aneurysm.  :-)

> > > If I understand Alan correctly, there is not an obvious way to make
> > > this change within the confines of the memory model's .bell and .cat
> > > files.
> > 
> > No way at all.  It would require significant changes to herd's internal 
> > workings and its external interface -- my original point.
> 
> I was afraid of that.  ;-)
> 
> Though truth be told, I was expecting an issue like this to crop up
> sooner rather than later, so I was actually getting a bit nervous
> about the fact that it had not yet shown itself...

The fact is, herd was never meant to act like a compiler.  Some 
disagreements are inevitable.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 20:28                     ` Alan Stern
  (?)
@ 2018-05-30 21:49                     ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-30 21:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 04:28:56PM -0400, Alan Stern wrote:
> On Wed, 30 May 2018, Paul E. McKenney wrote:
> 
> > > > My current guess is that we need to change the memory-model tool.  If
> > > > that is the case, here are some possible resolutions:
> > > > 
> > > > 1.	Make herd's C-language control dependencies work the same as its
> > > > 	assembly language, so that they extend beyond the end of the
> > > > 	"if" statement.  I believe that this would make Roman's case
> > > > 	work, but it could claim that other situations are safe that
> > > > 	are actually problematic due to compiler optimizations.
> > > > 
> > > > 	The fact that the model currently handles only READ_ONCE()
> > > > 	and WRITE_ONCE() and not unmarked reads and writes makes this
> > > > 	option more attractive than it otherwise would be, compilers not
> > > > 	being allowed to reorder volatile accesses, but we are likely
> > > > 	to introduce unmarked accesses sometime in the future.
> > > 
> > > Preserving the order of volatile accesses isn't sufficient.  The
> > > compiler is still allowed to translate
> > > 
> > > 	r1 = READ_ONCE(x);
> > > 	if (r1) {
> > > 		...
> > > 	}
> > > 	WRITE_ONCE(y, r2);
> > > 
> > > into something resembling
> > > 
> > > 	r1 = READ_ONCE(x);
> > > 	WRITE_ONCE(y, r2);
> > > 	if (r1) {
> > > 		...
> > > 	}
> > > 
> > > (provided the "..." part doesn't contain any volatile accesses,
> > > barriers, or anything affecting r2), which would destroy any run-time
> > > control dependency.  The CPU could then execute the write before the
> > > read.
> > 
> > True, but almost all current litmus tests do have at least one volatile
> > access in their "if" statements.  I am guessing that this has the same
> > memory-model tooling issues as #2 below, but I am as usual happy to be
> > proven wrong.  ;-)
> 
> It shouldn't be all that bad.  The dependencies are generated by herd,
> meaning that the code would have to be changed to make control
> dependencies extend beyond the ends of "if" statements.  I don't think
> the necessary changes would be tremendously big, especially since the
> LISA front end already behaves this way.

That was my thought.

> > > > 2.	Like #1 above, but only if something in one of the "if"'s
> > > > 	branches would prevent the compiler from reordering
> > > > 	(smp_mb(), synchronize_rcu(), value-returning non-relaxed
> > > > 	RMW atomic, ...).  Easy for me to say, but I am guessing
> > > > 	that much violence would be needed to the tooling to make
> > > > 	this work.  ;-)
> > > 
> > > This would be my preference.  But I'm afraid it isn't practical at the 
> > > moment.
> > 
> > I bet that some combination of scripting and smallish modifications to
> > tooling could make it happen in reasonably short term.  Might be more
> > difficult to make something more future-proof, though, agreed.
> 
> I have no idea what sort of scripting/smallish modifications could do 
> the job.  You could ask Luc, if you're not afraid of giving him an 
> aneurysm.  :-)

Two "if" keywords, one that carries control dependencies past the end
of the "if" (assembly-language style) and another that does not.  The
script then rewrites the "if" statements to one style or the other
depending on the contents of the "then" and "else" clauses.

> > > > If I understand Alan correctly, there is not an obvious way to make
> > > > this change within the confines of the memory model's .bell and .cat
> > > > files.
> > > 
> > > No way at all.  It would require significant changes to herd's internal 
> > > workings and its external interface -- my original point.
> > 
> > I was afraid of that.  ;-)
> > 
> > Though truth be told, I was expecting an issue like this to crop up
> > sooner rather than later, so I was actually getting a bit nervous
> > about the fact that it had not yet shown itself...
> 
> The fact is, herd was never meant to act like a compiler.  Some 
> disagreements are inevitable.

Agreed, the bit about identical stores at the beginnings of "then"
and "else" is one example.  But we should fix the ones that are more
straightforward.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 19:08                 ` Alan Stern
  (?)
  (?)
@ 2018-05-30 22:01                 ` Linus Torvalds
  2018-05-30 23:14                   ` Paul E. McKenney
  -1 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2018-05-30 22:01 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul McKenney, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 2:08 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Indeed.  The very first line Linus quoted in his first reply to me
> (elided above) was:
>
>         Putting this into herd would be extremely difficult, if not impossible,
>         because it involves analyzing code that was not executed.
>
> It should be clear from this that I was talking about herd.  Not gcc or
> real hardware.

So what does herd actually work on? The source code or the executable,
or a trace?

I found the herd paper, but I'm on the road helping my daughter in
college move, and I don't have the background to skim the paper
quickly and come up with the obvious answer, so I'll just ask.

Because I really think that from our memory model standpoint, we
really do have the rule that

   load -> cond -> store

is ordered - even if the store address and store data is in no way
dependent on the load. The only thing that matters is that there's a
conditional that is dependent on the load in between the load and the
store.

Note that this is *independent* of how you get to the store. It
doesn't matter if it's a fallthrough conditional jump or a taken
conditional jump, or whether there is a join afterward.

The only thing that *does* matter is whether the conditional can be
turned into a "select" statement. If the conditional can be turned
into a data dependency, then the ordering goes away. That is why it
was relevant whether "C" contained a barrier or not (it doesn't even
have to be a *memory* barrier, it just has to be a barrier for code
generation).

Note that the "C doesn't even have to have a memory barrier" is
important, because the ordering from load->cond->store doesn't actually
have anything to do with any memory ordering imposed by C, it's much
more fundamental than that.

> Preserving the order of volatile accesses isn't sufficient.  The
> compiler is still allowed to translate
>
>         r1 = READ_ONCE(x);
>         if (r1) {
>                 ...
>         }
>         WRITE_ONCE(y, r2);
>
> into something resembling
>
>         r1 = READ_ONCE(x);
>         WRITE_ONCE(y, r2);
>         if (r1) {
>                 ...
>         }

Correct.

What matter is that the code in C (now called "..." above ;^) has a
build-time barrier that means that the compiler cannot do that.

That barrier can be pretty much anything. The important part is
literally that there's a guarantee that the write cannot be migrated
below the conditional.

But I don't know what level 'herd' works on. If it doesn't see
compiler barriers (eg our own "barrier()" macro that literally is just
that), only sees the generated code, then it really has no other
information than what the compiler _happened_ to do - it doesn't know
if the compiler did the store after the conditional because it *had*
to do so, or whether it was just a random instruction scheduling
decision.

            Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 22:01                 ` Linus Torvalds
@ 2018-05-30 23:14                   ` Paul E. McKenney
  2018-05-31 14:27                       ` Alan Stern
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-05-30 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Linux Kernel Mailing List, linux-arch, andrea.parri,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nick Piggin,
	David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
	Ingo Molnar, Roman Pen

On Wed, May 30, 2018 at 05:01:01PM -0500, Linus Torvalds wrote:
> On Wed, May 30, 2018 at 2:08 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Indeed.  The very first line Linus quoted in his first reply to me
> > (elided above) was:
> >
> >         Putting this into herd would be extremely difficult, if not impossible,
> >         because it involves analyzing code that was not executed.
> >
> > It should be clear from this that I was talking about herd.  Not gcc or
> > real hardware.
> 
> So what does herd actually work on? The source code or the executable,
> or a trace?

The source code, that is, the source code of the litmus test.
There are definitions for the various Linux operations, partly within
the herd tool and partly in the linux.def file in tools/memory-model.
The problem we are having is nailing down control dependencies and
compiler optimizations.  The bit about limiting control dependencies
through the end of the "if" statement itself works well in a great many cases,
but this clearly is not one of them.
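
To make the scope issue concrete, here is a minimal litmus-style sketch
(hypothetical and untested; syntax modeled on the tests in
tools/memory-model/litmus-tests, and the expectation that herd would
allow the "exists" outcome is my reading of the discussion above, not a
verified run):

```
C control-dep-scope-sketch

{}

P0(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*x);
	if (r1)
		smp_mb();	/* barrier inside the branch */
	WRITE_ONCE(*y, 1);	/* herd: control dependency ended at the "if" */
}

P1(int *x, int *y)
{
	int r2;

	r2 = READ_ONCE(*y);
	smp_mb();
	WRITE_ONCE(*x, 1);
}

exists (0:r1=1 /\ 1:r2=1)
```

On real hardware the branch (barrier or not) orders P0's load before
its store, which would forbid the cycle; a model that drops the control
dependency at the end of the "if" would allow it.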

> I found the herd paper, but I'm on the road helping my daughter in
> college move, and I don't have the background to skim the paper
> quickly and come up with the obvious answer, so I'll just ask.

It is not a short learning curve.

> Because I really think that from our memory model standpoint, we
> really do have the rule that
> 
>    load -> cond -> store
> 
> is ordered - even if the store address and store data is in no way
> dependent on the load. The only thing that matters is that there's a
> conditional that is dependent on the load in between the load and the
> store.
> 
> Note that this is *independent* of how you get to the store. It
> doesn't matter  if it's a fallthrough conditional jump or  a taken
> conditional jump, or whether  there is a joining.
> 
> The only thing that *does* matter is whether the conditional can be
> turned into a "select" statement. If the conditional can be turned
> into a data dependency, then the ordering goes away. That is why it
> was relevant whether "C" contained a barrier or not (it doesn't even
> have to be a *memory* barrier, it just has to be a barrier for code
> generation).

Another situation is something like this:

	r1 = READ_ONCE(x);
	if (r1)
		r2 = r1 * 3 - 1;
	else
		r2 = 42;
	WRITE_ONCE(y, 5);
	do_something(r2);

Because there are no compiler barriers or volatile accesses in either
the "then" clause or the "else" clause, the compiler is within its
rights to do this:

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, 5);
	if (r1)
		r2 = r1 * 3 - 1;
	else
		r2 = 42;
	do_something(r2);

Then weakly ordered CPUs can reorder the READ_ONCE() and WRITE_ONCE().
The compiler of course cannot because they are both volatile accesses.

> Note that the "C doesn't even have to have a memory barrier" is
> important, because the ordering from load->cond->store doesn't actually
> have anything to do with any memory ordering imposed by C, it's much
> more fundamental than that.

If there were volatiles or compiler barriers in the "then" or "else"
clauses, or if the store itself was there (and there weren't identical
stores in both), then agreed, the compiler cannot do much, nor can
the hardware.

> > Preserving the order of volatile accesses isn't sufficient.  The
> > compiler is still allowed to translate
> >
> >         r1 = READ_ONCE(x);
> >         if (r1) {
> >                 ...
> >         }
> >         WRITE_ONCE(y, r2);
> >
> > into something resembling
> >
> >         r1 = READ_ONCE(x);
> >         WRITE_ONCE(y, r2);
> >         if (r1) {
> >                 ...
> >         }
> 
> Correct.
> 
> What matter is that the code in C (now called "..." above ;^) has a
> build-time barrier that means that the compiler cannot do that.
> 
> That barrier can be pretty much anything. The important part is
> literally that there's a guarantee that the write cannot  be migrated
> below the conditional.
> 
> But I don't know what level 'herd' works on. If it doesn't see
> compiler barriers (eg our own "barrier()" macro that literally is just
> that), only sees the generated code, then it really has no other
> information than what the compiler _happened_ to do - it doesn't know
> if the compiler did the store after the conditional because it *had*
> to do so, or whether it was just a random instruction scheduling
> decision.

The 'herd' tool works on litmus-test source, and has almost no idea of
what compilers get up to.  In this particular case, it would be better
off being even more ignorant of what compilers get up to.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 23:14                   ` Paul E. McKenney
@ 2018-05-31 14:27                       ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-05-31 14:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Wed, 30 May 2018, Paul E. McKenney wrote:

> On Wed, May 30, 2018 at 05:01:01PM -0500, Linus Torvalds wrote:
> > On Wed, May 30, 2018 at 2:08 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > Indeed.  The very first line Linus quoted in his first reply to me
> > > (elided above) was:
> > >
> > >         Putting this into herd would be extremely difficult, if not impossible,
> > >         because it involves analyzing code that was not executed.
> > >
> > > It should be clear from this that I was talking about herd.  Not gcc or
> > > real hardware.
> > 
> > So what does herd actually work on? The source code or the executable,
> > or a trace?
> 
> The source code, that is, the source code of the litmus test.
> There are definitions for the various Linux operations, partly within
> the herd tool and partly in the linux.def file in tools/memory-model.
> The problem we are having is nailing down control dependencies and
> compiler optimizations.  The bit about limiting control dependencies
> through the end of the "if" statement itself works well in a great many cases,
> but this clearly is not one of them.
> 
> > I found the herd paper, but I'm on the road helping my daughter in
> > college move, and I don't have the background to skim the paper
> quickly and come up with the obvious answer, so I'll just ask.
> 
> It is not a short learning curve.
> 
> > Because I really think that from our memory model standpoint, we
> > really do have the rule that
> > 
> >    load -> cond -> store
> > 
> > is ordered - even if the store address and store data is in no way
> > dependent on the load. The only thing that matters is that there's a
> > conditional that is dependent on the load in between the load and the
> > store.

This is true for all the architectures supported by the Linux kernel.  
However, in the future it might not always hold.  I can imagine that
CPU designers would want to include an optimization that checks a
conditional branch to see if it skips over only the following
instruction (and the following instruction can't affect the flow of
control); in that situation, they might decide there doesn't need to be
a control dependency to future stores.  Such an optimization would not
violate the C11 memory model.

Of course, this is purely theoretical.  And if anybody does decide to
try doing that, the memory-model people might scream at them so hard
they change their minds.  :-)

...

> > But I don't know what level 'herd' works on. If it doesn't see
> > compiler barriers (eg our own "barrier()" macro that literally is just
> > that), only sees the generated code, then it really has no other
> > information than what the compiler _happened_ to do - it doesn't know
> > if the compiler did the store after the conditional because it *had*
> > to do so, or whether it was just a random instruction scheduling
> > decision.
> 
> The 'herd' tool works on litmus-test source, and has almost no idea of
> what compilers get up to.  In this particular case, it would be better
> off being even more ignorant of what compilers get up to.  ;-)

In more detail, herd analyzes the source code and constructs a set of
all possible candidate executions.  herd then checks each execution to
see if it violates the rules of the memory model as expressed in the
.bell and .cat files.  If any of the non-violating executions satisfies
the litmus test's final "exists" condition, herd reports that the test
is allowed.

The point here is that when herd checks any one particular execution, 
it pays attention _only_ to the statements which are part of that 
execution.  Statements belonging to an untaken "if" branch are ignored 
completely.  That's why I think it would be difficult to make herd do 
exactly what we want in this case.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-31 14:27                       ` Alan Stern
  (?)
@ 2018-06-02 14:44                       ` Paul E. McKenney
  2018-06-04 14:17                           ` Alan Stern
  -1 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-06-02 14:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Thu, May 31, 2018 at 10:27:33AM -0400, Alan Stern wrote:
> On Wed, 30 May 2018, Paul E. McKenney wrote:
> 
> > On Wed, May 30, 2018 at 05:01:01PM -0500, Linus Torvalds wrote:
> > > On Wed, May 30, 2018 at 2:08 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > > >
> > > > Indeed.  The very first line Linus quoted in his first reply to me
> > > > (elided above) was:
> > > >
> > > >         Putting this into herd would be extremely difficult, if not impossible,
> > > >         because it involves analyzing code that was not executed.
> > > >
> > > > It should be clear from this that I was talking about herd.  Not gcc or
> > > > real hardware.
> > > 
> > > So what does herd actually work on? The source code or the executable,
> > > or a trace?
> > 
> > The source code, that is, the source code of the litmus test.
> > There are definitions for the various Linux operations, partly within
> > the herd tool and partly in the linux.def file in tools/memory-model.
> > The problem we are having is nailing down control dependencies and
> > compiler optimizations.  The bit about limiting control dependencies
> > through the end of "if" statement itself works well in a great many cases,
> > but this clearly is not one of them.
> > 
> > > I found the herd paper, but I'm on the road helping my daughter in
> > > college move, and I don't have the background to skim the paper
> > quickly and come up with the obvious answer, so I'll just ask.
> > 
> > It is not a short learning curve.
> > 
> > > Because I really think that from our memory model standpoint, we
> > > really do have the rule that
> > > 
> > >    load -> cond -> store
> > > 
> > > is ordered - even if the store address and store data is in no way
> > > dependent on the load. The only thing that matters is that there's a
> > > conditional that is dependent on the load in between the load and the
> > > store.
> 
> This is true for all the architectures supported by the Linux kernel.  
> However, in the future it might not always hold.  I can imagine that
> CPU designers would want to include an optimization that checks a
> conditional branch to see if it skips over only the following
> instruction (and the following instruction can't affect the flow of
> control); in that situation, they might decide there doesn't need to be
> a control dependency to future stores.  Such an optimization would not
> violate the C11 memory model.
> 
> Of course, this is purely theoretical.  And if anybody does decide to
> try doing that, the memory-model people might scream at them so hard
> they change their minds.  :-)
> 
> ...
> 
> > > But I don't know what level 'herd' works on. If it doesn't see
> > > compiler barriers (eg our own "barrier()" macro that literally is just
> > > that), only sees the generated code, then it really has no other
> > > information than what the compiler _happened_ to do - it doesn't know
> > > if the compiler did the store after the conditional because it *had*
> > > to do so, or whether it was just a random instruction scheduling
> > > decision.
> > 
> > The 'herd' tool works on litmus-test source, and has almost no idea of
> > what compilers get up to.  In this particular case, it would be better
> > off being even more ignorant of what compilers get up to.  ;-)
> 
> In more detail, herd analyzes the source code and constructs a set of
> all possible candidate executions.  herd then checks each execution to
> see if it violates the rules of the memory model as expressed in the
> .bell and .cat files.  If any of the non-violating executions satisfies
> the litmus test's final "exists" condition, herd reports that the test
> is allowed.
> 
> The point here is that when herd checks any one particular execution, 
> it pays attention _only_ to the statements which are part of that 
> execution.  Statements belonging to an untaken "if" branch are ignored 
> completely.  That's why I think it would be difficult to make herd do 
> exactly what we want in this case.

One crude but effective workaround is to replicate the code following the
"if" statement into both legs of the "if" statement.  This has the effect
of extending the control dependency to cover all of the code that used to
follow the "if" statement, leveraging herd's current limited knowledge of
compiler optimization.  This workaround would of course be hopeless for
general Linux-kernel code, but should be at least semi-acceptable for the
very small snippets of code that can be accommodated within litmus tests.

Please see the litmus test shown below, which uses this workaround,
allowing the smp_store_release() to be downgraded to WRITE_ONCE().

Given this workaround, crude though it might be, I believe that we can
take a more measured approach to identifying a longer-term solution.

Thoughts?

							Thanx, Paul
------------------------------------------------------------------------

C C-RomanPenyaev-list-rcu-rr-WA

(*
 * The same as C-RomanPenyaev-list-rcu-rr.litmus, but working around herd's
 * having just the wrong level of understanding of compiler optimizations
 * for that particular litmus test.  The trick is to replicate the code
 * following the "if" statement into both legs of that "if" statement.
 *)

{
	int *z=1; (* List: v->w->x->y->z. Noncircular, but long enough. *)
	int *y=z;
	int *x=y;
	int *w=x;
	int *v=w; (* List head is v. *)
	int *c=w; (* Cache, emulating ppcpu_con. *)
}

P0(int *c, int *v, int *w, int *x, int *y)
{
	rcu_assign_pointer(*w, y); /* Remove x from list. */
	synchronize_rcu();
	r1 = READ_ONCE(*c);
	if (r1 == x) {
		WRITE_ONCE(*c, 0); /* Invalidate cache. */
		synchronize_rcu();
		WRITE_ONCE(*x, 0);  /* Emulate kfree(x). */
	} else {
		WRITE_ONCE(*x, 0);  /* Emulate kfree(x). */
	}
}

P1(int *c, int *v)
{
	rcu_read_lock();
	r1 = READ_ONCE(*c); /* Pick up cache. */
	if (r1 == 0) {
		r1 = READ_ONCE(*v); /* Cache empty, start from head. */
	}
	r2 = rcu_dereference(*r1); /* Advance to next element. */
	smp_store_release(c, r2); /* Update cache. */
	rcu_read_unlock();

	/* And repeat. */
	rcu_read_lock();
	r3 = READ_ONCE(*c);
	if (r3 == 0) {
		r3 = READ_ONCE(*v);
	}
	r4 = rcu_dereference(*r3);
	smp_store_release(c, r4);
	rcu_read_unlock();
}

locations [0:r1; 1:r1; 1:r3; c; v; w; x; y]
exists (1:r1=0 \/ 1:r2=0 \/ 1:r3=0 \/ 1:r4=0) (* Better not be freed!!! *)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-02 14:44                       ` Paul E. McKenney
@ 2018-06-04 14:17                           ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-06-04 14:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Sat, 2 Jun 2018, Paul E. McKenney wrote:

> One crude but effective workaround is to replicate the code following the
> "if" statement into both legs of the "if" statement.  This has the effect
> of extending the control dependency to cover all of the code that used to
> follow the "if" statement, leveraging herd's current limited knowledge of
> compiler optimization.  This workaround would of course be hopeless for
> general Linux-kernel code, but should be at least semi-acceptable for the
> very small snippets of code that can be accommodated within litmus tests.
> 
> Please see the litmus test shown below, which uses this workaround,
> allowing the smp_store_release() to be downgraded to WRITE_ONCE().
> 
> Given this workaround, crude though it might be, I believe that we can
> take a more measured approach to identifying a longer-term solution.
> 
> Thoughts?

Yes, this works, although it is clearly just a stopgap.  And obviously
it can't be applied in situations where one of the legs of the "if"  
statement contains a non-trivial branch.

In the long run, I don't think this problem is solvable.  At least, not 
for all cases.  It requires too much guesswork about what optimizations 
a compiler might do.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-04 14:17                           ` Alan Stern
  (?)
@ 2018-06-04 16:01                           ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2018-06-04 16:01 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar, Roman Pen

On Mon, Jun 04, 2018 at 10:17:47AM -0400, Alan Stern wrote:
> On Sat, 2 Jun 2018, Paul E. McKenney wrote:
> 
> > One crude but effective workaround is to replicate the code following the
> > "if" statement into both legs of the "if" statement.  This has the effect
> > of extending the control dependency to cover all of the code that used to
> > follow the "if" statement, leveraging herd's current limited knowledge of
> > compiler optimization.  This workaround would of course be hopeless for
> > general Linux-kernel code, but should be at least semi-acceptable for the
> > very small snippets of code that can be accommodated within litmus tests.
> > 
> > Please see the litmus test shown below, which uses this workaround,
> > allowing the smp_store_release() to be downgraded to WRITE_ONCE().
> > 
> > Given this workaround, crude though it might be, I believe that we can
> > take a more measured approach to identifying a longer-term solution.
> > 
> > Thoughts?
> 
> Yes, this works, although it is clearly just a stopgap.  And obviously
> it can't be applied in situations where one of the legs of the "if"  
> statement contains a non-trivial branch.

A non-trivial branch as in a goto?  If so, that could be problematic
depending on exactly what herd does in that case.

> In the long run, I don't think this problem is solvable.  At least, not 
> for all cases.  It requires too much guesswork about what optimizations 
> a compiler might do.

I agree that it cannot be perfect, but I bet that we can come extremely
close.  But yes, we need to think carefully about nested "if" statements
at the very least.  And there might well be some other way to get in
trouble.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-05-30 19:08                 ` Alan Stern
                                   ` (2 preceding siblings ...)
  (?)
@ 2018-06-06  9:40                 ` Roman Penyaev
  2018-06-06 13:54                     ` Alan Stern
  2018-06-06 19:07                   ` Paul E. McKenney
  -1 siblings, 2 replies; 45+ messages in thread
From: Roman Penyaev @ 2018-06-06  9:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, andrea.parri, Will Deacon, Peter Zijlstra,
	Boqun Feng, Nick Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Ingo Molnar

On Wed, May 30, 2018 at 9:08 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 30 May 2018, Paul E. McKenney wrote:
>
>> On Wed, May 30, 2018 at 09:59:28AM -0500, Linus Torvalds wrote:
>> > On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
>> > wrote:
>> >
>> > > >
>> > > > Can't we simplify the whole sequence as basically
>> > > >
>> > > >      A
>> > > >      if (!B)
>> > > >          D
>> > > >
>> > > > for that "not B" case, and just think about that. IOW, let's ignore the
>> > > > whole "not executed" code.
>> >
>> > > Your listing is slightly misleading.
>> >
>> > No. You're confused.
>> >
>> > You're confused because you're conflating two *entirely* different things.
>> >
>> > You're conflating the static source code with the dynamic execution. They
>> > are NOT THE SAME.
>>
>> I am going to go out on a limb and assert that Linus is talking about
>> what gcc and hardware do, while Alan is talking about what the tool and
>> memory model do.
>
> Indeed.  The very first line Linus quoted in his first reply to me
> (elided above) was:
>
>         Putting this into herd would be extremely difficult, if not impossible,
>         because it involves analyzing code that was not executed.
>
> It should be clear from this that I was talking about herd.  Not gcc or
> real hardware.
>
> (After rereading his message a few times, I'm not sure exactly what
> Linus was talking about...)
>
>>  In a perfect world, these would be the same, but it
>> appears that the world might not be perfect just now.
>>
>> My current guess is that we need to change the memory-model tool.  If
>> that is the case, here are some possible resolutions:
>>
>> 1.    Make herd's C-language control dependencies work the same as its
>>       assembly language, so that they extend beyond the end of the
>>       "if" statement.  I believe that this would make Roman's case
>>       work, but it could claim that other situations are safe that
>>       are actually problematic due to compiler optimizations.
>>
>>       The fact that the model currently handles only READ_ONCE()
>>       and WRITE_ONCE() and not unmarked reads and writes make this
>>       option more attractive than it would otherwise be, compilers not
>>       being allowed to reorder volatile accesses, but we are likely
>>       to introduce unmarked accesses sometime in the future.
>
> Preserving the order of volatile accesses isn't sufficient.  The
> compiler is still allowed to translate
>
>         r1 = READ_ONCE(x);
>         if (r1) {
>                 ...
>         }
>         WRITE_ONCE(y, r2);
>
> into something resembling
>
>         r1 = READ_ONCE(x);
>         WRITE_ONCE(y, r2);
>         if (r1) {
>                 ...
>         }

Hi Alan,

According to the C99 standard, Annex C, "the controlling expression of
a selection statement (if or switch)" is a sequence point, just like
a volatile access (READ_ONCE or WRITE_ONCE).

"5.1.2.3 Program execution" states:
"At certain specified points in the execution sequence called sequence
points, all side effects of previous evaluations shall be complete
and no side effects of subsequent evaluations shall have taken place."

So in the example we have 3 sequence points: "READ_ONCE", "if" and
"WRITE_ONCE", which it seems can't be reordered.  Am I mistaken in
interpreting the standard?  Could you please clarify?

--
Roman

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06  9:40                 ` Roman Penyaev
@ 2018-06-06 13:54                     ` Alan Stern
  2018-06-06 19:07                   ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-06-06 13:54 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, andrea.parri, Will Deacon, Peter Zijlstra,
	Boqun Feng, Nick Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Ingo Molnar

On Wed, 6 Jun 2018, Roman Penyaev wrote:

> > Preserving the order of volatile accesses isn't sufficient.  The
> > compiler is still allowed to translate
> >
> >         r1 = READ_ONCE(x);
> >         if (r1) {
> >                 ...
> >         }
> >         WRITE_ONCE(y, r2);
> >
> > into something resembling
> >
> >         r1 = READ_ONCE(x);
> >         WRITE_ONCE(y, r2);
> >         if (r1) {
> >                 ...
> >         }
> 
> Hi Alan,
> 
> According to the C99 standard, Annex C, "the controlling expression of
> a selection statement (if or switch)" is a sequence point, just like
> a volatile access (READ_ONCE or WRITE_ONCE).
> 
> "5.1.2.3 Program execution" states:
> "At certain specified points in the execution sequence called sequence
> points, all side effects of previous evaluations shall be complete
> and no side effects of subsequent evaluations shall have taken place."
> 
> So in the example we have 3 sequence points: "READ_ONCE", "if" and
> "WRITE_ONCE", which it seems can't be reordered.  Am I mistaken in
> interpreting the standard?  Could you please clarify?

Well, for one thing, we're talking about C11, not C99.

For another, as far as I understand it, the standard means the program
should behave _as if_ the side effects are completed in the order
stated.  It doesn't mean that the generated code has to behave that way
literally.  And in particular, the standard is referring to the
behavior of a single thread, not the interaction between multiple
concurrent threads.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06 13:54                     ` Alan Stern
  (?)
@ 2018-06-06 14:41                     ` Roman Penyaev
  2018-06-06 15:55                         ` Alan Stern
  -1 siblings, 1 reply; 45+ messages in thread
From: Roman Penyaev @ 2018-06-06 14:41 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, andrea.parri, Will Deacon, Peter Zijlstra,
	Boqun Feng, Nick Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Ingo Molnar

On Wed, Jun 6, 2018 at 3:54 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 6 Jun 2018, Roman Penyaev wrote:
>
>> > Preserving the order of volatile accesses isn't sufficient.  The
>> > compiler is still allowed to translate
>> >
>> >         r1 = READ_ONCE(x);
>> >         if (r1) {
>> >                 ...
>> >         }
>> >         WRITE_ONCE(y, r2);
>> >
>> > into something resembling
>> >
>> >         r1 = READ_ONCE(x);
>> >         WRITE_ONCE(y, r2);
>> >         if (r1) {
>> >                 ...
>> >         }
>>
>> Hi Alan,
>>
>> According to the C99 standard, Annex C, "the controlling expression of
>> a selection statement (if or switch)" is a sequence point, just like
>> a volatile access (READ_ONCE or WRITE_ONCE).
>>
>> "5.1.2.3 Program execution" states:
>> "At certain specified points in the execution sequence called sequence
>> points, all side effects of previous evaluations shall be complete
>> and no side effects of subsequent evaluations shall have taken place."
>>
>> So in the example we have 3 sequence points: "READ_ONCE", "if" and
>> "WRITE_ONCE", which it seems can't be reordered.  Am I mistaken in
>> interpreting the standard?  Could you please clarify.
>
> Well, for one thing, we're talking about C11, not C99.

C11 is N1570, ISO/IEC 9899:2011? (according to Wikipedia).  The PDF I
found on the web contains similar wording, so there should be no
difference for this particular case.

> For another, as far as I understand it, the standard means the program
> should behave _as if_ the side effects are completed in the order
> stated.  It doesn't mean that the generated code has to behave that way
> literally.

Then I do not understand the difference between "side effects are
completed" and "code generated".  The abstract machine should provide
some guarantees between sequence points, e.g.:

    foo();    /* function call */
    ------------|
    *a = 1;     |
    *b = 12;    | Compiler is within its rights to reorder.
    *c = 123;   |
    ------------|
    boo();    /* function call */

the compiler is within its rights to reorder the memory accesses
between the foo() and boo() calls (foo and boo are sequence points,
but the memory accesses are not), but:

   foo();    /* function call */
   *(volatile int *)a = 1;
   *(volatile int *)b = 12;
   *(volatile int *)c = 123;
   boo();    /* function call */

are all sequence points, so the compiler can't reorder them.

Where am I mistaken?

> And in particular, the standard is referring to the
> behavior of a single thread, not the interaction between multiple
> concurrent threads.

Yes, that is clear: we are talking about code reordering within one
particular function in a single-threaded environment.

--
Roman

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06 14:41                     ` Roman Penyaev
@ 2018-06-06 15:55                         ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-06-06 15:55 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, andrea.parri, Will Deacon, Peter Zijlstra,
	Boqun Feng, Nick Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Ingo Molnar

On Wed, 6 Jun 2018, Roman Penyaev wrote:

> On Wed, Jun 6, 2018 at 3:54 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, 6 Jun 2018, Roman Penyaev wrote:
> >
> >> > Preserving the order of volatile accesses isn't sufficient.  The
> >> > compiler is still allowed to translate
> >> >
> >> >         r1 = READ_ONCE(x);
> >> >         if (r1) {
> >> >                 ...
> >> >         }
> >> >         WRITE_ONCE(y, r2);
> >> >
> >> > into something resembling
> >> >
> >> >         r1 = READ_ONCE(x);
> >> >         WRITE_ONCE(y, r2);
> >> >         if (r1) {
> >> >                 ...
> >> >         }
> >>
> >> Hi Alan,
> >>
> >> According to C99 Annex C, "the controlling expression of
> >> a selection statement (if or switch)" is a sequence point, just
> >> like a volatile access (READ_ONCE or WRITE_ONCE).
> >>
> >> "5.1.2.3 Program execution" states:
> >> "At certain specified points in the execution sequence called sequence
> >> points, all side effects of previous evaluations shall be complete
> >> and no side effects of subsequent evaluations shall have taken place."
> >>
> >> So in the example we have 3 sequence points: "READ_ONCE", "if" and
> >> "WRITE_ONCE", which it seems can't be reordered.  Am I mistaken in
> >> interpreting the standard?  Could you please clarify.
> >
> > Well, for one thing, we're talking about C11, not C99.
> 
> C11 is N1570, ISO/IEC 9899:2011? (according to Wikipedia).  The PDF I
> found on the web contains similar wording, so there should be no
> difference for this particular case.
> 
> > For another, as far as I understand it, the standard means the program
> > should behave _as if_ the side effects are completed in the order
> > stated.  It doesn't mean that the generated code has to behave that way
> > literally.
> 
> Then I do not understand the difference between "side effects are
> completed" and "code generated".  The abstract machine should provide
> some guarantees between sequence points, e.g.:
> 
>     foo();    /* function call */
>     ------------|
>     *a = 1;     |
>     *b = 12;    | Compiler is within its rights to reorder.
>     *c = 123;   |
>     ------------|
>     boo();    /* function call */
> 
> the compiler is within its rights to reorder the memory accesses
> between the foo() and boo() calls (foo and boo are sequence points,
> but the memory accesses are not), but:
> 
>    foo();    /* function call */
>    *(volatile int *)a = 1;
>    *(volatile int *)b = 12;
>    *(volatile int *)c = 123;
>    boo();    /* function call */
> 
> are all sequence points, so the compiler can't reorder them.
> 
> Where am I mistaken?

You are right so far.  Now suppose one of the sequence points is an
"if" statement:

	foo();		// sequence point
	*a = 1;
	if (*b)		// sequence point
		*c = 2;
	*d = 3;
	boo();		// sequence point

The object code has to behave _as though_ the write to *a was completed 
before the write to *d begins, because of the sequence point in 
between.  But nevertheless, the object code is allowed to be like this:

	foo();
	*d = 3;
	*a = 1;
	if (*b)
		*c = 2;
	boo();

because the overall effect is the same, if all you are concerned with
is this single thread.  (And assuming that none of the pointers alias
each other.)

The compiler must not move the "*d = 3" write up before the call to
foo(), because foo() might read the value of *d and then it would get
an incorrect answer.  Moving the write before the "if" statement and
the assignment to *a is okay, however, because those statements would
do the same thing no matter what value was stored in *d.

Alan

> > And in particular, the standard is referring to the
> > behavior of a single thread, not the interaction between multiple
> > concurrent threads.
> 
> Yes, that is clear: we are talking about code reordering in one
> particular function in a single threaded environment.
> 
> --
> Roman

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06  9:40                 ` Roman Penyaev
  2018-06-06 13:54                     ` Alan Stern
@ 2018-06-06 19:07                   ` Paul E. McKenney
  2018-06-06 19:23                     ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2018-06-06 19:07 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Alan Stern, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, andrea.parri, Will Deacon, Peter Zijlstra,
	Boqun Feng, Nick Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Ingo Molnar

On Wed, Jun 06, 2018 at 11:40:13AM +0200, Roman Penyaev wrote:
> On Wed, May 30, 2018 at 9:08 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, 30 May 2018, Paul E. McKenney wrote:
> >
> >> On Wed, May 30, 2018 at 09:59:28AM -0500, Linus Torvalds wrote:
> >> > On Wed, May 30, 2018 at 9:29 AM Alan Stern <stern@rowland.harvard.edu>
> >> > wrote:
> >> >
> >> > > >
> >> > > > Can't we simplify the whole sequence as basically
> >> > > >
> >> > > >      A
> >> > > >      if (!B)
> >> > > >          D
> >> > > >
> >> > > > for that "not B" case, and just think about that. IOW, let's ignore the
> >> > > > whole "not executed" code.
> >> >
> >> > > Your listing is slightly misleading.
> >> >
> >> > No. You're confused.
> >> >
> >> > You're confused because you're conflating two *entirely* different things.
> >> >
> >> > You're conflating the static source code with the dynamic execution. They
> >> > are NOT THE SAME.
> >>
> >> I am going to go out on a limb and assert that Linus is talking about
> >> what gcc and hardware do, while Alan is talking about what the tool and
> >> memory model do.
> >
> > Indeed.  The very first line Linus quoted in his first reply to me
> > (elided above) was:
> >
> >         Putting this into herd would be extremely difficult, if not impossible,
> >         because it involves analyzing code that was not executed.
> >
> > It should be clear from this that I was talking about herd.  Not gcc or
> > real hardware.
> >
> > (After rereading his message a few times, I'm not sure exactly what
> > Linus was talking about...)
> >
> >>  In a perfect world, these would be the same, but it
> >> appears that the world might not be perfect just now.
> >>
> >> My current guess is that we need to change the memory-model tool.  If
> >> that is the case, here are some possible resolutions:
> >>
> >> 1.    Make herd's C-language control dependencies work the same as its
> >>       assembly language, so that they extend beyond the end of the
> >>       "if" statement.  I believe that this would make Roman's case
> >>       work, but it could claim that other situations are safe that
> >>       are actually problematic due to compiler optimizations.
> >>
> >>       The fact that the model currently handles only READ_ONCE()
> >>       and WRITE_ONCE() and not unmarked reads and writes makes this
> >>       option more attractive than it would otherwise be, compilers not
> >>       being allowed to reorder volatile accesses, but we are likely
> >>       to introduce unmarked accesses sometime in the future.
> >
> > Preserving the order of volatile accesses isn't sufficient.  The
> > compiler is still allowed to translate
> >
> >         r1 = READ_ONCE(x);
> >         if (r1) {
> >                 ...
> >         }
> >         WRITE_ONCE(y, r2);
> >
> > into something resembling
> >
> >         r1 = READ_ONCE(x);
> >         WRITE_ONCE(y, r2);
> >         if (r1) {
> >                 ...
> >         }
> 
> Hi Alan,
> 
> According to C99 Annex C, "the controlling expression of
> a selection statement (if or switch)" is a sequence point, just
> like a volatile access (READ_ONCE or WRITE_ONCE).
> 
> "5.1.2.3 Program execution" states:
> "At certain specified points in the execution sequence called sequence
> points, all side effects of previous evaluations shall be complete
> and no side effects of subsequent evaluations shall have taken place."
> 
> So in the example we have 3 sequence points: "READ_ONCE", "if" and
> "WRITE_ONCE", which it seems can't be reordered.  Am I mistaken in
> interpreting the standard?  Could you please clarify.

The additional piece of information is the "as if" rule.  If the compiler
can prove that the visible effects of running the program will be the
same with and without a given optimization, or "as if" the optimization
had not been applied, then it is allowed to apply the optimization.

The thing that prevents the compiler from using the "as if" rule in your
example is the call to synchronize_rcu(), which could do pretty much
anything, as far as the compiler can tell.  We use barrier() in many of
the atomic operations and memory barriers for the same reason.

But what are our options?  Here are the ones I can see at the moment:

1.	Status quo.  This works reasonably well, but we have already
	seen that your scenario makes it ask for more synchronization
	than necessary.  Of course, asking for a bit too much is better
	than asking for a bit too little, but I hope that we can do
	better than that.  (Hey, it seemed like a good idea at the
	time!)

	This includes the workaround, so status quo is OK for the
	near future.

2.	Make the C-language notion of control dependencies match the
	assembly-language notion.  I believe that this would work
	well, but it might break when we introduce unmarked accesses.
	It might be the case that such breakage only happens when
	there is a data race, but I am not certain of this.  I can
	dream, can't I?  ;-)

3.	Introduce a new marking/attribute in the .def file that indicates
	whether an access is volatile or implies a compiler barrier.
	This might allow herd to be more selective about control dependencies,
	for example, extending them past the end of "if" statements
	containing compiler barriers.

	One tricky aspect of this approach is working out what the
	compiler can do to the "if" statement.	We definitely do not
	want to put the complexity of all possible compilers into herd!

4.	A similar effect could be achieved by running the litmus test
	through a script that does transformations similar to the
	workaround.

5.	We could use some pragmatic heuristic in herd, but allow
	alternative "if" keywords that force control dependencies
	to extend past the end of the "if" statement on the one hand
	or to be limited to the "if" statement on the other.

6.	Your ideas here!


							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06 19:07                   ` Paul E. McKenney
@ 2018-06-06 19:23                     ` Linus Torvalds
  2018-06-07  9:43                       ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2018-06-06 19:23 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Roman Pen, Alan Stern, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Wed, Jun 6, 2018 at 12:05 PM Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> 3.      Introduce a new marking/attribute in the .def file that indicates
>         whether an access is volatile or implies a compiler barrier.
>         This might allow herd to be more selective about control dependencies,
>         for example, extending them past the end of "if" statements
>         containing compiler barriers.
>
>         One tricky aspect of this approach is working out what the
>         compiler can do to the "if" statement.  We definitely do not
>         want to put the complexity of all possible compilers into herd!

This _smells_ like the right thing to do.

Aren't the litmus-tests effectively always going to be using READ_ONCE
etc volatile accesses anyway?

Most of what the LKMM litmus tests test for is things with side effects, no?

And honestly, that is exactly the kind of litmus test behavior we
*want* our memory model to have, in that any CPU designer (or compiler
designer) that uses our LKMM litmus tests should very much be aware of
the fact that we expect a conditional->store memory ordering to be an
ordering.

We have real code that depends on it, so I think LKMM should expose
those ordering requirements.

I'm also perfectly happy with no markings at all: all memory accesses
are "volatile" as far as C is concerned, and cannot be moved around by
the compiler at all - and all that LKMM tests is memory _ordering_,
not "compiler can do X".

Because I think your option 1. is absolutely against everything we
want to happen:

> 1.      Status quo.  This works reasonably well, but we have already
>         seen that your scenario makes it ask for more synchronization
>         than necessary.

We absolutely do *not* want CPU designers etc thinking that we'll add
insane synchronization.

We were already burned by insane bad CPU design once in the form of
the garbage that alpha designers gave us.

I am personally not at all interested in seeing our memory ordering
rules be "nice". They should be as tight as possible, and *not* allow
any crazy shit that some insane person can come up with. No more
"dependent reads out of order" garbage, and no more "store done
before the condition it depends on" garbage.

A CPU designer (or a C++ memory ordering person) who tries to sneak
shit like that past us should be shunned, and not taken seriously.

And our memory ordering rules should be very explicit about it, so
that people don't even _try_ to do insane things.

I want any memory ordering litmus tests to say "we depend on this, so
as a CPU designer don't mess it up, because then we won't run on the
resulting crap".

I'm not in the least interested in the LKMM litmus tests being an
excuse for unnecessarily weak memory ordering. That's the *opposite*
of what I would want any litmus tests to do.

If people are looking to use them that way, then I'm going to remove
them, because such litmus tests are not in the best interest of the
kernel.

Future CPU designs need to be *saner*, not perpetuate the kind of
garbage insanity we have seen so far.

                 Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-06 19:23                     ` Linus Torvalds
@ 2018-06-07  9:43                       ` Paul E. McKenney
  2018-06-07 14:57                           ` Alan Stern
  2018-06-07 15:06                         ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Paul E. McKenney @ 2018-06-07  9:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Pen, Alan Stern, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Wed, Jun 06, 2018 at 12:23:33PM -0700, Linus Torvalds wrote:
> On Wed, Jun 6, 2018 at 12:05 PM Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > 3.      Introduce a new marking/attribute in the .def file that indicates
> >         whether an access is volatile or implies a compiler barrier.
> >         This might allow herd to be more selective about control dependencies,
> >         for example, extending them past the end of "if" statements
> >         containing compiler barriers.
> >
> >         One tricky aspect of this approach is working out what the
> >         compiler can do to the "if" statement.  We definitely do not
> >         want to put the complexity of all possible compilers into herd!
> 
> This _smells_ like the right thing to do.
> 
> Aren't the litmus-tests effectively always going to be using READ_ONCE
> etc volatile accesses anyway?

Yes, right now LKMM handles only volatile (READ_ONCE and WRITE_ONCE)
or stronger.

> Most of what the LKMM litmus tests test for is things with side effects, no?
> 
> And honestly, that is exactly the kind of litmus test behavior we
> *want* our memory model to have, in that any CPU designer (or compiler
> designer) that uses our LKMM litmus tests should very much be aware of
> the fact that we expect a conditional->store memory ordering to be an
> ordering.
> 
> We have real code that depends on it, so I think LKMM should expose
> those ordering requirements.

Good point.  And I did manage to advocate for control dependencies at an
academic conference earlier this week without the need for a bodyguard,
so perhaps things are improving.  ;-)

> I'm also perfectly happy with no markings at all: all memory accesses
> are "volatile" as far as C is concerned, and cannot be moved around by
> the compiler at all - and all that LKMM tests is memory _ordering_,
> not "compiler can do X".

We are considering adding unmarked accesses, for example, accesses
protected by locks.  One possible litmus test (not yet supported!)
might look like this:

	C Z6.0+pooncelock+pooncelock+pombonce

	{}

	P0(int *x, int *y, spinlock_t *mylock)
	{
		spin_lock(mylock);
		WRITE_ONCE(*x, 1);
		*y = 1;
		spin_unlock(mylock);
	}

	P1(int *y, int *z, spinlock_t *mylock)
	{
		int r0;

		spin_lock(mylock);
		r0 = *y;
		WRITE_ONCE(*z, 1);
		spin_unlock(mylock);
	}

	P2(int *x, int *z)
	{
		int r1;

		WRITE_ONCE(*z, 2);
		smp_mb();
		r1 = READ_ONCE(*x);
	}

	exists (1:r0=1 /\ z=2 /\ 2:r1=0)

Because y is only ever accessed under a lock, the compiler cannot do
anything to mess it up.  In this particular case, the compiler would
have a hard time messing up the other accesses, aside from extremely
unfriendly measures such as load/store tearing, but we have seen
Linux-kernel examples where the compiler could very reasonably repeat
and fuse loads and stores and so on.

Thoughts?

> Because I think your option 1. is absolutely against everything we
> want to happen:
> 
> > 1.      Status quo.  This works reasonably well, but we have already
> >         seen that your scenario makes it ask for more synchronization
> >         than necessary.
> 
> We absolutely do *not* want CPU designers etc thinking that we'll add
> insane synchronization.
> 
> We were already burned by insane bad CPU design once in the form of
> the garbage that alpha designers gave us.
> 
> I am personally not at all interested in seeing our memory ordering
> rules be "nice". They should be as tight as possible, and *not* allow
> any crazy shit that some insane person can come up with. No more
> "dependent reads out of order" garbage, and no more "store done
> before the condition it depends on" garbage.
> 
> A CPU designer (or a C++ memory ordering person) who tries to sneak
> shit like that past us should be shunned, and not taken seriously.

Given that I am writing this in a C++ Standards Committee meeting,
I do like the call for sanity.

> And our memory ordering rules should be very explicit about it, so
> that people don't even _try_ to do insane things.
> 
> I want any memory ordering litmus tests to say "we depend on this, so
> as a CPU designer don't mess it up, because then we won't run on the
> resulting crap".
> 
> I'm not in the least interested in the LKMM litmus tests being an
> excuse for unnecessarily weak memory ordering. That's the *opposite*
> of what I would want any litmus tests to do.
> 
> If people are looking to use them that way, then I'm going to remove
> them, because such litmus tests are not in the best interest of the
> kernel.
> 
> Future CPU designs need to be *saner*, not perpetuate the kind of
> garbage insanity we have seen so far.

OK, I think we all agree that we need to move away from status quo.
The workaround buys us some time, but the need to move is undisputed.
(If someone does dispute it, this would be a great time for them to make
their concerns known!)

I will take a hard look at option #3, and see if there are any hidden gotchas.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-07  9:43                       ` Paul E. McKenney
@ 2018-06-07 14:57                           ` Alan Stern
  2018-06-07 15:06                         ` Linus Torvalds
  1 sibling, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-06-07 14:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Roman Pen, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Thu, 7 Jun 2018, Paul E. McKenney wrote:

> On Wed, Jun 06, 2018 at 12:23:33PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 6, 2018 at 12:05 PM Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > 3.      Introduce a new marking/attribute in the .def file that indicates
> > >         whether an access is volatile or implies a compiler barrier.
> > >         This might allow herd to be more selective about control dependencies,
> > >         for example, extending them past the end of "if" statements
> > >         containing compiler barriers.
> > >
> > >         One tricky aspect of this approach is working out what the
> > >         compiler can do to the "if" statement.  We definitely do not
> > >         want to put the complexity of all possible compilers into herd!
> > 
> > This _smells_ like the right thing to do.
> > 
> > Aren't the litmus-tests effectively always going to be using READ_ONCE
> > etc volatile accesses anyway?
> 
> Yes, right now LKMM handles only volatile (READ_ONCE and WRITE_ONCE)
> or stronger.
> 
> > Most of what the LKMM litmus tests test for is things with side effects, no?
> > 
> > And honestly, that is exactly the kind of litmus test behavior we
> > *want* our memory model to have, in that any CPU designer (or compiler
> > designer) that uses our LKMM litmus tests should very much be aware of
> > the fact that we expect a conditional->store memory ordering to be an
> > ordering.
> > 
> > We have real code that depends on it, so I think LKMM should expose
> > those ordering requirements.
> 
> Good point.  And I did manage to advocate for control dependencies at an
> academic conference earlier this week without the need for a bodyguard,
> so perhaps things are improving.  ;-)
> 
> > I'm also perfectly happy with no markings at all: all memory accesses
> > are "volatile" as far as C is concerned, and cannot be moved around by
> > the compiler at all - and all that LKMM tests is memory _ordering_,
> > not "compiler can do X".
> 
> We are considering adding unmarked accesses, for example, accesses
> protected by locks.  One possible litmus test (not yet supported!)
> might look like this:
> 
> 	C Z6.0+pooncelock+pooncelock+pombonce
> 
> 	{}
> 
> 	P0(int *x, int *y, spinlock_t *mylock)
> 	{
> 		spin_lock(mylock);
> 		WRITE_ONCE(*x, 1);
> 		*y = 1;
> 		spin_unlock(mylock);
> 	}
> 
> 	P1(int *y, int *z, spinlock_t *mylock)
> 	{
> 		int r0;
> 
> 		spin_lock(mylock);
> 		r0 = *y;
> 		WRITE_ONCE(*z, 1);
> 		spin_unlock(mylock);
> 	}
> 
> 	P2(int *x, int *z)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*z, 2);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (1:r0=1 /\ z=2 /\ 2:r1=0)
> 
> Because y is only ever accessed under a lock, the compiler cannot do
> anything to mess it up.  In this particular case, the compiler would
> have a hard time messing up the other accesses, aside from extremely
> unfriendly measures such as load/store tearing, but we have seen
> Linux-kernel examples where the compiler could very reasonably repeat
> and fuse loads and stores and so on.
> 
> Thoughts?
> 
> > Because I think your option 1. is absolutely against everything we
> > want to happen:
> > 
> > > 1.      Status quo.  This works reasonably well, but we have already
> > >         seen that your scenario makes it ask for more synchronization
> > >         than necessary.
> > 
> > We absolutely do *not* want CPU designers etc thinking that we'll add
> > insane synchronization.
> > 
> > We were already burned by insane bad CPU design once in the form of
> > the garbage that alpha designers gave us.
> > 
> > I am personally not at all interested in seeing our memory ordering
> > rules be "nice". They should be as tight as possible, and *not* allow
> > any crazy shit that some insane person can come up with. No more
> > "dependent reads out of order" garbage, and no more "store done
> > before the condition it depends on" garbage.
> > 
> > A CPU designer (or a C++ memory ordering person) who tries to sneak
> > shit like that past us should be shunned, and not taken seriously.
> 
> Given that I am writing this in a C++ Standards Committee meeting,
> I do like the call for sanity.
> 
> > And our memory ordering rules should be very explicit about it, so
> > that people don't even _try_ to do insane things.
> > 
> > I want any memory ordering litmus tests to say "we depend on this, so
> > as a CPU designer don't mess it up, because then we won't run on the
> > resulting crap".
> > 
> > I'm not in the least interested in the LKMM litmus tests being an
> > excuse for unnecessarily weak memory ordering. That's the *opposite*
> > of what I would want any litmus tests to do.
> > 
> > If people are looking to use them that way, then I'm going to remove
> > them, because such litmus tests are not in the best interest of the
> > kernel.
> > 
> > Future CPU designs need to be *saner*, not perpetuate the kind of
> > garbage insanity we have seen so far.
> 
> OK, I think we all agree that we need to move away from status quo.
> The workaround buys us some time, but the need to move is undisputed.
> (If someone does dispute it, this would be a great time for them to make
> their concerns known!)
> 
> I will take a hard look at option #3, and see if there are any hidden gotchas.

There's another aspect to this discussion.

You can look at a memory model from three points of view:

1.	To a programmer, the model provides both guarantees (a certain
	code snippet will never yield a particular undesired result)
	and warnings (another snippet might yield an undesired result).

2.	To a CPU designer, the model provides limits on what the
	hardware should be allowed to do (e.g., never execute a store
	before the condition of a preceding conditional branch has been
	determined -- not even if the CPU knows that the store would be
	executed on both legs of the branch).

3.	To a compiler writer, the model provides limits on what code
	manipulations should be allowed.

Linus's comments mostly fall under viewpoint 2 (AFAICT), and a lot of
our thinking to date has been in viewpoint 1.  But viewpoint 3 is the
one most relevant to the code we have been discussing here.

Saying that a control dependency should extend beyond the end of an
"if" statement is basically equivalent to saying that compiler writers
are forbidden to implement certain optimizations.  Now, my experience
has been that compiler writers are loath to give up an optimization
unless someone can point to a specific part of the language spec and
show that the proposed optimization would violate it.

Given that the LKMM has no official standing whatsoever with the C and 
C++ standards committees, how likely is it that we would be able to 
convince them to change the spec merely to satisfy our desires?

It's all very well for Linus to say "no more 'store done before the
condition it depends on' garbage", but that is an empty rant if we are
unable to exert any pressure on the standards or compiler writers.

Perhaps the best we could hope for would be to have a command-line flag
added to gcc (and LLVM?) that would forbid:

	combining two identical volatile stores in the two legs of an 
	"if" statement and moving the result up before the "if", and

	moving a volatile store up before a preceding "if" statement

(or something along those lines -- even expressing this idea precisely
is a difficult thing to do).  I seriously doubt the standards people
would entertain the idea of making these restrictions universal.  
Especially since the specs currently do not include any notion of
dependency at all.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
@ 2018-06-07 14:57                           ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2018-06-07 14:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Roman Pen, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Thu, 7 Jun 2018, Paul E. McKenney wrote:

> On Wed, Jun 06, 2018 at 12:23:33PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 6, 2018 at 12:05 PM Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > 3.      Introduce a new marking/attribute in the .def file that indicates
> > >         whether an access is volatile or implies a compiler barrier.
> > >         This might allow herd to be more selective about control dependencies,
> > >         for example, extending them past the end of "if" statements
> > >         containing compiler barriers.
> > >
> > >         One tricky aspect of this approach is working out what the
> > >         compiler can do to the "if" statement.  We definitely do not
> > >         want to put the complexity of all possible compilers into herd!
> > 
> > This _smells_ like the right thing to do.
> > 
> > Aren't the litmus-tests effectively always going to be using READ_ONCE
> > etc volatile accesses anyway?
> 
> Yes, right now LKMM handles only volatile (READ_ONCE and WRITE_ONCE)
> or stronger.
> 
> > Most of what the LKMM litmus tests test for is things with side effects, no?
> > 
> > And honestly, that is exactly the kind of litmus test behavior we
> > *want* our memory model to have, in that any CPU designer (or compiler
> > designer) that uses our LKMM litmus tests should very much be aware of
> > the fact that we expect a conditional->store memory ordering to be a
> > ordering.
> > 
> > We have real code that depends on it, so I think LKMM should expose
> > those ordering requirements.
> 
> Good point.  And I did manage to advocate for control dependencies at an
> academic conference earlier this week without the need for a bodyguard,
> so perhaps things are improving.  ;-)
> 
> > I'm also perfectly happy with no markings at all: all memory accesses
> > are "voiatile" as fat as C is concerned, and cannot be moved around by
> > the compiler at all - and all that LKMM tests is memory _ordering_,
> > not "compiler can do X".
> 
> We are considering adding unmarked accesses, for example, accesses
> protected by locks.  One possible litmus test (not yet supported!)
> might look like this:
> 
> 	C Z6.0+pooncelock+pooncelock+pombonce
> 
> 	{}
> 
> 	P0(int *x, int *y, spinlock_t *mylock)
> 	{
> 		spin_lock(mylock);
> 		WRITE_ONCE(*x, 1);
> 		*y = 1;
> 		spin_unlock(mylock);
> 	}
> 
> 	P1(int *y, int *z, spinlock_t *mylock)
> 	{
> 		int r0;
> 
> 		spin_lock(mylock);
> 		r0 = *y;
> 		WRITE_ONCE(*z, 1);
> 		spin_unlock(mylock);
> 	}
> 
> 	P2(int *x, int *z)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*z, 2);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (1:r0=1 /\ z=2 /\ 2:r1=0)
> 
> Because y is only ever accessed under a lock, the compiler cannot do
> anything to mess it up.  In this particular case, the compiler would
> have a hard time messing up the other accesses, aside from extremely
> unfriendly measures such as load/store tearing, but we have seen
> Linux-kernel examples where the compiler could very reasonably repeat
> and fuse loads and stores and so on.
> 
> Thoughts?
> 
> > Because I think your option 1. is absolutely against everything we
> > want to happen:
> > 
> > > 1.      Status quo.  This works reasonably well, but we have already
> > >         seen that your scenario makes it ask for more synchronization
> > >         than necessary.
> > 
> > We absolutely do *not* want CPU designers etc thinking that we'll add
> > insane synchronization.
> > 
> > We were already burned by insane bad CPU design once in the form of
> > the garbage that the Alpha designers gave us.
> > 
> > I am personally not at all interested in seeing our memory ordering
> > rules be "nice". They should be as tight as possible, and *not* allow
> > any crazy shit that some insane person can come up with. No more
> > "dependent reads out of ordetr" garbage, and no more "store done
> > before the condition it depends on" garbage.
> > 
> > A CPU designer (or a C++ memory ordering person) who tries to sneak
> > shit like that past us should be shunned, and not taken seriously.
> 
> Given that I am writing this in a C++ Standards Committee meeting,
> I do like the call for sanity.
> 
> > And our memory ordering rules should be very explicit about it, so
> > that people don't even _try_ to do insane things.
> > 
> > I want any memory ordering litmus tests to say "we depend on this, so
> > as a CPU designer don't mess it up, because then we won't run on the
> > resulting crap".
> > 
> > I'm not in the least interested in the LKMM litmus tests being an
> > excuse for unnecessarily weak memory ordering. That's the *opposite*
> > of what I would want any litmus tests to do.
> > 
> > If people are looking to use them that way, then I'm going to remove
> > them, because such litmus tests are not in the best interest of the
> > kernel.
> > 
> > Future CPU designs need to be *saner*, not perpetuate the kind of
> > garbage insanity we have seen so far.
> 
> OK, I think we all agree that we need to move away from status quo.
> The workaround buys us some time, but the need to move is undisputed.
> (If someone does dispute it, this would be a great time for them to make
> their concerns known!)
> 
> I will take a hard look at option #3, and see if there are any hidden gotchas.

There's another aspect to this discussion.

You can look at a memory model from three points of view:

1.	To a programmer, the model provides both guarantees (a certain
	code snippet will never yield a particular undesired result)
	and warnings (another snippet might yield an undesired result).

2.	To a CPU designer, the model provides limits on what the
	hardware should be allowed to do (e.g., never execute a store
	before the condition of a preceding conditional branch has been
	determined -- not even if the CPU knows that the store would be
	executed on both legs of the branch).

3.	To a compiler writer, the model provides limits on what code
	manipulations should be allowed.

Linus's comments mostly fall under viewpoint 2 (AFAICT), and a lot of
our thinking to date has been in viewpoint 1.  But viewpoint 3 is the
one most relevant to the code we have been discussing here.

Saying that a control dependency should extend beyond the end of an
"if" statement is basically equivalent to saying that compiler writers
are forbidden to implement certain optimizations.  Now, my experience
has been that compiler writers are loath to give up an optimization
unless someone can point to a specific part of the language spec and
show that the proposed optimization would violate it.

Given that the LKMM has no official standing whatsoever with the C and 
C++ standards committees, how likely is it that we would be able to 
convince them to change the spec merely to satisfy our desires?

It's all very well for Linus to say "no more 'store done before the
condition it depends on' garbage", but that is an empty rant if we are
unable to exert any pressure on the standards or compiler writers.

Perhaps the best we could hope for would be to have a command-line flag
added to gcc (and LLVM?) that would forbid:

	combining two identical volatile stores in the two legs of an 
	"if" statement and moving the result up before the "if", and

	moving a volatile store up before a preceding "if" statement

(or something along those lines -- even expressing this idea precisely
is a difficult thing to do).  I seriously doubt the standards people
would entertain the idea of making these restrictions universal.  
Especially since the specs currently do not include any notion of
dependency at all.

Alan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-07  9:43                       ` Paul E. McKenney
  2018-06-07 14:57                           ` Alan Stern
@ 2018-06-07 15:06                         ` Linus Torvalds
  2018-06-07 19:57                           ` Paul E. McKenney
  1 sibling, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2018-06-07 15:06 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Roman Pen, Alan Stern, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Thu, Jun 7, 2018 at 2:41 AM Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> We are considering adding unmarked accesses, for example, accesses
> protected by locks.  One possible litmus test (not yet supported!)
> might look like this:

Fair enough - you do want to have the distinction between "marked" and
"unmarked".

And it does make sense, although at that point I think you do hit the
"what can a compiler do" issue more. Right now, I think the things you
check are all pretty much "compiler can't do a lot of movement".

But I suspect that the markings you do have are going to be fairly
limited. Things like "READ_ONCE()" vs "smp_load_acquire()" are still
fairly simple from a compiler standpoint, at least when it comes to
control flow - they have "side effects". So I guess that's the only
real difference there - a regular read doesn't have side effects, so
it could be moved up past a conditional, and/or duplicated for each
use. That sounds much more complex to the checker than the existing
things it supports, no?

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-07 14:57                           ` Alan Stern
  (?)
@ 2018-06-07 15:40                           ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2018-06-07 15:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul McKenney, Roman Pen, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Thu, Jun 7, 2018 at 7:57 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> You can look at a memory model from three points of view:
>
> 1.      To a programmer, the model provides both guarantees (a certain
>         code snippet will never yield a particular undesired result)
>         and warnings (another snippet might yield an undesired result).
>
> 2.      To a CPU designer, the model provides limits on what the
>         hardware should be allowed to do (e.g., never execute a store
>         before the condition of a preceding conditional branch has been
>         determined -- not even if the CPU knows that the store would be
>         executed on both legs of the branch).
>
> 3.      To a compiler writer, the model provides limits on what code
>         manipulations should be allowed.
>
> Linus's comments mostly fall under viewpoint 2 (AFAICT), and a lot of
> our thinking to date has been in viewpoint 1.  But viewpoint 3 is the
> one most relevant to the code we have been discussing here.

Hmm. Yes. I've been ignoring it because we've historically just
"solved" it by having barriers (either "asm volatile" with a memory
clobber, or just a volatile access) for the cases under discussion.
There isn't a lot of room for compiler games there. We've had issues
with compilers _elsewhere_, but when it comes to memory models, we've
always solved it with "hey, let's make that access volatile", or "we
will force the compiler to use *this* instruction and nothing else".

So we've never actually had much worry about what a compiler could do
for these particular issues.

In fact, one of the reasons I personally doubt we will ever use the
C++ memory model things is that I simply don't trust the standards body
to do the right thing. The discussion about control dependencies vs
data dependencies made me 100% convinced that there is no way in hell
that the C++ standards body model can ever work reliably.

Writing the requirements down as English, and writing them down as an
"abstract machine" that had a lot of syntax caused untold and
completely unnecessary verbiage, and made the end result
incomprehensible to any normal human. Very much so to a compiler
writer.

So honestly, I've given up on the C++ memory model. We've done quite
well with inline asm and "volatile", and since I do not believe we
will ever get rid of either, the C++ memory model isn't much of an
improvement.

And notice that this is very much a fundamental doubt about the whole
approach that the standards language is taking. I'm not doubting that
a memory model cannot be defined. I'm doubting that it can sanely be
described with the approach that the C standards body has.

Honestly, if the C++ standards body had taken a completely different
approach of actually *defining* a virtual machine, and defining the
actual code generation (ie the language would actually *define* what
every operation did in the virtual machine), and defining what the
memory ordering within that virtual machine was, I would give it a
chance in hell of working.

But that's not what the C++ standards body did. Instead, it tried to
define things in terms of "dependency chains" and other very
high-level abstractions.

So I do not believe for a second in the C++ memory ordering. It's
broken. It's broken for fundamental reasons, and I doubt we'll ever
use it because no compiler writer will ever get it remotely right.

> Saying that a control dependency should extend beyond the end of an
> "if" statement is basically equivalent to saying that compiler writers
> are forbidden to implement certain optimizations.

Honestly, what I think the standard should do is *define* that

     if (a) B else C

is defined to generate the code

      load a
      test
      je label1;
      B
      jmp label2
 label1:
      C
 label2:

in the virtual machine.  And then each operation (like "load" vs
"load_acquire" vs "load_volatile") would have well-defined semantics
in that virtual machine. And then a compiler writer would be able to
do any optimization he damn well pleases, as long as it is
semantically 100% equivalent.

That would actually give a program well-defined semantics, and I
guarantee it would make compiler writers happier too, because they'd
have a definition they can follow and think about. The memory ordering
requirements would fall out from the memory ordering defined for the
virtual machine, and then that machine could be described sanely.

In the above kind of situation, I believe we could have described what
"control vs data dependency" means, because you could have specified
it for the virtual machine, and then the compiler writers could
determine from that well-defined flow (and the kind of load they had)
whether they could turn a data dependency into a control dependency
during optimization or not.

But that's not how the C standard is written, and it's not how the
people involved *want* to write it. They want to write it at a high
level, not a low level.

End result: I really don't trust what the C++ memory ordering
semantics are. It's all much too wishy-washy, and I don't think a
compiler writer can ever understand it.

>  Now, my experience
> has been that compiler writers are loathe to give up an optimization
> unless someone can point to a specific part of the language spec and
> show that the proposed optimization would violate it.

I agree. Which is why we sometimes use the big hammer approach and
just say "don't do this optimization at all".

But it's also why the language definition being unclear is so
horrible. It's sometimes simply not all that obvious what the standard
actually allows and what it doesn't, and memory ordering is going to
be way more fragile than even the existing confusion.

So I agree 100% with you that the LKMM cases are not likely to affect
the C standard, and not affect compiler writers.

But I also think that's largely irrelevant, because I don't think it's
worth worrying about. I don't believe we will be using the standard
memory ordering, because it's not going to end up usable, and we'll
continue to roll our own.

I did hope a few years ago that that wouldn't be the case. But the
discussions I was involved with convinced me that there's no hope.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: LKMM litmus test for Roman Penyaev's rcu-rr
  2018-06-07 15:06                         ` Linus Torvalds
@ 2018-06-07 19:57                           ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2018-06-07 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Pen, Alan Stern, Linux Kernel Mailing List, linux-arch,
	andrea.parri, Will Deacon, Peter Zijlstra, Boqun Feng,
	Nick Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Ingo Molnar

On Thu, Jun 07, 2018 at 08:06:51AM -0700, Linus Torvalds wrote:
> On Thu, Jun 7, 2018 at 2:41 AM Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > We are considering adding unmarked accesses, for example, accesses
> > protected by locks.  One possible litmus test (not yet supported!)
> > might look like this:
> 
> Fair enough - you do want to have the distinction between "marked" and
> "unmarked".
> 
> And it does make sense, although at that point I think you do hit the
> "what can a compiler do" issue more. Right now, I think the things you
> check are all pretty much "compiler can't do a lot of movement".

Agreed, and the added freedom unmarked accesses give the compiler is
one reason why this is something being thought about as opposed to
something already done.  The approach that looks most feasible is to
make LKMM complain if there is a data race involving unmarked accesses.
This is a bit annoying given the Linux kernel's large number of unmarked
accesses to shared variables that do involve data races; however, my
belief is that developers and maintainers of such code are under the
obligation to make sure that the compiler cannot mess them up.  One
way that they could do this is to list the transformations that the
compiler could carry out, and make a separate litmus test for each,
substituting READ_ONCE() and WRITE_ONCE() for the unmarked accesses.

Then unmarked accesses would be for code protected by locks or that
used dependencies to avoid data races.
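
For example, the lock-based test quoted earlier in this thread could be
checked today by promoting its unmarked accesses to marked ones, along
these lines (a sketch, and a hypothetical test name, not an official
LKMM litmus test):

```
C Z6.0-marked-sketch

{}

P0(int *x, int *y, spinlock_t *mylock)
{
	spin_lock(mylock);
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
	spin_unlock(mylock);
}

P2(int *x, int *z)
{
	int r1;

	WRITE_ONCE(*z, 2);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ z=2 /\ 2:r1=0)
```

One such marked-up test per plausible compiler transformation of the
unmarked original would then over-approximate what the compiler can do.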

Thoughts?

> But I suspect that the markings you do have are going to be fairly
> limited. Things like "READ_ONCE()" vs "smp_load_acquire()" are still
> fairly simple from a compiler standpoint, at least when it comes to
> control flow - they have "side effects". So I guess that's the only
> real difference there - a regular read doesn't have side effects, so
> it could be moved up past a conditional, and/or duplicated for each
> use. That sounds much more complex to the checker than the existing
> things it supports, no?

Indeed!

And those sorts of compiler transformations are one of the things that
has pushed us to the "no data races" model.  If there are no data races,
then those compiler transformations don't affect the outcome, given that
C11 and later compilers are not allowed to introduce data races.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2018-06-07 19:55 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-28 22:08 LKMM litmus test for Roman Penyaev's rcu-rr Paul E. McKenney
2018-05-29 18:35 ` Alan Stern
2018-05-29 18:35   ` Alan Stern
2018-05-29 19:03   ` Paul E. McKenney
2018-05-29 20:49     ` Alan Stern
2018-05-29 20:49       ` Alan Stern
2018-05-29 21:10       ` Linus Torvalds
2018-05-29 22:53         ` Paul E. McKenney
2018-05-30 14:46           ` Alan Stern
2018-05-30 14:46             ` Alan Stern
2018-05-30 14:29         ` Alan Stern
2018-05-30 14:29           ` Alan Stern
2018-05-30 14:59           ` Linus Torvalds
2018-05-30 18:10             ` Alan Stern
2018-05-30 18:38             ` Paul E. McKenney
2018-05-30 19:08               ` Alan Stern
2018-05-30 19:08                 ` Alan Stern
2018-05-30 19:45                 ` Paul E. McKenney
2018-05-30 19:45                   ` Paul E. McKenney
2018-05-30 19:45                   ` Paul E. McKenney
2018-05-30 20:28                   ` Alan Stern
2018-05-30 20:28                     ` Alan Stern
2018-05-30 21:49                     ` Paul E. McKenney
2018-05-30 22:01                 ` Linus Torvalds
2018-05-30 23:14                   ` Paul E. McKenney
2018-05-31 14:27                     ` Alan Stern
2018-05-31 14:27                       ` Alan Stern
2018-06-02 14:44                       ` Paul E. McKenney
2018-06-04 14:17                         ` Alan Stern
2018-06-04 14:17                           ` Alan Stern
2018-06-04 16:01                           ` Paul E. McKenney
2018-06-06  9:40                 ` Roman Penyaev
2018-06-06 13:54                   ` Alan Stern
2018-06-06 13:54                     ` Alan Stern
2018-06-06 14:41                     ` Roman Penyaev
2018-06-06 15:55                       ` Alan Stern
2018-06-06 15:55                         ` Alan Stern
2018-06-06 19:07                   ` Paul E. McKenney
2018-06-06 19:23                     ` Linus Torvalds
2018-06-07  9:43                       ` Paul E. McKenney
2018-06-07 14:57                         ` Alan Stern
2018-06-07 14:57                           ` Alan Stern
2018-06-07 15:40                           ` Linus Torvalds
2018-06-07 15:06                         ` Linus Torvalds
2018-06-07 19:57                           ` Paul E. McKenney
