linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"c++std-parallel@accu.org" <c++std-parallel@accu.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"gcc@gcc.gnu.org" <gcc@gcc.gnu.org>,
	p796231 <Peter.Sewell@cl.cam.ac.uk>,
	"mark.batty@cl.cam.ac.uk" <Mark.Batty@cl.cam.ac.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
	David Howells <dhowells@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	"michaelw@ca.ibm.com" <michaelw@ca.ibm.com>
Subject: Re: Compilers and RCU readers: Once more unto the breach!
Date: Thu, 21 May 2015 20:24:22 +0100	[thread overview]
Message-ID: <20150521192422.GC19204@arm.com> (raw)
In-Reply-To: <20150520181606.GT6776@linux.vnet.ibm.com>

On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote:
> On Wed, May 20, 2015 at 04:46:17PM +0100, Will Deacon wrote:
> > On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote:
> > > Indeed, something like this does -not- carry a dependency from the
> > > memory_order_consume load to q:
> > > 
> > > 	char *p, q;
> > > 
> > > 	p = atomic_load_explicit(&gp, memory_order_consume);
> > > 	q = gq + (intptr_t)p - (intptr_t)p;
> > > 
> > > If this was compiled with -O0, ARM and Power might well carry a
> > > dependency, but given any optimization, the assembly language would have
> > > no hint of any such dependency.  So I am not seeing any particular danger.
> > 
> > The above is a welcome relaxation over C11, since ARM doesn't even give
> > you ordering based off false data dependencies. My concern is more to do
> > with how this can be specified precisely without prohibing honest compiler
> > and hardware optimisations.
> 
> That last is the challenge.  I believe that I am pretty close, but I am
> sure that additional adjustment will be required.  Especially given that
> we also need the memory model to be amenable to formal analysis.

Well, there's still the whole thin-air problem which unfortunately doesn't
go away with your proposal... (I was hoping that differentiating between
true and false dependencies would solve that, but your set of rules isn't
broad enough and I don't blame you at all for that!).

> > Out of interest, how do you tackle examples (4) and (5) of (assuming the
> > reads are promoted to consume loads)?:
> > 
> >   http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html
> > 
> > my understanding is that you permit both outcomes (I appreciate you're
> > not directly tackling out-of-thin-air, but treatment of dependencies
> > is heavily related).

Thanks for taking the time to walk these two examples through.

> Let's see...  #4 is as follows, given promotion to memory_order_consume
> and (I am guessing) memory_order_relaxed:
> 
> 	r1 = atomic_load_explicit(&x, memory_order_consume);
> 	if (r1 == 42)
> 	  atomic_store_explicit(&y, r1, memory_order_relaxed);
> 	------------------------------------------------------
> 	r2 = atomic_load_explicit(&y, memory_order_consume);
> 	if (r2 == 42)
> 	  atomic_store_explicit(&x, 42, memory_order_relaxed);
> 	else
> 	  atomic_store_explicit(&x, 42, memory_order_relaxed);
> 
> The second thread does not have a proper control dependency, even with
> the memory_order_consume load because both branches assign the same
> value to "x".  This means that the compiler is within its rights to
> optimize this into the following:
> 
> 	r1 = atomic_load_explicit(&x, memory_order_consume);
> 	if (r1 == 42)
> 	  atomic_store_explicit(&y, r1, memory_order_relaxed);
> 	------------------------------------------------------
> 	r2 = atomic_load_explicit(&y, memory_order_consume);
> 	atomic_store_explicit(&x, 42, memory_order_relaxed);
> 
> There is no dependency between the second thread's pair of statements,
> so both the compiler and the CPU are within their rights to optimize
> further as follows:
> 
> 	r1 = atomic_load_explicit(&x, memory_order_consume);
> 	if (r1 == 42)
> 	  atomic_store_explicit(&y, r1, memory_order_relaxed);
> 	------------------------------------------------------
> 	atomic_store_explicit(&x, 42, memory_order_relaxed);
> 	r2 = atomic_load_explicit(&y, memory_order_consume);
> 
> If the compiler makes this final optimization, even mythical SC hardware
> is within its rights to end up with (r1 == 42 && r2 == 42).  Which is
> fine, as far as I am concerned.  Or at least something that can be
> lived with.

Agreed.

> On to #5:
> 
> 	r1 = atomic_load_explicit(&x, memory_order_consume);
> 	if (r1 == 42)
> 	  atomic_store_explicit(&y, r1, memory_order_relaxed);
> 	----------------------------------------------------
> 	r2 = atomic_load_explicit(&y, memory_order_consume);
> 	if (r2 == 42)
> 	  atomic_store_explicit(&x, 42, memory_order_relaxed);
> 
> The first thread's accesses are dependency ordered.  The second thread's
> ordering is in a corner case that memory-barriers.txt does not cover.
> You are supposed to start control dependencies with READ_ONCE_CTRL(), not
> a memory_order_consume load (AKA rcu_dereference and friends).  However,
> Alpha would have a full barrier as part of the memory_order_consume load,
> and the rest of the processors would (one way or another) respect the
> control dependency.  And the compiler would have some fun trying to
> break it.

But this is interesting because the first thread is ordered whilst the
second is not, so doesn't that effectively forbid the compiler from
constant-folding values if it can't prove that there is no dependency
chain?

> So the current Linux memory model would allow (r1 == 42 && r2 == 42),
> but I don't know of any hardware/compiler combination that would
> allow it.  And no, I am -not- going to update memory-barriers.txt for
> this litmus test, its theoretical interest notwithstanding!  ;-)

Indeed, I also don't know of any hardware which permits speculative
writes to become visible, but it's the compiler (and the language
definition) that we need to think about here.

> In summary, both #4 and #5 would be allowed, as modified above.
> Seem reasonable?

It feels like it's suppressing a reasonable compiler optimisation, but again,
I'm not a compiler writer ;)

Will

  reply	other threads:[~2015-05-21 19:24 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-20  0:55 Compilers and RCU readers: Once more unto the breach! Paul E. McKenney
2015-05-20  1:57 ` Linus Torvalds
2015-05-20  2:10   ` Linus Torvalds
2015-05-20  2:41     ` Paul E. McKenney
2015-05-20 11:47       ` Will Deacon
2015-05-20 12:15         ` Paul E. McKenney
2015-05-20 15:46           ` Will Deacon
2015-05-20 15:54             ` Andrew Haley
2015-05-20 18:16               ` [c++std-parallel-1632] " Paul E. McKenney
2015-05-21 14:22                 ` Michael Matz
2015-05-21 15:10                   ` Paul E. McKenney
2015-05-21 16:17                     ` Michael Matz
2015-05-21 18:37                       ` Paul E. McKenney
2015-05-20 18:16             ` Paul E. McKenney
2015-05-21 19:24               ` Will Deacon [this message]
2015-05-21 20:02                 ` Paul E. McKenney
2015-05-21 20:42                   ` Linus Torvalds
2015-05-21 22:02                     ` Paul E. McKenney
2015-05-22  6:43                     ` Ingo Molnar
2015-05-22 10:43                       ` Richard Kenner
2015-05-22 13:11                         ` Paul E. McKenney
2015-05-22 13:12                       ` Paul E. McKenney
2015-05-26 17:37                     ` [c++std-parallel-1641] " Torvald Riegel
2015-05-22 17:30                   ` Will Deacon
2015-05-22 18:55                     ` Paul E. McKenney
2015-05-20 13:18         ` David Howells
2015-05-20 13:30           ` Paul E. McKenney
2015-05-20 13:37           ` David Howells
2015-05-20 13:44             ` Ramana Radhakrishnan
2015-05-20 14:03               ` Paul E. McKenney
2015-05-20 14:15                 ` Ramana Radhakrishnan
2015-05-20 15:12                   ` Paul E. McKenney
2015-05-20 15:46                   ` David Howells
2015-05-20 14:02             ` [c++std-parallel-1624] " Paul E. McKenney
2015-05-20  2:34   ` Paul E. McKenney
2015-05-20  7:34     ` [c++std-parallel-1614] " Jens Maurer
2015-05-20  9:03       ` Richard Biener
2015-05-20 12:02         ` Paul E. McKenney
2015-05-20 12:01       ` [c++std-parallel-1616] " Paul E. McKenney
2015-05-26 17:08 ` [c++std-parallel-1611] " Torvald Riegel
2015-05-27  1:41   ` [c++std-parallel-1651] " Paul E. McKenney
2015-07-14  0:44 ` Paul E. McKenney
2015-09-22 17:00   ` Paul E. McKenney
     [not found]     ` <CAPUmR1aqV_cQWjE8qC9x2sfmW-1ocKKMtCgNbjZH0cJ-AO2WTg@mail.gmail.com>
2015-09-23 23:26       ` [c++std-parallel-2008] " Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150521192422.GC19204@arm.com \
    --to=will.deacon@arm.com \
    --cc=Mark.Batty@cl.cam.ac.uk \
    --cc=Peter.Sewell@cl.cam.ac.uk \
    --cc=Ramana.Radhakrishnan@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=c++std-parallel@accu.org \
    --cc=dhowells@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michaelw@ca.ibm.com \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).