Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants

From: Dmitry Vyukov <dvyukov@google.com>
To: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
Date: Wed, 30 Jan 2019 11:34:13 +0100
Message-ID: <CACT4Y+Y2GFQ2dt2GiNP4k+9F1kYw3BzeqhBBJ-QzjZmSux9gHA@mail.gmail.com>
In-Reply-To: <2236FBA76BA1254E88B949DDB74E612BA4B9A2B5@IRSMSX102.ger.corp.intel.com>

On Wed, Jan 30, 2019 at 11:19 AM Reshetova, Elena
<elena.reshetova@intel.com> wrote:
>
> > > So, you are saying that ACQUIRE does not guarantee that "po-later stores
> > > on the same CPU and all propagated stores from other CPUs
> > > must propagate to all other CPUs after the acquire operation "?
> > > I was reading about acquire before posting this and trying to understand,
> > > and this was my conclusion that it should provide this, but I can easily be wrong
> > > on this.
> > >
> > > Andrea, Peter, could you please comment?
> >
> > Short version:  I am not convinced by the above sentence, and I suggest
> > to remove it (as done in
> >
> >   http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).
>
> Sorry, I misunderstood your previous email on this. I somehow read it as saying
> that "A-cumulative property" is just a notion not used in LKMM for ACQUIRE,
> so I should not mention the notion but the guarantees stay. But it is the
> guarantees themselves that are wrong, which is much worse.
>
> >
> > ---
> > To elaborate:  I think that we should first discuss the meaning of that
> > "[...] after the acquire operation (does)",  because there is no notion
> > of "ACQUIRE (or more generally, load) propagation" in the LKMM:
> >
> > Stores propagate (after being executed) to other CPUs.  Loads _execute_
> > (possibly multiple times/speculatively, but this is irrelevant for the
> > discussion below).
> >
> > A detailed, but still informal, description of these concepts is in:
> >
> >   tools/memory-model/Documentation/explanation.txt
> >
> > (cf., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
> > them with an example:
> >
> >       { initially: x=0, y=0; }
> >
> >       CPU0                    CPU1
> >       --------------------------------------
> >       LOAD-ACQUIRE x=0        LOAD y=1
> >       STORE y=1
> >
> > In this scenario,
> >
> >   a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
> >      executes (this is guaranteed by the ACQUIRE),
> >
> >   b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
> >      CPU1 (a store cannot be propagated before being executed),
> >
> >   c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
> >      executes (since CPU1 "sees the store").
> >
> > The example also illustrates the following property:
> >
> >   ACQUIRE guarantees that po-later stores on the same CPU must
> >   propagate to all other CPUs after the acquire _executes_.
> >
> > (combine (a) and (b) ).
> >
> > OTOH, please notice that:
> >
> >   ACQUIRE does _NOT_ guarantee that all propagated stores from
> >   other CPUs (to the CPU executing the ACQUIRE) must propagate
> >   to all other CPUs after the acquire operation _executes_.
>
> Thank you very much Andrea, this example and explanation clarifies it nicely!
> So ACQUIRE only really affects the current CPU's "view of the world" and the
> propagation of operations from it, and not anything else, which is actually very logical.
>
> My initial confusion was because I was thinking of ACQUIRE as a pair
> for RELEASE, i.e. that it should provide guarantees complementary to the
> RELEASE ones, just on po-later operations.
>
> >
> > In fact, we've already seen how full barriers can be used to break such
> > "guarantee"; for example, in
> >
> >       { initially: x=0, y=0; }
> >
> >       CPU0                    CPU1
> >               ...
> >       ---------------------------------------------------
> >       STORE x=1               LOAD x=1
> >                               FULL-BARRIER
> >                               LOAD-ACQUIRE y=0
> >
> > the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
> > to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
> >
> > Does this make sense?
>
> Yes, thank you again! I think it will still take me a long while to become familiar
> with all these notions and not be confused even by simple things.
>
> >
> >
> > > > Is ACQUIRE strictly stronger than control dependency?
> > >
> > > In my understanding yes.
> >
> > +1 (or we have a problem)
> >
> >
> > >
> > > > It generally looks so unless there is something very subtle that I am
> > > > missing. If so, should we replace it with just "RELEASE ordering +
> > > > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> > >
> > > I was just trying to mention all the applicable orderings/guarantees.
> > > I can remove "control dependency" part if it is easier for people to understand
> > > (the main goal of documentation).
> >
> > This sounds like a good idea; thank you, Dmitry, for pointing this out.
>
> I will remove it. So the rule is that we always mention the strongest type of barrier
> when we mention ordering guarantees, right?

My reasoning here was that a control dependency is just a very subtle
thing, so I think it's better if people just don't see it at all and
don't start thinking in terms of control dependencies until absolutely
necessary.
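
To make the "RELEASE ordering + ACQUIRE ordering on success" wording
concrete, here is a rough user-space sketch. It uses C11 atomics rather
than the kernel API, and struct obj, ref_dec_and_test(), obj_put() and
run_demo() are hypothetical names, not refcount_t internals. The
decrement is a release RMW, and only the caller that drops the count to
zero adds an acquire fence, so its later loads (reading the other
owner's write) and stores (the free()) are ordered after the decrement
without anyone having to reason about a control dependency:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object (not the kernel's refcount_t). */
struct obj {
	atomic_int refs;
	int data[2];		/* each owner writes only its own slot */
};

/*
 * dec_and_test sketch: RELEASE ordering on the decrement, plus ACQUIRE
 * ordering only on success (when the count drops to zero).
 */
static bool ref_dec_and_test(atomic_int *refs)
{
	/* RELEASE: our earlier accesses to the object happen before the decrement. */
	if (atomic_fetch_sub_explicit(refs, 1, memory_order_release) == 1) {
		/* ACQUIRE on success: our later loads and stores are ordered after it. */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	return false;
}

static int final_sum;

static void obj_put(struct obj *o)
{
	if (ref_dec_and_test(&o->refs)) {
		/* Last owner: the other owner's write to data[] is visible here. */
		final_sum = o->data[0] + o->data[1];
		free(o);
	}
}

struct owner_arg {
	struct obj *o;
	int slot;
};

static void *owner(void *p)
{
	struct owner_arg *a = p;

	a->o->data[a->slot] = a->slot + 1;	/* plain store, published by the RELEASE */
	obj_put(a->o);
	return NULL;
}

/* Two owners drop their references concurrently; returns what the last one saw. */
static int run_demo(void)
{
	struct obj *o = malloc(sizeof(*o));
	pthread_t t[2];
	struct owner_arg args[2];

	if (o == NULL)
		abort();
	atomic_init(&o->refs, 2);
	o->data[0] = o->data[1] = 0;
	final_sum = -1;

	for (int i = 0; i < 2; i++) {
		args[i].o = o;
		args[i].slot = i;
		if (pthread_create(&t[i], NULL, owner, &args[i]) != 0)
			abort();
	}
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return final_sum;
}
```

Calling assert(run_demo() == 3) in a loop from a main() should always
pass: whichever thread drops the last reference sees both writes to
data[]. In LKMM terms, a control dependency from the decrement would
order only later stores (such as the free()), not later loads such as
the read of data[]; the ACQUIRE covers both.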

I am not sure how to generalize that rule. There are not too many other
cases where one barrier type is a full superset of another. E.g.
rmb/wmb are orthogonal to acquire/release.

But if we take a full barrier, then, yes, it definitely makes sense to
just say that an operation provides a full barrier rather than listing
full barrier, acquire barrier, release barrier, read barrier, write
barrier, control dependency, ... :)
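
Along the same lines, once an operation is fully ordered there is
nothing else worth listing. A minimal sketch, assuming C11's seq_cst
fence as a stand-in for the kernel's full barrier (an approximation,
not an exact equivalence; ref_dec_and_test_full() is a hypothetical
name): since a seq_cst fence is both an acquire and a release fence,
bracketing the decrement with two of them already implies the
RELEASE + ACQUIRE-on-success ordering, so "fully ordered" is the only
thing the documentation would need to say:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical fully ordered dec_and_test, approximated in C11 by
 * bracketing a relaxed RMW with seq_cst fences. Each seq_cst fence is
 * both an acquire and a release fence, so this subsumes the
 * RELEASE + ACQUIRE-on-success variant.
 */
static bool ref_dec_and_test_full(atomic_int *refs)
{
	bool zero;

	atomic_thread_fence(memory_order_seq_cst);	/* full barrier before */
	zero = atomic_fetch_sub_explicit(refs, 1, memory_order_relaxed) == 1;
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier after */
	return zero;
}
```

Only the call that takes the count from 1 to 0 returns true, exactly as
with the weaker variant; the difference is purely in how much ordering
comes with it.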