Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Jonas Oberhauser <jonas.oberhauser@huawei.com>,
	paulmck@kernel.org, parri.andrea@gmail.com, will@kernel.org,
	peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
	dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
	akiyks@gmail.com, dlustig@nvidia.com, joel@joelfernandes.org,
	urezki@gmail.com, quic_neeraju@quicinc.com, frederic@kernel.org,
	linux-kernel@vger.kernel.org, viktor@mpi-sws.org
Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po
Date: Tue, 24 Jan 2023 21:23:02 +0100	[thread overview]
Message-ID: <40447973-6f6b-86f7-1147-d8f20a943767@huaweicloud.com> (raw)
In-Reply-To: <Y9AR4Gr10SyCKovo@rowland.harvard.edu>

On 1/24/2023 6:14 PM, Alan Stern wrote:
> On Tue, Jan 24, 2023 at 02:14:03PM +0100, Jonas Oberhauser wrote:
>> After mulling it over a bit in my big old head, I consider that even though
>> dropping the [W] may be shorter, it might make for the simpler model by
>> excluding lots of cases.
>> That makes me think you should do it for real in the definition of prop. And
>> not just at the very end, because in fact each cumul-fence link might come
>> from a non-A-cumulative fence. So the same argument you are giving should be
>> applied recursively.
>> Either
>>
>> 	prop = (overwrite & ext)? ; (cumul-fence; [W])* ; rfe?
>>
>> or integrate it directly into cumul-fence.
> I dislike this sort of argument.  I understand the formal memory model
> by relating it to the informal operational model.  Thus, cumul-fence
> links a write W to another event E when the fence guarantees that W will
> propagate to E's CPU before E executes.

I later wondered why it's not defined like this and realized that prop 
means that it's before E executes.

> That's how the memory model
> expresses the propagation properties of these fences.

I don't think that's really a perfect match though.
For example, W ->wmb E (and thus cumul-fence) does guarantee that W 
propagates to E's CPU before E executes.
But the propagation property of wmb is that W propagates to every CPU 
before E propagates to that CPU.
It just so happens that the time E propagates to E's CPU is the time it 
executes.

Indeed, looking at the non-strong properties of fences only, should give 
rise to a relation that only says "W propagates to any CPU before E 
propagates to that CPU" and that is a relation between stores. And quite 
different from "W propagates to E's CPU before E executes".

I believe that relation is (cumul-fence;[W])+.
Then
   X ->(overwrite&ext);(cumul-fence;[W])* E
means that there is some W co-after X which propagates to any CPU no 
later than E due to the weak properties of fences along that path.
And
   X ->(overwrite&ext);(cumul-fence;[W])*;rfe? E
implies that there is some W co-after X which propagates to the CPU 
executing E no later than E executes. (because E observes or executes 
and hence propapagated to itself a store that must propagate to E's CPU 
no earlier than W).

I think this is closer to the idea of expressing the (non-strong) 
propagation properties of the fences.

> I don't want to
> rule out the possibility that E is a read merely because cumul-fence
> might be followed by another, A-cumulative fence.
Perhaps you mean non-A-cumulative fence?
The A-cumulative fences (when their A-cumulativity is actually used) 
already rule out reads because they use
overwrite;cumul-fence*;rfe;(the a-cumulativity)

>
>>>>> Consider: Could we remove all propagation-ordering fences from ppo
>>>>> because they are subsumed by prop?  (Or is that just wrong?)
>>>> Surely not, since prop doesn't usually provide ordering by itself.
>>> Sorry, I meant the prop-related non-ppo parts of hb and pb.
>> I still don't follow :( Can you write some equations to show me what you
>> mean?
> Consider:
>
> 	Rx=1			Ry=1
> 	Wrelease Y=1		Wx=1
>
> Here we have Wx=1 ->hb* Ry=1 by (prop \ id) & int, using the fact that
> Wy=1 is an A-cumulative release fence.  But we also have
>
> 	Wx=1 ->rfe Rx=1 ->ppo Wy=1 ->rfe Ry=1.
>
> Thus there are two distinct ways of proving that Wx=1 ->hb* Ry=1.  If we
> removed the fence term from the definition of ppo (or weakened it to
> just rmb | acq), we would eliminate the second, redundant proof.  Is
> this the sort of thing you think we should do?

The reason I wouldn't do something like that is that firstly, the fence 
does preserve the program order, and secondly there are proofs where you 
need to use that fact.

>>>>>>> In fact, I wouldn't mind removing the happens-before, propagation, and
>>>>>>> rcu axioms from LKMM entirely, replacing them with the single
>>>>>>> executes-before axiom.
>>>>>> I was planning to propose the same thing, however, I would also propose to
>>>>>> redefine hb and rb by dropping the hb/pb parts at the end of these
>>>>>> relations.
>>>>>>
>>>>>>     hb = ....
>>>>>>     pb = prop ; strong-fence ; [Marked]
>>>>>>     rb = prop ; rcu-fence ; [Marked]
>>>>>>
>>>>>>     xb = hb|pb|rb
>>>>>>     acyclic xb
>>>>> I'm not so sure that's a good idea.  For instance, it would require the
>>>>> definitions of rcu-link and rb to be changed from having (hb* ; pb*) to
>>>>> having (hb | pb)*.
>>>> I think that's an improvement. It's obvious that (hb | pb)* is right and so
>>>> is (pb | hb)*.
>>>> For (hb* ; pb*), the first reaction is "why do all the hb edges need to be
>>>> before the pb edges?", until one realizes that pb actually allows hb* at the
>>>> end, so in a sense this is  hb* ; (pb ; hb*)*, and then one has to
>>>> understand that this means that the prop;strong-fence edges can appear any
>>>> number of times at arbitrary locations. It just seems like defining (pb |
>>>> hb)* with extra steps.
>>> This can be mentioned explicitly as a comment or in explanation.txt.
>> Ok, but why not just use  (pb|hb)* and (pb|hb|rb)* and not worry about
>> having to explain anything?
>> And make the hb and rb definitions simpler at the same time?
> Do you think (pb | hb)* is simpler than pb* (as in the statement of the
> propagation axiom)?
pb+,  however aren't you thinking of getting rid of the propagation axiom?
I still think (pb' | hb)+ where pb' is the simpler definition of pb is 
simpler than pb*, where pb=pb';hb*.

> Besides, remember what I said about understanding the formal memory
> model in terms of the operational model.  pb relates a write W to
> another event E when the strong fence guarantees that W will propagate
> to E's CPU before E executes.
I suppose to every CPU before E executes?

> If the hb* term were omitted from the definition of pb, this wouldn't be true any more.  Or at least, not as
> true as it should be.

Why is that the right level of how true it should be?

doesn't W ->pb;rb E also guarantee that W will propagate to E's CPU 
before E executes?
Or even just W ->pb;pb E?
Why only consider W->pb;hb E?

I'd rather think of it in terms of "this is the basic block that implies 
that it W executes before E because it propagates to every CPU before E 
executes, and then you can of course extend it by adding any of pb,rb, 
and hb at the end to get a longer "executes before"".

Best wishes, jonas