Dear all,

On Mon, Jun 07, 2021 at 12:52:35PM +0100, Will Deacon wrote:
> On Mon, Jun 07, 2021 at 12:43:01PM +0200, Peter Zijlstra wrote:
> > On Sun, Jun 06, 2021 at 11:43:42AM -0700, Linus Torvalds wrote:
> > > So while the example code is insane and pointless (and you shouldn't
> > > read *too* much into it), conceptually the notion of that pattern of
> > >
> > >     if (READ_ONCE(a)) {
> > >         WRITE_ONCE(b,1);
> > >         .. do something ..
> > >     } else {
> > >         WRITE_ONCE(b,1);
> > >         .. do something else ..
> > >     }
> >
> > This is actually more tricky than it would appear (isn't it always).
> >
> > The thing is, that normally we must avoid speculative stores, because
> > they'll result in out-of-thin-air values.
> >
> > *Except* in this case, where both branches emit the same store, then
> > it's a given that the store will happen and it will not be OOTA.
> > Someone's actually done the proof for that apparently (Will, you have a
> > reference to Jade's paper?)
>
> I don't think there's a paper on this, but Jade and I are hoping to talk
> about aspects of it at LPC (assuming the toolchain MC gets accepted).
>
> > There's apparently also a competition going on who can build the
> > weakestest ARM64 implementation ever.
> >
> > Combine the two, and you'll get a CPU that *will* emit the store early
> > :/
>
> So there are a lot of important details missing here and, as above, I think
> this is something worth discussing at LPC with Jade. The rough summary is
> that the arm64 memory model recently (so recently that it's not yet landed
> in the public docs) introduced something called "pick dependencies", which
> are a bit like control dependencies only they don't create order to all
> subsequent stores. These are useful for some conditional data-processing
> instructions such as CSEL and CAS, but it's important to note here that
> *conditional branch instructions behave exactly as you would expect*.
>
> To reiterate, in the code sequence at the top of this mail, if the compiler
> emits something along the lines of:
>
>     LDR
>     <conditional branch instruction>
>     STR
>
> then the load *will* be ordered before the store, even if the same store
> instruction is executed regardless of the branch direction. Yes, one can
> fantasize about a CPU that executes both taken and non-taken paths and
> figures out that the STR can be hoisted before the load, but that is not
> allowed by the architecture today.
>
> It's the conditional instructions that are more fun. For example, the CSEL
> instruction:
>
>     CSEL    X0, X1, X2, <cond>
>
> basically says:
>
>     if (cond)
>         X0 = X1;
>     else
>         X0 = X2;
>
> these are just register-register operations, but the idea is that the CPU
> can predict that "branching event" inside the CSEL instruction and
> speculatively rename X0 while waiting for the condition to resolve.
>
> So then you can add loads and stores to the mix along the lines of:
>
>     LDR     X0, [X1]            // X0 = *X1
>     CMP     X0, X2
>     CSEL    X3, X4, X5, EQ      // X3 = (X0 == X2) ? X4 : X5
>     STR     X3, [X6]            // MUST BE ORDERED AFTER THE LOAD
>     STR     X7, [X8]            // Can be reordered
>
> (assuming X1, X6, X8 all point to different locations in memory)
>
> So now we have a dependency from the load to the first store, but the
> interesting part is that the last store is _not_ ordered wrt either of the
> other two memory accesses, whereas it would be if we used a conditional
> branch instead of the CSEL. Make sense?
>
> Now, obviously the compiler is blissfully unaware that conditional
> data processing instructions can give rise to different dependencies than
> conditional branches, so the question really is how much do we need to
> care in the kernel?
>
> My preference is to use load-acquire instead of control dependencies so
> that we don't have to worry about this, or any future relaxations to the
> CPU architecture, at all.
>
> Jade -- please can you correct me if I got any of this wrong?
>

Sincere apologies for taking so long to reply.
I attach a technical report which describes the status of dependencies in
the Arm memory model. I have also released the corresponding cat files and
a collection of interesting litmus tests over here:

https://github.com/herd/herdtools7/commit/f80bd7c2e49d7d3adad22afc62ff4768d65bf830

I hope this material can help inform this conversation and I would love to
hear your thoughts.

Thanks,
Jade

> Will
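
For anyone following the thread, here is a minimal userspace sketch of the two
forms being compared: the branch-based pattern from Linus's example and the
load-acquire form Will says he prefers. It uses C11 atomics only so that it
compiles outside the kernel (in-kernel code would use READ_ONCE()/WRITE_ONCE()
and smp_load_acquire()). The variables a and b and both function names are
hypothetical, chosen for illustration, and nothing here is taken from the
attached report or the litmus tests.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared variables standing in for 'a' and 'b' in the thread. */
static atomic_int a;
static atomic_int b;

/*
 * Branch-based form: the same store appears on both paths. At the C level
 * these are relaxed accesses, so a compiler may merge the two stores and
 * drop the branch, leaving nothing to order the load before the store.
 */
static int store_after_branch(void)
{
    int seen = atomic_load_explicit(&a, memory_order_relaxed);

    if (seen) {
        atomic_store_explicit(&b, 1, memory_order_relaxed);
        /* .. do something .. */
    } else {
        atomic_store_explicit(&b, 1, memory_order_relaxed);
        /* .. do something else .. */
    }
    return seen;
}

/*
 * Load-acquire form: the ordering is requested explicitly on the load, so
 * it does not depend on whether the compiler emits a conditional branch,
 * a CSEL, or no conditional instruction at all.
 */
static int store_after_acquire(void)
{
    int seen = atomic_load_explicit(&a, memory_order_acquire);

    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return seen;
}
```

Both functions return the value loaded from a and leave b set to 1; they
differ only in how the load-to-store ordering is expressed. Run from a single
thread, the sketch shows the shape of the code rather than any reordering,
which would need a concurrent litmus test to observe.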