Eric Anholt writes:

> The compute shader dispatch interface is pretty simple -- just pass in
> the regs that userspace has passed us, with no CLs to run. However,
> with no CL to run it means that we need to do manual cache flushing of
> the L2 after the HW execution completes (for SSBO, atomic, and
> image_load_store writes that are the output of compute shaders).
>
> This doesn't yet expose the L2 cache's ability to have a region of the
> address space not write back to memory (which could be used for
> shared_var storage).
>
> So far, the Mesa side has been tested on V3D v4.2 simpenrose (passing
> the ES31 tests), and on the kernel side on 7278 (failing atomic
> compswap tests in a way that doesn't reproduce on simpenrose).

Fixed the compswap issue in Mesa. It looks like cmpxchg needed the TYPE
field set to vec2 instead of 32-bit, since there are two values written
to TMUD. The spec says the field is ignored, but that doesn't seem to be
the case.