> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory the
> GPU touches isn't touched by the CPU. I.e. coherency isn't helping and the
> coherency traffic is probably expensive. Whether the complexity is worth it
> for "real" content I don't know - it may just be silly benchmarks that
> benefit.

Right, Panfrost userspace specifically assumes GPU reads to be expensive and
treats GPU memory as write-only *except* for a few special cases (compute-like
workloads, glReadPixels, some blits, etc). The vast majority of the GPU memory -
everything used in kmscube - will be write-only to the CPU and fed directly
into the display zero-copy (or read by the GPU later as a dmabuf).