TMZ is more complicated. If there is a TMZ buffer used by a command
buffer, then all other used buffers must also be TMZ or read-only. If no
TMZ buffers are used by a command buffer, then TMZ is disabled. If a
context is not secure, TMZ is also disabled. A context can switch between
secure and non-secure based on the buffers being used. So mixing secure
and non-secure memory writes in one command buffer won't work. This is
not fixable in the driver - apps must be aware of this.

Marek

On Wed, Jun 3, 2020 at 5:50 AM Daniel Stone wrote:
> Hi Alex,
>
> On Mon, 1 Jun 2020 at 15:25, Alex Deucher wrote:
> > On Fri, May 29, 2020 at 11:03 AM Daniel Stone wrote:
> > > What Weston _does_ know, however, is that the display controller
> > > can work with modifier set A, and the GPU can work with modifier
> > > set B, and if the client can pick something from modifier set A,
> > > then there is a much greater probability that Weston can leave the
> > > GPU alone so it can be entirely used by the client. It also knows
> > > that if the surface can't be directly scanned out for whatever
> > > reason, then there's no point in the client optimising for direct
> > > scanout, and it can tell the client to select based on optimality
> > > purely for the GPU.
> >
> > Just so I understand this correctly, the main reason for this is to
> > deal with display hardware and render hardware from different vendors
> > which may or may not support any common formats other than linear.
>
> It handles pretty much everything other than a single-context,
> single-GPU, single-device tunnel.
>
> When sharing between subsystems and device categories, it lets us talk
> about capabilities in a more global way. For example, GBM lets you
> talk about 'scanout' and 'texture' and 'render', but what about media
> codecs? We could add the concept of decode/encode to something like
> GBM, and all the protocols like Wayland/X11 as well, then hope it
> actually works, but ...
>
> When sharing between heterogeneous vendors, it lets us talk about
> capabilities in a neutral way. For example, if you look at most modern
> Arm SoCs, your GPU, display controller, and media codec will very
> likely all be from three totally different vendors. A GPU like
> Mali-T8xx can be shipped in tens of different vendor SoCs in several
> different revisions each. Just saying 'scanout' is totally meaningless
> for the Panfrost driver. Putting awareness of every different KMS
> platform and every different codec down into the Mesa driver is a
> synchronisation nightmare, and all those drivers would also need
> specific awareness of the Mesa driver. So modifiers allow us to
> explicitly describe that we want a particular revision of Arm
> Framebuffer Compression, and all the components can understand that
> without having to be specifically aware of 15 different KMS drivers.
> But even if you have the same vendor ...
>
> When sharing between multiple devices of the same class from the same
> vendor, it lets us surface and transit that information in a generic
> way, without AMD having to figure out ways to tunnel back-channel
> information between different instances of drivers potentially
> targeting different revisions. The alternatives seem to be deeply
> pessimal hacks, and we think we can do better. And when we get
> pessimal ...
>
> In every case, modifiers are about surfacing and sharing information.
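[As an illustration of the negotiation described above - a rough sketch,
not code from the thread - a client can query which modifiers the GPU
side supports through EGL_EXT_image_dma_buf_import_modifiers, intersect
that with the set the compositor advertises (for example over
zwp_linux_dmabuf_v1), and let GBM allocate from the common set. The
helper name and array bounds here are invented for the example.]

  /* Illustrative sketch: intersect the compositor's advertised
   * modifiers with what the GPU driver reports, then let the driver
   * pick the best layout from the common set. */
  #include <stdint.h>
  #include <gbm.h>
  #include <EGL/egl.h>
  #include <EGL/eglext.h>

  static struct gbm_bo *
  alloc_negotiated_bo(EGLDisplay dpy, struct gbm_device *gbm,
                      uint32_t width, uint32_t height, uint32_t fourcc,
                      const uint64_t *compositor_mods,
                      unsigned n_compositor_mods)
  {
      PFNEGLQUERYDMABUFMODIFIERSEXTPROC query_mods =
          (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
              eglGetProcAddress("eglQueryDmaBufModifiersEXT");
      EGLuint64KHR gpu_mods[64];
      uint64_t common[64];
      EGLint n_gpu_mods = 0;
      unsigned n_common = 0;

      if (!query_mods)
          return NULL;

      /* Modifiers the GPU can render to / sample from for this format. */
      if (!query_mods(dpy, (EGLint)fourcc, 64, gpu_mods, NULL, &n_gpu_mods))
          return NULL;

      /* Keep only the modifiers both the GPU and the compositor accept. */
      for (EGLint i = 0; i < n_gpu_mods; i++)
          for (unsigned j = 0; j < n_compositor_mods; j++)
              if (gpu_mods[i] == compositor_mods[j])
                  common[n_common++] = gpu_mods[i];

      if (n_common == 0)
          return NULL; /* fall back to a linear or implicit-modifier path */

      /* The driver chooses the optimal layout from the common set. */
      return gbm_bo_create_with_modifiers(gbm, width, height, fourcc,
                                          common, n_common);
  }

[The point is that neither side hard-codes knowledge of the other's
tiling or compression: the 64-bit tokens carry that information across
the process and protocol boundary.]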
> One of the reasons Collabora have been putting so much time and energy
> into this work is exactly _because_ solving those problems on a
> case-by-case basis was a pretty lucrative source of revenue for us.
> Debugging these kinds of issues before has usually involved specific
> driver knowledge, hacking into the driver to insert your own tracing,
> etc.
>
> If you (as someone who's trying to use a device optimally) are
> fortunate enough that you can get the attention of a vendor and have
> them solve the problem for you, then that's lucky for everyone apart
> from the AMD engineers who have to go solve it. If you're not, and you
> can't figure it out yourself, then you have to go pay a consultancy.
> On the face of it, that's good for us, except that we don't want to be
> doing that kind of repetitive boring work. But it's bad for the
> ecosystem that this knowledge is hidden away and that you have to pay
> specialists to extract it. So we're really keen to surface as much
> mechanism and information as possible, to give people the tools to
> either solve their own problems or at least make well-informed
> reports, burn down a toxic source of revenue, waste less engineering
> time extracting hidden information, and empower users as much as
> possible.
>
> > It provides a way to tunnel device capabilities between the
> > different drivers. In the case of a device with display and
> > rendering on the same device or multiple devices from the same
> > vendor, it's not really that useful.
>
> Oh no, it's still super useful. There are a ton of corner cases where
> 'if you're on same-vendor same-gen same-silicon hardware' falls
> apart - in addition to the world just not being very much
> same-vendor/same-gen/same-silicon anymore. For some concrete examples:
>
> On NVIDIA Tegra hardware, planes within the display controller have
> heterogeneous capability. Some can decompress and detile, others
> can't.
>
> On Rockchip hardware, AFBC (DCC equivalent) is available for scanout
> on any plane, and can be produced by the GPU. Great! Except that 'any
> plane' isn't 'every plane' - there's a global decompression unit.
>
> On Intel hardware, they appear to have forked the media codec IP,
> shipping two different versions of the codec, one as 'low-power' and
> one as 'normal', obviously with varying capability.
>
> Even handwaving those away as vendor errors - that performance on
> those gens will always be pessimal and they should do better next time
> - I don't think same-vendor/same-gen/same-silicon is a good design
> point anymore. Between heterogeneous cut-and-paste SoCs, multi-GPU and
> eGPU usecases, virtualisation and tunneling, etc, the usecases are
> starting to demand that we do better. Vulkan's memory-allocation
> design also really pushes against the model that memory allocations
> themselves are blessed with side-channel descriptor tags.
>
> 'Those aren't my usecases and we've made Vulkan work so we don't need
> it' is an entirely reasonable position, but then you're just
> exchanging the problem of describing your tiling & compression layouts
> in a 56-bit enum to make modifiers work, for the problem of
> maintaining a surprisingly wide chunk of the display stack. For all
> the reasons above, over the past few years, the entire rest of the
> ecosystem has settled on using modifiers to describe and negotiate
> buffer exchange across context/process/protocol/subsystem/device
> boundaries.
> All the effort of making this work in KMS, GBM, EGL,
> Vulkan, Wayland, X11, V4L2, VA-API, GStreamer, etc, is going there.
>
> Realistically, the non-modifier path is probably going to bitrot, and
> people are certainly resistant to putting more smarts into it, because
> it just adds complexity to a now-single-vendor path - even NVIDIA are
> pushing this forward, and their display path is much more of an
> encapsulated magic tunnel than AMD's. In that sense, it's pretty much
> accumulating technical debt; the longer you avoid dealing with the
> display stack by implementing modifiers, the more work you have to put
> into maintaining the display stack by fixing the non-modifier path.
>
> > It doesn't seem to provide much over the current EGL
> > hints (SCANOUT, SECURE, etc.).
>
> Well yeah, if those single bits of information are enough to perfectly
> encapsulate everything you need to know, then sure. But it hasn't been
> for others, which is why we've all migrated away from them.
>
> > I still don't understand how it solves the DCC problem though.
> > Compression and encryption seem kind of like meta-modifiers. There
> > is an underlying high-level layout - linear, tiled, etc. - but it
> > could also be compressed and/or encrypted. Is the idea that those
> > are separate modifiers? E.g.,
> > 0: linear
> > 1: linear | encrypted
> > 2: linear | compressed
> > 3: linear | encrypted | compressed
> > 4: tiled1
> > 5: tiled1 | encrypted
> > 6: tiled1 | compressed
> > 7: tiled1 | encrypted | compressed
> > etc.
> > Or that the modifiers only expose the high-level layout, and it's
> > then up to the driver(s) to enable compression, etc. if both sides
> > have a compatible layout?
>
> Do you remember the old wfb from xserver? Think of modifiers as pretty
> much that. The format (e.g. A8R8G8B8) describes what you will read
> when you load a particular pixel/texel, and what will get stored when
> you write. The modifier describes how to get there: that includes both
> tiling (since you need to know the particular tiling layout in order
> to know the byte location to access), and compression (since you need
> to know the particular compression mechanism in order to access the
> pixel, e.g. for RLE-type compression you need to access the first
> pixel of the tile if the 'all pixels are identical' bit is set).
>
> The idea is that these tokens fully describe the mechanisms in use,
> without the drivers needing to do magic heuristics. For instance, if
> your modifier is just 'tiled', then that's not a full description. A
> full description would tell you about supertiling structures, tile
> sizes and ordering, etc. The definitions already in
> include/uapi/drm/drm_fourcc.h are a bit of a mixed bag - we've
> definitely learnt more as we've gone on - but the NVIDIA definitions
> are pretty exemplary for something deeply parameterised along a lot
> of variable axes.
>
> Basically, if you have to have sets of heuristics which you keep in
> sync in order to translate from modifier -> hardware layout params,
> then your modifiers aren't expressive enough. From a very quick look
> at DC, that would be your tile-split, tile-mode, array-mode, and
> swizzle-mode parameters, plus whatever from dc_tiling_mode isn't
> completely static and deterministic. 'DCCRate' always appears to be
> hardcoded to 1 (and 'DCCRateChroma' never set), but that might be one
> to parameterise as well.
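[To illustrate what 'deeply parameterised' means in practice - a sketch
only: fourcc_mod_code() and the vendor IDs are the real macros from
drm_fourcc.h, and the parameterised DRM_FORMAT_MOD_NVIDIA_* definitions
in the same header are the kind of thing being pointed at, but the
AMD_MOD_* fields below are hypothetical and not the actual AMD
definitions.]

  /* A modifier is a 64-bit token: the top 8 bits name the vendor, the
   * low 56 bits are free for the vendor to describe the layout.  The
   * fields below are made up purely to show how tiling and compression
   * parameters can be packed so that the modifier IS the layout. */
  #include <stdint.h>
  #include <drm_fourcc.h>

  /* Hypothetical field positions inside the 56-bit vendor space. */
  #define AMD_MOD_SWIZZLE(x)    ((uint64_t)((x) & 0x1f) << 0) /* swizzle mode */
  #define AMD_MOD_TILE_SPLIT(x) ((uint64_t)((x) & 0x7)  << 5) /* tile split   */
  #define AMD_MOD_DCC(x)        ((uint64_t)((x) & 0x1)  << 8) /* DCC on/off   */
  #define AMD_MOD_DCC_RATE(x)   ((uint64_t)((x) & 0x3)  << 9) /* DCC rate     */

  static uint64_t example_modifier(void)
  {
      /* fourcc_mod_code() is the real macro from drm_fourcc.h; it puts
       * the vendor ID into bits 56-63 and the payload into bits 0-55. */
      return fourcc_mod_code(AMD,
                             AMD_MOD_SWIZZLE(25) |
                             AMD_MOD_TILE_SPLIT(2) |
                             AMD_MOD_DCC(1) |
                             AMD_MOD_DCC_RATE(1));
  }

[With an encoding along these lines, Alex's question becomes mechanical:
'tiled1 | compressed' is just the same tiling modifier with the DCC bit
set, and neither side needs heuristics to recover the layout from
dimensions or usage.]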
>
> With that expression, you don't have to determine the tiling layout
> from dimensions/usage/etc, because the modifier _is_ the tiling
> layout, ditto compression.
>
> Encryption I'm minded to consider as something different. Modifiers
> don't cover buffer placement at all. That includes whether or not the
> memory is physically contiguous, whether it's in
> hidden-VRAM/BAR/sysmem, which device it lives on, etc. As far as I can
> tell from TMZ, encryption is essentially a side effect of placement?
> The memory is encrypted, the encryption is an immutable property of
> the allocation, and if the device is configured to access encrypted
> memory (by being 'secure'), then the encryption is transparent, no?
>
> That being said, there is a reasonable argument to consume a single
> bit in modifiers for TMZ on/off (assuming TMZ is not parameterised),
> which would make its availability and use much more transparent.
>
> Cheers,
> Daniel
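[To make Marek's rule at the top of the mail concrete, a rough sketch of
the allocation side through libdrm's amdgpu API, assuming
AMDGPU_GEM_CREATE_ENCRYPTED is the flag that requests TMZ-protected
memory; the helper name and sizes are invented for the example.]

  /* Sketch: allocate one TMZ-protected and one ordinary buffer.  Per
   * the rule above, a single command submission may write to the
   * encrypted buffer or to the normal one, but not both: once any TMZ
   * buffer is used, every other buffer in that submission must be TMZ
   * or read-only. */
  #include <amdgpu.h>
  #include <amdgpu_drm.h>
  #include <string.h>

  static int alloc_pair(amdgpu_device_handle dev,
                        amdgpu_bo_handle *secure_bo,
                        amdgpu_bo_handle *normal_bo)
  {
      struct amdgpu_bo_alloc_request req;
      int r;

      memset(&req, 0, sizeof(req));
      req.alloc_size = 64 * 1024 * 1024;
      req.phys_alignment = 4096;
      req.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;

      /* TMZ-protected allocation: contents are encrypted in memory and
       * only writable from submissions running in the secure state. */
      req.flags = AMDGPU_GEM_CREATE_ENCRYPTED;
      r = amdgpu_bo_alloc(dev, &req, secure_bo);
      if (r)
          return r;

      /* Ordinary allocation: writable only from non-secure submissions. */
      req.flags = 0;
      r = amdgpu_bo_alloc(dev, &req, normal_bo);
      if (r)
          amdgpu_bo_free(*secure_bo);
      return r;
  }

[Whether that encrypted property should also be surfaced as a single
modifier bit, as Daniel suggests, or remain purely an allocation and
placement attribute, is exactly the open question in the mail above.]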