TMZ is more complicated. If there is a TMZ buffer used by a command
buffer, then all other used buffers must also be TMZ or read-only. If no
TMZ buffers are used by a command buffer, then TMZ is disabled. If a
context is not secure, TMZ is also disabled. A context can switch between
secure and non-secure based on the buffers being used. So mixing secure
and non-secure memory writes in one command buffer won't work. This is
not fixable in the driver - apps must be aware of this.

Marek

On Wed, Jun 3, 2020 at 5:50 AM Daniel Stone wrote:
> Hi Alex,
>
> On Mon, 1 Jun 2020 at 15:25, Alex Deucher wrote:
> > On Fri, May 29, 2020 at 11:03 AM Daniel Stone wrote:
> > > What Weston _does_ know, however, is that the display controller
> > > can work with modifier set A, and the GPU can work with modifier
> > > set B, and if the client can pick something from modifier set A,
> > > then there is a much greater probability that Weston can leave the
> > > GPU alone so it can be entirely used by the client. It also knows
> > > that if the surface can't be directly scanned out for whatever
> > > reason, then there's no point in the client optimising for direct
> > > scanout, and it can tell the client to select based on optimality
> > > purely for the GPU.
> >
> > Just so I understand this correctly, the main reason for this is to
> > deal with display hardware and render hardware from different vendors
> > which may or may not support any common formats other than linear.
>
> It handles pretty much everything other than a single-context,
> single-GPU, single-device tunnel.
>
> When sharing between subsystems and device categories, it lets us talk
> about capabilities in a more global way. For example, GBM lets you
> talk about 'scanout' and 'texture' and 'render', but what about media
> codecs? We could add the concept of decode/encode to something like
> GBM, and all the protocols like Wayland/X11 as well, then hope it
> actually works, but ...
>
> When sharing between heterogeneous vendors, it lets us talk about
> capabilities in a neutral way. For example, if you look at most modern
> Arm SoCs, your GPU, display controller, and media codec will very
> likely all be from three totally different vendors. A GPU like
> Mali-T8xx can be shipped in tens of different vendor SoCs in several
> different revisions each. Just saying 'scanout' is totally meaningless
> for the Panfrost driver. Putting awareness of every different KMS
> platform and every different codec down into the Mesa driver is a
> synchronisation nightmare, and all those drivers would also need
> specific awareness of the Mesa driver. So modifiers allow us to
> explicitly describe that we want a particular revision of Arm
> Framebuffer Compression, and all the components can understand that
> without having to be specifically aware of 15 different KMS drivers.
> But even if you have the same vendor ...
>
> When sharing between multiple devices of the same class from the same
> vendor, it lets us surface and transit that information in a generic
> way, without AMD having to figure out ways to tunnel back-channel
> information between different instances of drivers potentially
> targeting different revisions. The alternatives seem to be deeply
> pessimal hacks, and we think we can do better. And when we get
> pessimal ...
>
> In every case, modifiers are about surfacing and sharing information.
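[As an illustration of the negotiation described above - a rough sketch,
not code from the thread - a client can query which modifiers the GPU
side supports through EGL_EXT_image_dma_buf_import_modifiers, intersect
that with the set the compositor advertises (for example over
zwp_linux_dmabuf_v1), and let GBM allocate from the common set. The
helper name and array bounds here are invented for the example.]

  /* Illustrative sketch: intersect the compositor's advertised
   * modifiers with what the GPU driver reports, then let the driver
   * pick the best layout from the common set. */
  #include <stdint.h>
  #include <gbm.h>
  #include <EGL/egl.h>
  #include <EGL/eglext.h>

  static struct gbm_bo *
  alloc_negotiated_bo(EGLDisplay dpy, struct gbm_device *gbm,
                      uint32_t width, uint32_t height, uint32_t fourcc,
                      const uint64_t *compositor_mods,
                      unsigned n_compositor_mods)
  {
      PFNEGLQUERYDMABUFMODIFIERSEXTPROC query_mods =
          (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
              eglGetProcAddress("eglQueryDmaBufModifiersEXT");
      EGLuint64KHR gpu_mods[64];
      uint64_t common[64];
      EGLint n_gpu_mods = 0;
      unsigned n_common = 0;

      if (!query_mods)
          return NULL;

      /* Modifiers the GPU can render to / sample from for this format. */
      if (!query_mods(dpy, (EGLint)fourcc, 64, gpu_mods, NULL, &n_gpu_mods))
          return NULL;

      /* Keep only the modifiers both the GPU and the compositor accept. */
      for (EGLint i = 0; i < n_gpu_mods; i++)
          for (unsigned j = 0; j < n_compositor_mods; j++)
              if (gpu_mods[i] == compositor_mods[j])
                  common[n_common++] = gpu_mods[i];

      if (n_common == 0)
          return NULL; /* fall back to a linear or implicit-modifier path */

      /* The driver chooses the optimal layout from the common set. */
      return gbm_bo_create_with_modifiers(gbm, width, height, fourcc,
                                          common, n_common);
  }

[The point is that neither side hard-codes knowledge of the other's
tiling or compression: the 64-bit tokens carry that information across
the process and protocol boundary.]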
> One of the reasons Collabora have been putting so much time and energy
> into this work is exactly _because_ solving those problems on a
> case-by-case basis was a pretty lucrative source of revenue for us.
> Debugging these kinds of issues before has usually involved specific
> driver knowledge, hacking into the driver to insert your own tracing,
> etc.
>
> If you (as someone who's trying to use a device optimally) are
> fortunate enough that you can get the attention of a vendor and have
> them solve the problem for you, then that's lucky for everyone apart
> from the AMD engineers who have to go solve it. If you're not, and you
> can't figure it out yourself, then you have to go pay a consultancy.
> On the face of it, that's good for us, except that we don't want to be
> doing that kind of repetitive boring work. But it's bad for the
> ecosystem that this knowledge is hidden away and that you have to pay
> specialists to extract it. So we're really keen to surface as much
> mechanism and information as possible, to give people the tools to
> either solve their own problems or at least make well-informed
> reports, burn down a toxic source of revenue, waste less engineering
> time extracting hidden information, and empower users as much as
> possible.
>
> > It provides a way to tunnel device capabilities between the
> > different drivers. In the case of a device with display and
> > rendering on the same device or multiple devices from the same
> > vendor, it's not really that useful.
>
> Oh no, it's still super useful. There are a ton of corner cases where
> 'if you're on same-vendor same-gen same-silicon hardware' falls
> apart - in addition to the world just not being very much
> same-vendor/same-gen/same-silicon anymore. For some concrete examples:
>
> On NVIDIA Tegra hardware, planes within the display controller have
> heterogeneous capability. Some can decompress and detile, others
> can't.
>
> On Rockchip hardware, AFBC (DCC equivalent) is available for scanout
> on any plane, and can be produced by the GPU. Great! Except that 'any
> plane' isn't 'every plane' - there's a global decompression unit.
>
> On Intel hardware, they appear to have forked the media codec IP,
> shipping two different versions of the codec, one as 'low-power' and
> one as 'normal', obviously with varying capability.
>
> Even handwaving those away as vendor errors - that performance on
> those gens will always be pessimal and they should do better next time
> - I don't think same-vendor/same-gen/same-silicon is a good design
> point anymore. Between heterogeneous cut-and-paste SoCs, multi-GPU and
> eGPU usecases, virtualisation and tunneling, etc, the usecases are
> starting to demand that we do better. Vulkan's memory-allocation
> design also really pushes against the model that memory allocations
> themselves are blessed with side-channel descriptor tags.
>
> 'Those aren't my usecases and we've made Vulkan work so we don't need
> it' is an entirely reasonable position, but then you're just
> exchanging the problem of describing your tiling & compression layouts
> in a 56-bit enum to make modifiers work, for the problem of
> maintaining a surprisingly wide chunk of the display stack. For all
> the reasons above, over the past few years, the entire rest of the
> ecosystem has settled on using modifiers to describe and negotiate
> buffer exchange across context/process/protocol/subsystem/device
> boundaries.
> All the effort of making this work in KMS, GBM, EGL,
> Vulkan, Wayland, X11, V4L2, VA-API, GStreamer, etc, is going there.
>
> Realistically, the non-modifier path is probably going to bitrot, and
> people are certainly resistant to putting more smarts into it, because
> it just adds complexity to a now-single-vendor path - even NVIDIA are
> pushing this forward, and their display path is much more of an
> encapsulated magic tunnel than AMD's. In that sense, it's pretty much
> accumulating technical debt; the longer you avoid dealing with the
> display stack by implementing modifiers, the more work you have to put
> into maintaining the display stack by fixing the non-modifier path.
>
> > It doesn't seem to provide much over the current EGL
> > hints (SCANOUT, SECURE, etc.).
>
> Well yeah, if those single bits of information are enough to perfectly
> encapsulate everything you need to know, then sure. But it hasn't been
> for others, which is why we've all migrated away from them.
>
> > I still don't understand how it solves the DCC problem though.
> > Compression and encryption seem kind of like meta-modifiers. There
> > is an underlying high-level layout - linear, tiled, etc. - but it
> > could also be compressed and/or encrypted. Is the idea that those
> > are separate modifiers? E.g.,
> > 0: linear
> > 1: linear | encrypted
> > 2: linear | compressed
> > 3: linear | encrypted | compressed
> > 4: tiled1
> > 5: tiled1 | encrypted
> > 6: tiled1 | compressed
> > 7: tiled1 | encrypted | compressed
> > etc.
> > Or that the modifiers only expose the high-level layout, and it's
> > then up to the driver(s) to enable compression, etc. if both sides
> > have a compatible layout?
>
> Do you remember the old wfb from xserver? Think of modifiers as pretty
> much that. The format (e.g. A8R8G8B8) describes what you will read
> when you load a particular pixel/texel, and what will get stored when
> you write. The modifier describes how to get there: that includes both
> tiling (since you need to know the particular tiling layout in order
> to know the byte location to access), and compression (since you need
> to know the particular compression mechanism in order to access the
> pixel, e.g. for RLE-type compression you need to access the first
> pixel of the tile if the 'all pixels are identical' bit is set).
>
> The idea is that these tokens fully describe the mechanisms in use,
> without the drivers needing to do magic heuristics. For instance, if
> your modifier is just 'tiled', then that's not a full description. A
> full description would tell you about supertiling structures, tile
> sizes and ordering, etc. The definitions already in
> include/uapi/drm/drm_fourcc.h are a bit of a mixed bag - we've
> definitely learnt more as we've gone on - but the NVIDIA definitions
> are pretty exemplary for something deeply parameterised along a lot
> of variable axes.
>
> Basically, if you have to have sets of heuristics which you keep in
> sync in order to translate from modifier -> hardware layout params,
> then your modifiers aren't expressive enough. From a very quick look
> at DC, that would be your tile-split, tile-mode, array-mode, and
> swizzle-mode parameters, plus whatever from dc_tiling_mode isn't
> completely static and deterministic. 'DCCRate' always appears to be
> hardcoded to 1 (and 'DCCRateChroma' never set), but that might be one
> to parameterise as well.
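[To illustrate what 'deeply parameterised' means in practice - a sketch
only: fourcc_mod_code() and the vendor IDs are the real macros from
drm_fourcc.h, and the parameterised DRM_FORMAT_MOD_NVIDIA_* definitions
in the same header are the kind of thing being pointed at, but the
AMD_MOD_* fields below are hypothetical and not the actual AMD
definitions.]

  /* A modifier is a 64-bit token: the top 8 bits name the vendor, the
   * low 56 bits are free for the vendor to describe the layout.  The
   * fields below are made up purely to show how tiling and compression
   * parameters can be packed so that the modifier IS the layout. */
  #include <stdint.h>
  #include <drm_fourcc.h>

  /* Hypothetical field positions inside the 56-bit vendor space. */
  #define AMD_MOD_SWIZZLE(x)    ((uint64_t)((x) & 0x1f) << 0) /* swizzle mode */
  #define AMD_MOD_TILE_SPLIT(x) ((uint64_t)((x) & 0x7)  << 5) /* tile split   */
  #define AMD_MOD_DCC(x)        ((uint64_t)((x) & 0x1)  << 8) /* DCC on/off   */
  #define AMD_MOD_DCC_RATE(x)   ((uint64_t)((x) & 0x3)  << 9) /* DCC rate     */

  static uint64_t example_modifier(void)
  {
      /* fourcc_mod_code() is the real macro from drm_fourcc.h; it puts
       * the vendor ID into bits 56-63 and the payload into bits 0-55. */
      return fourcc_mod_code(AMD,
                             AMD_MOD_SWIZZLE(25) |
                             AMD_MOD_TILE_SPLIT(2) |
                             AMD_MOD_DCC(1) |
                             AMD_MOD_DCC_RATE(1));
  }

[With an encoding along these lines, Alex's question becomes mechanical:
'tiled1 | compressed' is just the same tiling modifier with the DCC bit
set, and neither side needs heuristics to recover the layout from
dimensions or usage.]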
>
> With that expression, you don't have to determine the tiling layout
> from dimensions/usage/etc, because the modifier _is_ the tiling
> layout, ditto compression.
>
> Encryption I'm minded to consider as something different. Modifiers
> don't cover buffer placement at all. That includes whether or not the
> memory is physically contiguous, whether it's in
> hidden-VRAM/BAR/sysmem, which device it lives on, etc. As far as I can
> tell from TMZ, encryption is essentially a side effect of placement?
> The memory is encrypted, the encryption is an immutable property of
> the allocation, and if the device is configured to access encrypted
> memory (by being 'secure'), then the encryption is transparent, no?
>
> That being said, there is a reasonable argument to consume a single
> bit in modifiers for TMZ on/off (assuming TMZ is not parameterised),
> which would make its availability and use much more transparent.
>
> Cheers,
> Daniel
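[To make Marek's rule at the top of the mail concrete, a rough sketch of
the allocation side through libdrm's amdgpu API, assuming
AMDGPU_GEM_CREATE_ENCRYPTED is the flag that requests TMZ-protected
memory; the helper name and sizes are invented for the example.]

  /* Sketch: allocate one TMZ-protected and one ordinary buffer.  Per
   * the rule above, a single command submission may write to the
   * encrypted buffer or to the normal one, but not both: once any TMZ
   * buffer is used, every other buffer in that submission must be TMZ
   * or read-only. */
  #include <amdgpu.h>
  #include <amdgpu_drm.h>
  #include <string.h>

  static int alloc_pair(amdgpu_device_handle dev,
                        amdgpu_bo_handle *secure_bo,
                        amdgpu_bo_handle *normal_bo)
  {
      struct amdgpu_bo_alloc_request req;
      int r;

      memset(&req, 0, sizeof(req));
      req.alloc_size = 64 * 1024 * 1024;
      req.phys_alignment = 4096;
      req.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;

      /* TMZ-protected allocation: contents are encrypted in memory and
       * only writable from submissions running in the secure state. */
      req.flags = AMDGPU_GEM_CREATE_ENCRYPTED;
      r = amdgpu_bo_alloc(dev, &req, secure_bo);
      if (r)
          return r;

      /* Ordinary allocation: writable only from non-secure submissions. */
      req.flags = 0;
      r = amdgpu_bo_alloc(dev, &req, normal_bo);
      if (r)
          amdgpu_bo_free(*secure_bo);
      return r;
  }

[Whether that encrypted property should also be surfaced as a single
modifier bit, as Daniel suggests, or remain purely an allocation and
placement attribute, is exactly the open question in the mail above.]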